I am running an evaluation on Hugging Face datasets like HotpotQA, but the results come out like this:
'precision_at_2/mean': 0.0, 'recall_at_2/mean': 0.0, 'ndcg_at_2/mean': 0.0, ...
I don't know why I am getting zeros. The results are predictions from a LangChain LLM RAG pipeline (my full dataframe is in the notebook linked below). Here is my evaluation call:
results = mlflow.evaluate(
    data=eval_data,
    model_type="retriever",
    evaluators="default",
    predictions="results",
    targets="answer",
    extra_metrics=[
        mlflow.metrics.ndcg_at_k(2),
        mlflow.metrics.recall_at_k(2),
        mlflow.metrics.precision_at_k(2),
        mlflow.metrics.latency(),
    ],
    evaluator_config={
        "col_mapping": {
            "inputs": "question",
            "context": "context",
            "targets": "answer",
        }
    },
)
Is there any issue with the structure of columns “answer” and “results”?
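For context, my understanding from the MLflow retriever-evaluation docs is that both the targets column and the predictions column are supposed to hold lists of document strings, and the metrics score the exact-string overlap between the two lists. Here is a minimal sketch of that structure (the question text and the doc_1/doc_2 values are made-up placeholders, not my real data):

import mlflow
import pandas as pd

# Toy data in the shape I think the retriever evaluator expects:
# each row holds a list of document strings in both columns, and
# precision/recall/NDCG at k are computed from exact string overlap.
eval_data = pd.DataFrame(
    {
        "question": ["what is mlflow?"],
        "answer": [["doc_1", "doc_3"]],   # ground-truth relevant docs
        "results": [["doc_1", "doc_2"]],  # docs returned by the retriever
    }
)

results = mlflow.evaluate(
    data=eval_data,
    model_type="retriever",
    targets="answer",
    predictions="results",
    extra_metrics=[
        mlflow.metrics.precision_at_k(2),
        mlflow.metrics.recall_at_k(2),
        mlflow.metrics.ndcg_at_k(2),
    ],
)
print(results.metrics)  # non-zero only where the two lists share identical strings

If my "answer" column actually contains free-text answers rather than lists that exactly match the retrieved document strings, could that be why every score comes back as 0.0?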
Here is the notebook
https://colab.research.google.com/drive/1CZ27v9Uf_QvfgtP9UDYWas3yMMpIfZ5y?usp=sharing
How can I get the metric values to come out correctly? Any help would be appreciated.