I’m researching evaluation metrics for IR systems.
Besides recall and precision, I want to measure how complete (comprehensive) a retrieved passage is compared to the ground-truth answer, or what percentage of the passage is redundant.
For example, the expected answer contains ideas A and B, but the retrieved passage contains ideas A, C, and D.
The passage may still be relevant, but it is not comprehensive.
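To make the example concrete, here is a minimal sketch of the two quantities I have in mind. It assumes the ideas in each text have already been extracted into sets of labels (the extraction step, whether manual annotation or something automatic, is not shown), and the function name `idea_overlap` is hypothetical:

```python
def idea_overlap(expected: set[str], retrieved: set[str]) -> dict[str, float]:
    """Hypothetical idea-level scores; assumes ideas are already extracted as labels."""
    shared = expected & retrieved
    # Completeness: fraction of the expected ideas that the passage covers.
    completeness = len(shared) / len(expected) if expected else 0.0
    # Redundancy: fraction of the retrieved ideas that are not in the expected answer.
    redundancy = len(retrieved - expected) / len(retrieved) if retrieved else 0.0
    return {"completeness": completeness, "redundancy": redundancy}


scores = idea_overlap({"A", "B"}, {"A", "C", "D"})
print(scores)  # {'completeness': 0.5, 'redundancy': 0.6666666666666666}
```

Computed this way, completeness is just recall over ideas instead of documents, and redundancy is 1 minus idea-level precision, so the hard part is really identifying the ideas.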
Is there an established metric for this?