I have a dataset that I use to test my object detection model, let's call it test_dataset.
When evaluating a given model with COCO eval (through the YOLOX eval.py script), I get this result:
Average forward time: 23.05 ms, Average NMS time: 2.60 ms, Average inference time: 25.65 ms
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.724
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.957
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.831
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.278
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.591
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.810
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.535
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.755
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.759
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.349
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.649
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.839
per class AP:
| class | AP | class | AP | class | AP |
|:-------------|:-------|:--------|:-------|:-------------|:-------|
| cargo | 59.491 | ferry | 87.701 | fishing boat | 67.328 |
| sailing boat | 75.134 | | | | |
per class AR:
| class | AR | class | AR | class | AR |
|:-------------|:-------|:--------|:-------|:-------------|:-------|
| cargo | 64.802 | ferry | 89.717 | fishing boat | 71.506 |
| sailing boat | 77.748 | | | | |
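(For context, that summary block is the standard pycocotools output; as far as I understand, the YOLOX evaluator calls COCOeval under the hood, roughly like the sketch below. The JSON file names here are just placeholders for my exported ground truth and predictions.)

from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# ground truth and detections exported to COCO JSON (placeholder file names)
coco_gt = COCO("test_dataset_gt.json")
coco_dt = coco_gt.loadRes("yolox_predictions.json")

# bbox evaluation over IoU thresholds 0.50:0.95; summarize() prints the AP/AR block above
coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()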
However, when evaluating with FiftyOne, I get the following:
precision recall f1-score support
cargo 0.76 0.91 0.83 606
ferry 0.97 1.00 0.99 990
fishing boat 0.85 0.96 0.91 332
sailing boat 0.87 0.97 0.92 706
micro avg 0.88 0.97 0.92 2634
macro avg 0.87 0.96 0.91 2634
weighted avg 0.88 0.97 0.92 2634
I used this script:
# evaluate the "predictions" field against the "detections" ground truth
# using FiftyOne's COCO-style evaluation protocol
results = dataset.evaluate_detections(
    "predictions",
    gt_field="detections",
    compute_mAP=True,
    method="coco",
)
results.print_report()
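Since compute_mAP=True is passed, my understanding from the FiftyOne docs is that the returned results object also exposes a COCO-style mAP (averaged over IoU 0.50:0.95), which I would expect to be directly comparable to the first AP line from YOLOX above:

# COCO-style mAP over IoU 0.50:0.95; I would expect this to line up with the 0.724 above
print(results.mAP())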
I was expecting the same results, since the FiftyOne function uses the COCO evaluation protocol.
I can't understand what those metrics are or how they relate to the COCO numbers above.