
discrepancy between two ways of measuring mAP@IoU=0.5 #5643

Open
ZhengRui opened this issue May 16, 2020 · 5 comments
ZhengRui commented May 16, 2020

@AlexeyAB Thanks for the great work! I followed #2145 (comment) to get the mAP@IoU=0.5 of the yolov4.weights model on the COCO2017 validation set.

  • Method 1: ./darknet detector map ~/Work/Datasets/yolo_data/coco2017/coco.data cfg/yolov4.cfg weights/yolov4.weights -iou_thresh 0.50 -points 101 gives me mAP@IoU=0.5 = 73.54; the end of the log looks like this:
 calculation mAP (mean average precision)...
5000
 detections_count = 262524, unique_truth_count = 36781
class_id = 0, name = person, ap = 82.64%         (TP = 8572, FP = 2720)
class_id = 1, name = bicycle, ap = 68.83%        (TP = 201, FP = 71)
class_id = 2, name = car, ap = 75.89%            (TP = 1404, FP = 573)
class_id = 3, name = motorbike, ap = 79.72%      (TP = 268, FP = 64)
class_id = 4, name = aeroplane, ap = 92.33%      (TP = 127, FP = 17)
class_id = 5, name = bus, ap = 91.31%            (TP = 246, FP = 30)
class_id = 6, name = train, ap = 93.85%          (TP = 167, FP = 14)
class_id = 7, name = truck, ap = 69.39%          (TP = 271, FP = 187)
class_id = 8, name = boat, ap = 65.13%           (TP = 256, FP = 93)
class_id = 9, name = traffic light, ap = 63.80%          (TP = 405, FP = 204)
class_id = 10, name = fire hydrant, ap = 92.53%          (TP = 86, FP = 7)
class_id = 11, name = stop sign, ap = 83.79%     (TP = 60, FP = 11)
class_id = 12, name = parking meter, ap = 87.12%         (TP = 50, FP = 9)
class_id = 13, name = bench, ap = 54.01%         (TP = 197, FP = 92)
class_id = 14, name = bird, ap = 61.53%          (TP = 263, FP = 190)
class_id = 15, name = cat, ap = 94.58%           (TP = 185, FP = 16)
class_id = 16, name = dog, ap = 90.06%           (TP = 187, FP = 32)
class_id = 17, name = horse, ap = 91.70%         (TP = 236, FP = 27)
class_id = 18, name = sheep, ap = 85.78%         (TP = 293, FP = 74)
class_id = 19, name = cow, ap = 84.70%           (TP = 299, FP = 73)
class_id = 20, name = elephant, ap = 90.72%      (TP = 232, FP = 49)
class_id = 21, name = bear, ap = 92.20%          (TP = 62, FP = 6)
class_id = 22, name = zebra, ap = 92.21%         (TP = 232, FP = 20)
class_id = 23, name = giraffe, ap = 90.63%       (TP = 197, FP = 19)
class_id = 24, name = backpack, ap = 48.48%      (TP = 183, FP = 137)
class_id = 25, name = umbrella, ap = 75.14%      (TP = 305, FP = 121)
class_id = 26, name = handbag, ap = 46.58%       (TP = 236, FP = 201)
class_id = 27, name = tie, ap = 64.05%           (TP = 164, FP = 109)
class_id = 28, name = suitcase, ap = 80.04%      (TP = 230, FP = 106)
class_id = 29, name = frisbee, ap = 93.73%       (TP = 104, FP = 9)
class_id = 30, name = skis, ap = 59.55%          (TP = 139, FP = 74)
class_id = 31, name = snowboard, ap = 68.28%     (TP = 46, FP = 10)
class_id = 32, name = sports ball, ap = 72.56%           (TP = 182, FP = 69)
class_id = 33, name = kite, ap = 73.64%          (TP = 237, FP = 103)
class_id = 34, name = baseball bat, ap = 75.68%          (TP = 107, FP = 35)
class_id = 35, name = baseball glove, ap = 79.19%        (TP = 108, FP = 29)
class_id = 36, name = skateboard, ap = 86.07%            (TP = 152, FP = 25)
class_id = 37, name = surfboard, ap = 76.83%     (TP = 196, FP = 46)
class_id = 38, name = tennis racket, ap = 89.99%         (TP = 195, FP = 28)
class_id = 39, name = bottle, ap = 69.77%        (TP = 720, FP = 418)
class_id = 40, name = wine glass, ap = 73.13%            (TP = 237, FP = 102)
class_id = 41, name = cup, ap = 75.87%           (TP = 657, FP = 306)
class_id = 42, name = fork, ap = 64.91%          (TP = 120, FP = 70)
class_id = 43, name = knife, ap = 53.22%         (TP = 156, FP = 85)
class_id = 44, name = spoon, ap = 53.06%         (TP = 129, FP = 97)
class_id = 45, name = bowl, ap = 73.49%          (TP = 450, FP = 240)
class_id = 46, name = banana, ap = 54.06%        (TP = 192, FP = 149)
class_id = 47, name = apple, ap = 46.13%         (TP = 117, FP = 132)
class_id = 48, name = sandwich, ap = 73.83%      (TP = 121, FP = 59)
class_id = 49, name = orange, ap = 57.15%        (TP = 161, FP = 125)
class_id = 50, name = broccoli, ap = 49.19%      (TP = 145, FP = 99)
class_id = 51, name = carrot, ap = 44.55%        (TP = 176, FP = 195)
class_id = 52, name = hot dog, ap = 74.72%       (TP = 90, FP = 35)
class_id = 53, name = pizza, ap = 82.72%         (TP = 229, FP = 54)
class_id = 54, name = donut, ap = 78.19%         (TP = 254, FP = 123)
class_id = 55, name = cake, ap = 76.96%          (TP = 224, FP = 95)
class_id = 56, name = chair, ap = 63.33%         (TP = 1089, FP = 736)
class_id = 57, name = sofa, ap = 71.87%          (TP = 171, FP = 71)
class_id = 58, name = pottedplant, ap = 65.75%           (TP = 224, FP = 121)
class_id = 59, name = bed, ap = 75.98%           (TP = 102, FP = 30)
class_id = 60, name = diningtable, ap = 51.97%           (TP = 370, FP = 290)
class_id = 61, name = toilet, ap = 90.17%        (TP = 155, FP = 26)
class_id = 62, name = tvmonitor, ap = 87.38%     (TP = 240, FP = 59)
class_id = 63, name = laptop, ap = 89.24%        (TP = 202, FP = 38)
class_id = 64, name = mouse, ap = 88.60%         (TP = 90, FP = 15)
class_id = 65, name = remote, ap = 72.18%        (TP = 192, FP = 79)
class_id = 66, name = keyboard, ap = 80.55%      (TP = 109, FP = 30)
class_id = 67, name = cell phone, ap = 72.60%            (TP = 179, FP = 61)
class_id = 68, name = microwave, ap = 90.67%     (TP = 47, FP = 12)
class_id = 69, name = oven, ap = 67.78%          (TP = 87, FP = 46)
class_id = 70, name = toaster, ap = 63.45%       (TP = 5, FP = 4)
class_id = 71, name = sink, ap = 76.30%          (TP = 160, FP = 49)
class_id = 72, name = refrigerator, ap = 86.59%          (TP = 103, FP = 23)
class_id = 73, name = book, ap = 35.09%          (TP = 513, FP = 886)
class_id = 74, name = clock, ap = 82.22%         (TP = 206, FP = 56)
class_id = 75, name = vase, ap = 72.40%          (TP = 194, FP = 109)
class_id = 76, name = scissors, ap = 61.86%      (TP = 20, FP = 7)
class_id = 77, name = teddy bear, ap = 79.10%            (TP = 149, FP = 50)
class_id = 78, name = hair drier, ap = 32.99%            (TP = 2, FP = 2)
class_id = 79, name = toothbrush, ap = 59.69%            (TP = 38, FP = 31)

 for conf_thresh = 0.25, precision = 0.70, recall = 0.70, F1-score = 0.70
 for conf_thresh = 0.25, TP = 25905, FP = 10915, FN = 10876, average IoU = 58.64 %

 IoU threshold = 50 %, used 101 Recall-points
 mean average precision (mAP@0.50) = 0.735378, or 73.54 %
Total Detection Time: 129 Seconds

Set -points flag:
 `-points 101` for MS COCO
 `-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
 `-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
  • Method 2:
    1. ./darknet detector valid ~/Work/Datasets/yolo_data/coco2017/coco.data cfg/yolov4.cfg weights/yolov4.weights generates coco_results.json inside the results folder.
    2. I use this evaluation script, coco_eval.py, to run the evaluation:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import argparse


def coco_eval(args):
    # Load COCO-format ground truth, attach the detection results,
    # and run the standard evaluate/accumulate/summarize pipeline.
    cocoGt = COCO(args.gt_json)
    cocoDt = cocoGt.loadRes(args.pred_json)
    cocoEval = COCOeval(cocoGt, cocoDt, args.eval_type)
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Evaluate segm/bbox/keypoints in COCO format.')
    parser.add_argument('gt_json', type=str, help="COCO format segmentation/detection/keypoints ground truth json file")
    parser.add_argument('pred_json', type=str, help="COCO format segmentation/detection/keypoints prediction json file")
    parser.add_argument('eval_type', type=str, choices=['segm', 'bbox', 'keypoints'], help="Evaluation type")
    args = parser.parse_args()
    coco_eval(args)

Running python coco2017_data/coco_eval.py ../../Datasets/coco/annotations/instances_val2017.json ./results/coco_results.json bbox gives mAP@IoU=0.5 = 74.9, and the log is:

loading annotations into memory...
Done (t=0.37s)
creating index...
index created!
Loading and preparing results...
DONE (t=3.28s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=56.92s).
Accumulating evaluation results...
DONE (t=7.39s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.505
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.749
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.557
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.357
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.559
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.614
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.368
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.598
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.633
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.680
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.757

Do you know why these two methods give different mAP@IoU=0.5? Maybe I misunderstood something.
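
For reference, here is a minimal numpy sketch (my own, not from either codebase) of the 101-point interpolated AP that both darknet's -points 101 and pycocotools are meant to compute; even small differences in how the PR curve is built before this step (TP/FP matching, crowd handling, detection caps) can shift AP@0.5 noticeably:

import numpy as np

def ap_101(recalls, precisions):
    # recalls/precisions: one PR point per detection, sorted by
    # descending confidence (so recall is non-decreasing), after
    # TP/FP matching at IoU >= 0.5.
    recall_thrs = np.linspace(0.0, 1.0, 101)  # 0.00, 0.01, ..., 1.00
    ap = 0.0
    for r in recall_thrs:
        mask = recalls >= r
        # precision envelope: best precision achievable at recall >= r
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / len(recall_thrs)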

AlexeyAB (Owner) commented May 16, 2020

I don't know the reason.
If you find a mistake in my code or in pycocotools, let me know )

To check the accuracy of MS COCO models, I use pycocotools or the CodaLab evaluation server.

ZhengRui (Author) commented:

I thought of some potential reasons:

  • On the pycocotools side: crowd (iscrowd) ground-truth boxes are treated specially, and detections are filtered by maxDets.

  • On the darknet side: there are some discrepancies between how ./darknet detector valid and ./darknet detector map preprocess images and extract boxes.

I have tried not ignoring iscrowd boxes and not filtering by maxDets in pycocotools, while using the same thresh=0.001 and the same get_network_boxes parameters for valid and map in darknet (with the -letter_box option in the command), but I am still not able to get the same mAP@IoU=0.5. I haven't checked the downstream logic that calculates the PR curve and mAP; I focused on making detections_count equal to the number of boxes sent to pycocotools, but couldn't manage even that.

Do you have any further thoughts, @AlexeyAB? Thanks.
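
For concreteness, a rough sketch of one way to align the pycocotools side (paths assumed; single IoU threshold, an effectively unlimited maxDets, and AP read straight out of eval['precision'] since summarize() expects the default three maxDets entries):

import numpy as np
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

cocoGt = COCO('instances_val2017.json')            # path assumed
cocoDt = cocoGt.loadRes('results/coco_results.json')

cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
cocoEval.params.iouThrs = np.array([0.5])          # evaluate only at IoU=0.5
cocoEval.params.maxDets = [100000]                 # effectively no per-image cap
cocoEval.evaluate()
cocoEval.accumulate()

# eval['precision'] has shape [iouThrs, recThrs, catIds, areaRng, maxDets];
# crowd GT boxes are still matched as "ignore" here -- stripping them would
# require editing the ground-truth json itself.
prec = cocoEval.eval['precision'][0, :, :, 0, -1]  # IoU=0.5, area='all'
print('AP@0.5 =', prec[prec > -1].mean())          # -1 marks empty entries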

AlexeyAB (Owner) commented May 17, 2020

@ZhengRui I don't know.
Try to uncomment this line and recompile:

//printf("Precision = %1.2f, Recall = %1.2f, avg IOU = %2.2f%% \n\n", class_precision, class_recall, avg_iou_per_class[i]);

Try to use the same thresh=0.001 in both cases.

Also try to set 11 PR points instead of 101 in both Darknet and pycocotools, for easier debugging.

Then compare Precision and Recall for one of the classes between Darknet and pycocotools (but not for the person class, to avoid the crowd issue).
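
For that comparison, one possible way to pull 11-point, per-class AP out of pycocotools (a sketch assuming the cocoEval object from the script above; recThrs must be set before accumulate(), and the class order follows the ground truth's category ids):

import numpy as np

cocoEval.params.recThrs = np.linspace(0.0, 1.0, 11)  # 11 PR points, like -points 11
cocoEval.accumulate()

prec = cocoEval.eval['precision']                    # [T, R, K, A, M]
for k, cat_id in enumerate(cocoEval.params.catIds):
    name = cocoEval.cocoGt.loadCats(cat_id)[0]['name']
    p = prec[0, :, k, 0, -1]                         # IoU=0.5, area='all'
    ap = p[p > -1].mean() if (p > -1).any() else float('nan')
    print(f'name = {name}, ap = {ap * 100:.2f}%')    # mimic darknet's per-class lines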

tand826 commented Jun 4, 2020

@ZhengRui
Is there any progress?

ZhengRui (Author) commented Jun 5, 2020

@tand826 Unfortunately I haven't had time to look into this further.
