
discrepancy between two ways of measuring mAP@IoU=0.5 #5643

Open
ZhengRui opened this issue May 16, 2020 · 5 comments
ZhengRui commented May 16, 2020

@AlexeyAB Thanks for the great work! I followed #2145 (comment) to get the mAP@IoU=0.5 of the yolov4.weights model on the COCO2017 validation set.

  • Method 1: ./darknet detector map ~/Work/Datasets/yolo_data/coco2017/coco.data cfg/yolov4.cfg weights/yolov4.weights -iou_thresh 0.50 -points 101 gives me mAP@IoU=0.5 = 73.54; the end of the log looks like this:
 calculation mAP (mean average precision)...
5000
 detections_count = 262524, unique_truth_count = 36781
class_id = 0, name = person, ap = 82.64%         (TP = 8572, FP = 2720)
class_id = 1, name = bicycle, ap = 68.83%        (TP = 201, FP = 71)
class_id = 2, name = car, ap = 75.89%            (TP = 1404, FP = 573)
class_id = 3, name = motorbike, ap = 79.72%      (TP = 268, FP = 64)
class_id = 4, name = aeroplane, ap = 92.33%      (TP = 127, FP = 17)
class_id = 5, name = bus, ap = 91.31%            (TP = 246, FP = 30)
class_id = 6, name = train, ap = 93.85%          (TP = 167, FP = 14)
class_id = 7, name = truck, ap = 69.39%          (TP = 271, FP = 187)
class_id = 8, name = boat, ap = 65.13%           (TP = 256, FP = 93)
class_id = 9, name = traffic light, ap = 63.80%          (TP = 405, FP = 204)
class_id = 10, name = fire hydrant, ap = 92.53%          (TP = 86, FP = 7)
class_id = 11, name = stop sign, ap = 83.79%     (TP = 60, FP = 11)
class_id = 12, name = parking meter, ap = 87.12%         (TP = 50, FP = 9)
class_id = 13, name = bench, ap = 54.01%         (TP = 197, FP = 92)
class_id = 14, name = bird, ap = 61.53%          (TP = 263, FP = 190)
class_id = 15, name = cat, ap = 94.58%           (TP = 185, FP = 16)
class_id = 16, name = dog, ap = 90.06%           (TP = 187, FP = 32)
class_id = 17, name = horse, ap = 91.70%         (TP = 236, FP = 27)
class_id = 18, name = sheep, ap = 85.78%         (TP = 293, FP = 74)
class_id = 19, name = cow, ap = 84.70%           (TP = 299, FP = 73)
class_id = 20, name = elephant, ap = 90.72%      (TP = 232, FP = 49)
class_id = 21, name = bear, ap = 92.20%          (TP = 62, FP = 6)
class_id = 22, name = zebra, ap = 92.21%         (TP = 232, FP = 20)
class_id = 23, name = giraffe, ap = 90.63%       (TP = 197, FP = 19)
class_id = 24, name = backpack, ap = 48.48%      (TP = 183, FP = 137)
class_id = 25, name = umbrella, ap = 75.14%      (TP = 305, FP = 121)
class_id = 26, name = handbag, ap = 46.58%       (TP = 236, FP = 201)
class_id = 27, name = tie, ap = 64.05%           (TP = 164, FP = 109)
class_id = 28, name = suitcase, ap = 80.04%      (TP = 230, FP = 106)
class_id = 29, name = frisbee, ap = 93.73%       (TP = 104, FP = 9)
class_id = 30, name = skis, ap = 59.55%          (TP = 139, FP = 74)
class_id = 31, name = snowboard, ap = 68.28%     (TP = 46, FP = 10)
class_id = 32, name = sports ball, ap = 72.56%           (TP = 182, FP = 69)
class_id = 33, name = kite, ap = 73.64%          (TP = 237, FP = 103)
class_id = 34, name = baseball bat, ap = 75.68%          (TP = 107, FP = 35)
class_id = 35, name = baseball glove, ap = 79.19%        (TP = 108, FP = 29)
class_id = 36, name = skateboard, ap = 86.07%            (TP = 152, FP = 25)
class_id = 37, name = surfboard, ap = 76.83%     (TP = 196, FP = 46)
class_id = 38, name = tennis racket, ap = 89.99%         (TP = 195, FP = 28)
class_id = 39, name = bottle, ap = 69.77%        (TP = 720, FP = 418)
class_id = 40, name = wine glass, ap = 73.13%            (TP = 237, FP = 102)
class_id = 41, name = cup, ap = 75.87%           (TP = 657, FP = 306)
class_id = 42, name = fork, ap = 64.91%          (TP = 120, FP = 70)
class_id = 43, name = knife, ap = 53.22%         (TP = 156, FP = 85)
class_id = 44, name = spoon, ap = 53.06%         (TP = 129, FP = 97)
class_id = 45, name = bowl, ap = 73.49%          (TP = 450, FP = 240)
class_id = 46, name = banana, ap = 54.06%        (TP = 192, FP = 149)
class_id = 47, name = apple, ap = 46.13%         (TP = 117, FP = 132)
class_id = 48, name = sandwich, ap = 73.83%      (TP = 121, FP = 59)
class_id = 49, name = orange, ap = 57.15%        (TP = 161, FP = 125)
class_id = 50, name = broccoli, ap = 49.19%      (TP = 145, FP = 99)
class_id = 51, name = carrot, ap = 44.55%        (TP = 176, FP = 195)
class_id = 52, name = hot dog, ap = 74.72%       (TP = 90, FP = 35)
class_id = 53, name = pizza, ap = 82.72%         (TP = 229, FP = 54)
class_id = 54, name = donut, ap = 78.19%         (TP = 254, FP = 123)
class_id = 55, name = cake, ap = 76.96%          (TP = 224, FP = 95)
class_id = 56, name = chair, ap = 63.33%         (TP = 1089, FP = 736)
class_id = 57, name = sofa, ap = 71.87%          (TP = 171, FP = 71)
class_id = 58, name = pottedplant, ap = 65.75%           (TP = 224, FP = 121)
class_id = 59, name = bed, ap = 75.98%           (TP = 102, FP = 30)
class_id = 60, name = diningtable, ap = 51.97%           (TP = 370, FP = 290)
class_id = 61, name = toilet, ap = 90.17%        (TP = 155, FP = 26)
class_id = 62, name = tvmonitor, ap = 87.38%     (TP = 240, FP = 59)
class_id = 63, name = laptop, ap = 89.24%        (TP = 202, FP = 38)
class_id = 64, name = mouse, ap = 88.60%         (TP = 90, FP = 15)
class_id = 65, name = remote, ap = 72.18%        (TP = 192, FP = 79)
class_id = 66, name = keyboard, ap = 80.55%      (TP = 109, FP = 30)
class_id = 67, name = cell phone, ap = 72.60%            (TP = 179, FP = 61)
class_id = 68, name = microwave, ap = 90.67%     (TP = 47, FP = 12)
class_id = 69, name = oven, ap = 67.78%          (TP = 87, FP = 46)
class_id = 70, name = toaster, ap = 63.45%       (TP = 5, FP = 4)
class_id = 71, name = sink, ap = 76.30%          (TP = 160, FP = 49)
class_id = 72, name = refrigerator, ap = 86.59%          (TP = 103, FP = 23)
class_id = 73, name = book, ap = 35.09%          (TP = 513, FP = 886)
class_id = 74, name = clock, ap = 82.22%         (TP = 206, FP = 56)
class_id = 75, name = vase, ap = 72.40%          (TP = 194, FP = 109)
class_id = 76, name = scissors, ap = 61.86%      (TP = 20, FP = 7)
class_id = 77, name = teddy bear, ap = 79.10%            (TP = 149, FP = 50)
class_id = 78, name = hair drier, ap = 32.99%            (TP = 2, FP = 2)
class_id = 79, name = toothbrush, ap = 59.69%            (TP = 38, FP = 31)

 for conf_thresh = 0.25, precision = 0.70, recall = 0.70, F1-score = 0.70
 for conf_thresh = 0.25, TP = 25905, FP = 10915, FN = 10876, average IoU = 58.64 %

 IoU threshold = 50 %, used 101 Recall-points
 mean average precision (mAP@0.50) = 0.735378, or 73.54 %
Total Detection Time: 129 Seconds

Set -points flag:
 `-points 101` for MS COCO
 `-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
 `-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
  • Method 2:
    1. ./darknet detector valid ~/Work/Datasets/yolo_data/coco2017/coco.data cfg/yolov4.cfg weights/yolov4.weights generates coco_results.json inside the results folder.
    2. I use this evaluation script, coco_eval.py, to run the evaluation:
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import argparse


def coco_eval(args):
    # Load COCO-format ground truth, attach the detection results,
    # and run the standard evaluate/accumulate/summarize pipeline.
    cocoGt = COCO(args.gt_json)
    cocoDt = cocoGt.loadRes(args.pred_json)
    cocoEval = COCOeval(cocoGt, cocoDt, args.eval_type)
    cocoEval.evaluate()
    cocoEval.accumulate()
    cocoEval.summarize()


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Evaluate segm/bbox/keypoints in COCO format.')
    parser.add_argument('gt_json', type=str, help="COCO format segmentation/detection/keypoints ground truth json file")
    parser.add_argument('pred_json', type=str, help="COCO format segmentation/detection/keypoints prediction json file")
    parser.add_argument('eval_type', type=str, choices=['segm', 'bbox', 'keypoints'], help="Evaluation type")
    args = parser.parse_args()
    coco_eval(args)

Running python coco2017_data/coco_eval.py ../../Datasets/coco/annotations/instances_val2017.json ./results/coco_results.json bbox gives mAP@IoU=0.5 = 74.9, and the log is:

loading annotations into memory...
Done (t=0.37s)
creating index...
index created!
Loading and preparing results...
DONE (t=3.28s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=56.92s).
Accumulating evaluation results...
DONE (t=7.39s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.505
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.749
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.557
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.357
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.559
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.614
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.368
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.598
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.633
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.500
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.680
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.757

Do you know why these two methods give different mAP@IoU=0.5? Maybe I misunderstood something.
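
For reference, here is a minimal numpy sketch (my own, not from either codebase) of the 101-point interpolated AP that both darknet's -points 101 and pycocotools are meant to compute; even small differences in how the PR curve is built before this step (TP/FP matching, crowd handling, detection caps) can shift AP@0.5 noticeably:

import numpy as np

def ap_101(recalls, precisions):
    # recalls/precisions: one PR point per detection, sorted by
    # descending confidence (so recall is non-decreasing), after
    # TP/FP matching at IoU >= 0.5.
    recall_thrs = np.linspace(0.0, 1.0, 101)  # 0.00, 0.01, ..., 1.00
    ap = 0.0
    for r in recall_thrs:
        mask = recalls >= r
        # precision envelope: best precision achievable at recall >= r
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / len(recall_thrs)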

AlexeyAB (Owner) commented May 16, 2020

I don't know the reason.
If you find a mistake in my code or in pycocotools, let me know )

To check the accuracy of MS COCO models, I use pycocotools or the CodaLab evaluation server.

ZhengRui (Author) commented:

I thought of some potential reasons:

  • On the pycocotools side: crowd (iscrowd) ground-truth boxes are treated specially, and detections are filtered by maxDets.

  • On the darknet side: there are some discrepancies between how ./darknet detector valid and ./darknet detector map preprocess images and extract boxes.

I have tried not ignoring iscrowd boxes and not filtering by maxDets in pycocotools, while using the same thresh=0.001 and the same get_network_boxes parameters for valid and map in darknet (with the -letter_box option in the command), but I am still not able to get the same mAP@IoU=0.5. I haven't checked the downstream logic that calculates the PR curve and mAP; I focused on making detections_count equal to the number of boxes sent to pycocotools, but couldn't manage even that.

Do you have any further thoughts, @AlexeyAB? Thanks.
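
For concreteness, a rough sketch of one way to align the pycocotools side (paths assumed; single IoU threshold, an effectively unlimited maxDets, and AP read straight out of eval['precision'] since summarize() expects the default three maxDets entries):

import numpy as np
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

cocoGt = COCO('instances_val2017.json')            # path assumed
cocoDt = cocoGt.loadRes('results/coco_results.json')

cocoEval = COCOeval(cocoGt, cocoDt, 'bbox')
cocoEval.params.iouThrs = np.array([0.5])          # evaluate only at IoU=0.5
cocoEval.params.maxDets = [100000]                 # effectively no per-image cap
cocoEval.evaluate()
cocoEval.accumulate()

# eval['precision'] has shape [iouThrs, recThrs, catIds, areaRng, maxDets];
# crowd GT boxes are still matched as "ignore" here -- stripping them would
# require editing the ground-truth json itself.
prec = cocoEval.eval['precision'][0, :, :, 0, -1]  # IoU=0.5, area='all'
print('AP@0.5 =', prec[prec > -1].mean())          # -1 marks empty entries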

AlexeyAB (Owner) commented May 17, 2020

@ZhengRui I don't know.
Try to uncomment this line and recompile:

//printf("Precision = %1.2f, Recall = %1.2f, avg IOU = %2.2f%% \n\n", class_precision, class_recall, avg_iou_per_class[i]);

Try to use the same thresh=0.001 in both cases.

Also try to set 11 PR points instead of 101 in both Darknet and pycocotools, for easier debugging.

Then compare Precision and Recall for one of the classes between Darknet and pycocotools (but not for the person class, to avoid the crowd issue).
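
For that comparison, one possible way to pull 11-point, per-class AP out of pycocotools (a sketch assuming the cocoEval object from the script above; recThrs must be set before accumulate(), and the class order follows the ground truth's category ids):

import numpy as np

cocoEval.params.recThrs = np.linspace(0.0, 1.0, 11)  # 11 PR points, like -points 11
cocoEval.accumulate()

prec = cocoEval.eval['precision']                    # [T, R, K, A, M]
for k, cat_id in enumerate(cocoEval.params.catIds):
    name = cocoEval.cocoGt.loadCats(cat_id)[0]['name']
    p = prec[0, :, k, 0, -1]                         # IoU=0.5, area='all'
    ap = p[p > -1].mean() if (p > -1).any() else float('nan')
    print(f'name = {name}, ap = {ap * 100:.2f}%')    # mimic darknet's per-class lines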

tand826 commented Jun 4, 2020

@ZhengRui
Is there any progress?

ZhengRui (Author) commented Jun 5, 2020

@tand826 Unfortunately I haven't had time to look into this further.
