Abstract
Manual dataset annotation involves a heavy workload, and uneven data quality and high expertise requirements have long been problems. Building on the idea of semi-automatic annotation, this article investigates interactive methods for obtaining accurate object annotations. We propose a human–machine interactive object annotation method based on one-click guidance: the annotator clicks a point close to the center of the object, and the prior information carried by this point guides the model. The advantages of our method are fourfold: (1) the simulated-click scheme is transferable and supports annotation across datasets; (2) clicks help eliminate irrelevant areas within the bounding box; (3) the operation is more convenient, requiring no manually drawn boxes, only the location information of a single click; (4) our method supports additional click annotations for further correction. To verify the effectiveness of the proposed method, we conducted extensive experiments on the KITTI and PASCAL VOC2012 datasets; the results show that our model improves average IoU by 18.1% and 14.6% over Anno-Mage and CVAT, respectively. Our method focuses on improving annotation accuracy and efficiency, and offers a new approach to semi-automatic annotation.
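The abstract describes guiding the model with the prior information of a single click near the object center. One common way to realize such guidance, shown in the minimal sketch below, is to encode the click as a Gaussian heatmap and feed it to the network as an extra input channel alongside the RGB image; the function names and the sigma value are illustrative assumptions, not the paper's exact implementation. A correction click could be handled the same way, by adding or updating a guidance channel.

```python
# Minimal sketch (not the authors' exact implementation): encode a single
# center click as a 2D Gaussian heatmap and concatenate it to the RGB image
# as a fourth input channel. `encode_click` and sigma=10.0 are assumptions.
import numpy as np

def encode_click(height, width, click_xy, sigma=10.0):
    """Return an (H, W) Gaussian heatmap centered on the clicked pixel."""
    cx, cy = click_xy
    ys, xs = np.mgrid[0:height, 0:width]
    dist_sq = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-dist_sq / (2.0 * sigma ** 2)).astype(np.float32)

def add_click_channel(image, click_xy, sigma=10.0):
    """Stack the click heatmap onto an (H, W, 3) image, giving (H, W, 4)."""
    h, w = image.shape[:2]
    guide = encode_click(h, w, click_xy, sigma)
    return np.concatenate([image.astype(np.float32), guide[..., None]], axis=-1)

# Example: a click near the center of a toy 256x256 image.
image = np.zeros((256, 256, 3), dtype=np.uint8)
guided_input = add_click_channel(image, click_xy=(130, 120))
print(guided_input.shape)  # (256, 256, 4)
```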
References
Real E, Shlens J, Mazzocchi S, Pan X, Vanhoucke V (2017) YouTube-BoundingBoxes: a large high-precision human-annotated data set for object detection in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 5296–5305
Nandhini P, Kuppuswami S, Malliga S, DeviPriya R (2022) Enhanced rank attack detection algorithm (E-RAD) for securing RPL-based IoT networks by early detection and isolation of rank attackers. J Supercomput 1–24
Suseendran G, Akila D, Vijaykumar H, Jabeen TN, Nirmala R, Nayyar A (2022) Multi-sensor information fusion for efficient smart transport vehicle tracking and positioning based on deep learning technique. J Supercomput 1–26
Varga V, Lőrincz A (2020) Reducing human efforts in video segmentation annotation with reinforcement learning. Neurocomputing 405:247–258
Kishorekumar R, Deepa P (2020) A framework for semantic image annotation using LEGION algorithm. J Supercomput 76(6):4169–4183
Pham T-N, Nguyen V-H, Huh J-H (2023) Integration of improved YOLOv5 for face mask detector and auto-labeling to generate dataset for fighting against COVID-19. J Supercomput 1–27
Boukthir K, Qahtani AM, Almutiry O, Dhahri H, Alimi AM (2022) Reduced annotation based on deep active learning for Arabic text detection in natural scene images. Pattern Recogn Lett 157:42–48
Russell BC, Torralba A, Murphy KP, Freeman WT (2008) LabelMe: a database and web-based tool for image annotation. Int J Comput Vis 77(1–3):157–173
Su H, Deng J, Fei-Fei L (2012) Crowdsourcing annotations for visual object detection. In: Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence
Acuna D, Ling H, Kar A, Fidler S (2018) Efficient interactive annotation of segmentation datasets with Polygon-RNN++. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 859–868
Vondrick C, Patterson D, Ramanan D (2013) Efficiently scaling up crowdsourced video annotation. Int J Comput Vis 101(1):184–204
Mottaghi R, Chen X, Liu X, Cho N-G, Lee S-W, Fidler S, Urtasun R, Yuille A (2014) The role of context for object detection and semantic segmentation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 891–898
Zhang S, Liew JH, Wei Y, Wei S, Zhao Y (2020) Interactive object segmentation with inside–outside guidance. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 12234–12244
Pacha S, Murugan SR, Sethukarasi R (2020) Semantic annotation of summarized sensor data stream for effective query processing. J Supercomput 76(6):4017–4039
Schembera B (2021) Like a rainbow in the dark: metadata annotation for hpc applications in the age of dark data. J Supercomput 77(8):8946–8966
Ling H, Gao J, Kar A, Chen W, Fidler S (2019) Fast interactive object annotation with Curve-GCN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 5257–5266
Gao X, Zhang G, Xiong Y (2022) Multi-scale multi-modal fusion for object detection in autonomous driving based on selective kernel. Measurement 194:111001
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The Pascal visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338
Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp 3354–3361
Geiger A, Lenz P, Stiller C, Urtasun R (2013) Vision meets robotics: the KITTI dataset. Int J Robot Res 32(11):1231–1237
Tzutalin: LabelImg. https://github.com/tzutalin/labelImg (2015)
Dutta A, Zisserman A (2019) The VIA annotation software for images, audio and video. In: Proceedings of the 27th ACM International Conference on Multimedia, pp 2276–2279
Yu F, Xian W, Chen Y, Liu F, Liao M, Madhavan V, Darrell T (2018) BDD100K: a diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687
christopher5106: FastAnnotationTool. https://github.com/christopher5106/FastAnnotationTool (2016)
virajmavani: Anno-Mage. https://github.com/virajmavani/semi-auto-image-annotation-tool (2018)
OpenVINO: CVAT. https://github.com/openvinotoolkit/cvat (2020)
Wang B, Wu V, Wu B, Keutzer K (2019) LATTE: accelerating LiDAR point cloud annotation via sensor fusion, one-click annotation, and tracking. In: 2019 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE, pp 265–272
Piewak F, Pinggera P, Schafer M, Peter D, Schwarz B, Schneider N, Enzweiler M, Pfeiffer D, Zollner M (2018) Boosting LiDAR-based semantic labeling by cross-modal training data generation. In: Proceedings of the European Conference on Computer Vision (ECCV) Workshops
Yue X, Wu B, Seshia SA, Keutzer K, Sangiovanni-Vincentelli AL (2018) A LiDAR point cloud generator: from a virtual world to autonomous driving. In: Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, pp 458–464
Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) CARLA: an open urban driving simulator. In: Conference on Robot Learning. PMLR, pp 1–16
Maninis K-K, Caelles S, Pont-Tuset J, Van Gool L (2018) Deep extreme cut: from extreme points to object segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 616–625
Papadopoulos DP, Uijlings JR, Keller F, Ferrari V (2017) Extreme clicking for efficient object annotation. In: Proceedings of the IEEE International Conference on Computer Vision, pp 4930–4939
Fails JA, Olsen Jr DR (2003) Interactive machine learning. In: Proceedings of the 8th International Conference on Intelligent User Interfaces, pp 39–45
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. Adv Neural Inf Process Syst 28:91–99
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 770–778
Lin T-Y, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 2117–2125
Li X, Wang W, Hu X, Yang J (2019) Selective kernel networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 510–519
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 7132–7141
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xiong, Y., Gao, X. & Zhang, G. Interactive object annotation based on one-click guidance. J Supercomput 79, 16098–16117 (2023). https://doi.org/10.1007/s11227-023-05279-z