Abstract
Deep neural networks based object detectors have shown great success in a variety of domains like autonomous vehicles, biomedical imaging, etc., however their success depends on the availability of a large amount of data from the domain of interest. While deep models perform well in terms of overall accuracy, they often struggle in performance on rare yet critical data slices. For e.g., detecting objects in rare data slices like “motorcycles at night” or “bicycles at night” for self-driving applications. Active learning (AL) is a paradigm to incrementally and adaptively build training datasets with a human in the loop. However, current AL based acquisition functions are not well-equipped to mine rare slices of data from large real-world datasets, since they are based on uncertainty scores or global descriptors of the image. We propose Talisman, a novel framework for Targeted Active Learning for object detectIon with rare slices using Submodular MutuAl iNformation. Our method uses the submodular mutual information functions instantiated using features of the region of interest (RoI) to efficiently target and acquire images with rare slices. We evaluate our framework on the standard PASCAL VOC07+12 [8] and BDD100K [31], a real-world large-scale driving dataset. We observe that Talisman consistently outperforms a wide range of AL methods by \(\approx \) 5%- - 14% in terms of average precision on rare slices, and \(\approx \) 2%–4% in terms of mAP. The code for Talisman is available here: https://github.com/surajkothawade/talisman.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
An unfortunate example of this is the self-driving car crash with Uber: https://www.theverge.com/2019/11/6/20951385/uber-self-driving-crash-death-reason-ntsb-dcouments where the self-driving car did not detect a pedestrian on a highway at night, resulting in a fatal accident.
- 2.
See torch.tensordot.
References
Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: SODA 2007: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 1027–1035. Society for Industrial and Applied Mathematics, Philadelphia (2007)
Ash, J.T., Zhang, C., Krishnamurthy, A., Langford, J., Agarwal, A.: Deep batch active learning by diverse, uncertain gradient lower bounds. arXiv preprint arXiv:1906.03671 (2019)
Bach, F.: Learning with submodular functions: a convex optimization perspective. arXiv preprint arXiv:1111.6453 (2011)
Bach, F.: Submodular functions: from discrete to continuous domains. Math. Program. 175(1), 419–459 (2019)
Beck, N., Sivasubramanian, D., Dani, A., Ramakrishnan, G., Iyer, R.: Effective evaluation of deep active learning on image classification tasks. arXiv preprint arXiv:2106.15324 (2021)
Brust, C.A., Käding, C., Denzler, J.: Active learning for deep object detection. arXiv preprint arXiv:1809.09875 (2018)
Desai, S.V., Chandra, A.L., Guo, W., Ninomiya, S., Balasubramanian, V.N.: An adaptive supervision framework for active learning in object detection. arXiv preprint arXiv:1908.02454 (2019)
Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes (VOC) challenge. Int. J. Comput. Vision 88(2), 303–338 (2010)
Fujishige, S.: Submodular functions and optimization. Elsevier (2005)
Haussmann, E., et al.: Scalable active learning for object detection. In: 2020 IEEE Intelligent Vehicles Symposium (IV), pp. 1430–1435. IEEE (2020)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Iyer, R., Bilmes, J.: A memoization framework for scaling submodular optimization to large scale problems. In: The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2340–2349. PMLR (2019)
Iyer, R., Khargoankar, N., Bilmes, J., Asanani, H.: Submodular combinatorial information measures with applications in machine learning. In: Algorithmic Learning Theory, pp. 722–754. PMLR (2021)
Iyer, R.K.: Submodular optimization and machine learning: Theoretical results, unifying and scalable algorithms, and applications. Ph.D. thesis (2015)
Kao, C.-C., Lee, T.-Y., Sen, P., Liu, M.-Y.: Localization-aware active learning for object detection. In: Jawahar, C.V., Li, H., Mori, G., Schindler, K. (eds.) ACCV 2018. LNCS, vol. 11366, pp. 506–522. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20876-9_32
Kaushal, V., Iyer, R., Kothawade, S., Mahadev, R., Doctor, K., Ramakrishnan, G.: Learning from less data: a unified data subset selection and active learning framework for computer vision. In: 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1289–1299. IEEE (2019)
Kaushal, V., Kothawade, S., Ramakrishnan, G., Bilmes, J., Iyer, R.: Prism: a unified framework of parameterized submodular information measures for targeted data subset selection and summarization. arXiv preprint arXiv:2103.00128 (2021)
Kirsch, A., Van Amersfoort, J., Gal, Y.: Batchbald: efficient and diverse batch acquisition for deep bayesian active learning. Adv. Neural. Inf. Process. Syst. 32, 7026–7037 (2019)
Kothawade, S., Beck, N., Killamsetty, K., Iyer, R.: Similar: submodular information measures based active learning in realistic scenarios. arXiv preprint arXiv:2107.00717 (2021)
Kothyari, M., Mekala, A.R., Iyer, R., Ramakrishnan, G., Jyothi, P.: Personalizing ASR with limited data using targeted subset selection. arXiv preprint arXiv:2110.04908 (2021)
Mirzasoleiman, B., Badanidiyuru, A., Karbasi, A., Vondrák, J., Krause, A.: Lazier than lazy greedy. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015)
Nemhauser, G.L., Wolsey, L.A., Fisher, M.L.: An analysis of approximations for maximizing submodular set functions-i. Math. Program. 14(1), 265–294 (1978)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural. Inf. Process. Syst. 28, 91–99 (2015)
Roth, D., Small, K.: Margin-based active learning for structured output spaces. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 413–424. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_40
Roy, S., Unmesh, A., Namboodiri, V.P.: Deep active learning for object detection. In: BMVC, vol. 362, p. 91 (2018)
Sener, O., Savarese, S.: Active learning for convolutional neural networks: a core-set approach. arXiv preprint arXiv:1708.00489 (2017)
Settles, B.: Active learning literature survey (2009)
Tohidi, E., Amiri, R., Coutino, M., Gesbert, D., Leus, G., Karbasi, A.: Submodularity in action: from machine learning to signal processing applications. IEEE Signal Process. Mag. 37(5), 120–133 (2020)
Wang, D., Shang, Y.: A new active labeling method for deep learning. In: 2014 International Joint Conference on Neural Networks (IJCNN), pp. 112–119. IEEE (2014)
Wei, K., Iyer, R., Bilmes, J.: Submodularity in data subset selection and active learning. In: International Conference on Machine Learning, pp. 1954–1963. PMLR (2015)
Yu, F., Xian, W., Chen, Y., Liu, F., Liao, M., Madhavan, V., Darrell, T.: Bdd100k: A diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687 2(5), 6 (2018)
Acknowledgments.
This work is supported by the National Science Foundation under Grant No. IIS-2106937, a startup grant from UT Dallas, and by a Google, Adobe, and Amazon research award, and an Adobe data science award.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kothawade, S., Ghosh, S., Shekhar, S., Xiang, Y., Iyer, R. (2022). Talisman: Targeted Active Learning for Object Detection with Rare Classes and Slices Using Submodular Mutual Information. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13698. Springer, Cham. https://doi.org/10.1007/978-3-031-19839-7_1
Download citation
DOI: https://doi.org/10.1007/978-3-031-19839-7_1
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19838-0
Online ISBN: 978-3-031-19839-7
eBook Packages: Computer ScienceComputer Science (R0)