Abstract
Object detection usually assumes that training and test data come from the same distribution, but this assumption does not always hold in practice. Because of domain shift, applying a trained detector to a new domain can greatly reduce detection accuracy. Domain adaptive object detection has been adopted to maintain high detection accuracy under various kinds of domain shift. Domain adaptive object detection methods mainly include adversarial-based methods, discrepancy-based methods, reconstruction-based methods, hybrid methods and others. Domain adaptive Faster RCNN is a classical adversarial-based method. To further improve the accuracy of domain adaptive object detection, we propose a method based on Domain adaptive Faster RCNN, called adaptive threshold cascade Faster RCNN (ATCFR). ATCFR introduces a cascade strategy and an adaptive threshold strategy. The cascade strategy improves the quality of bounding boxes and addresses the overfitting and mismatch problems in Faster RCNN. The adaptive threshold strategy keeps positive and negative samples balanced and removes the need to set the thresholds manually, as is required in Cascade RCNN. Finally, we evaluate our approach on four classic datasets: Cityscapes, Foggy Cityscapes, SIM 10k and KITTI. Experimental results show that our method achieves higher accuracy than state-of-the-art methods under various domain shift settings.
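The abstract does not state how the adaptive threshold is computed, so the following is only an illustrative sketch of the general idea: choose the IoU threshold that separates positive from negative proposals from the statistics of the current proposals rather than fixing it by hand. The rule used here (mean plus standard deviation of the IoUs, clamped to a range) and the names `adaptive_threshold`, `label_proposals`, `floor` and `ceil` are assumptions made for illustration, not the rule used in ATCFR.

```python
from statistics import mean, pstdev


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def adaptive_threshold(ious, floor=0.5, ceil=0.9):
    """Hypothetical rule (an assumption, not the paper's):
    mean + population std of the IoUs, clamped to [floor, ceil]."""
    if not ious:
        return floor
    return min(ceil, max(floor, mean(ious) + pstdev(ious)))


def label_proposals(proposals, gt_boxes, floor=0.5, ceil=0.9):
    """Label each proposal 1 (positive) or 0 (negative) using a
    threshold derived from the proposals' best IoU with any ground truth."""
    best = [max((iou(p, g) for g in gt_boxes), default=0.0) for p in proposals]
    thr = adaptive_threshold(best, floor, ceil)
    labels = [1 if v >= thr else 0 for v in best]
    return thr, labels


if __name__ == "__main__":
    gt = [(0, 0, 10, 10)]
    props = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
    print(label_proposals(props, gt))
```

In a cascade, each stage would receive boxes refined by the previous stage, so the IoU statistics, and with them a threshold computed this way, would tend to rise from stage to stage without per-stage values being set by hand.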
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61802056, in part by the Natural Science Foundation of Jilin Province under Grant 20180101043JC, in part by the Development and Reform Committee Foundation of Jilin Province of China under Grant 2019C053-9, and in part by the Open Research Fund of Key Laboratory of Space Utilization, Chinese Academy of Sciences, under Grant LSU-KFJJ-2019-08.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Shi, X., Li, Z. & Yu, H. Adaptive threshold cascade faster RCNN for domain adaptive object detection. Multimed Tools Appl 80, 25291–25308 (2021). https://doi.org/10.1007/s11042-021-10917-w
DOI: https://doi.org/10.1007/s11042-021-10917-w