Abstract
To maintain competitive density estimation performance, most existing works design cumbersome network structures to extract and refine vehicle features. The resulting computational and storage burden at inference time severely limits their deployment scope and makes them difficult to apply in practical scenarios. To address these problems, we propose LSENet, a lightweight network for real-time vehicle density estimation. The network consists of three parts: a pre-trained heavy teacher network, an adaptive integration block, and a lightweight student network. First, a teacher network based on a deep single-column transformer is designed to provide effective global dependencies and vehicle distribution knowledge for the student network to learn. Second, to address the intermediate-layer mismatch and dimensionality inconsistency between the teacher and student networks, an adaptive integration block is designed to guide student learning efficiently by dynamically selecting the self-attention heads that most influence the network's decisions as the source of distilled knowledge. Finally, to complement fine-grained features, CNN blocks are placed in parallel with the student network's transformer backbone to improve its ability to capture vehicle details. Extensive experiments on two vehicle benchmark datasets, TRANCOS and VisDrone2019, show that LSENet achieves an optimal trade-off between density estimation accuracy and inference speed compared with other state-of-the-art methods, making it suitable for deployment on resource-constrained edge devices. Our code will be available at https://github.com/goudaner1/LSENet.
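The head-selection idea described in the abstract can be pictured with a short sketch. The following is a minimal, hypothetical PyTorch illustration, not the authors' released code: it scores the teacher's self-attention heads with a simple confidence proxy, keeps the top-k, projects the student's heads to the same count to resolve the dimensionality mismatch, and distils with an MSE loss. All names (AdaptiveIntegrationBlock, head_importance, the choice of proxy) are our assumptions.

```python
# Hypothetical sketch of head-selection distillation, as summarized in the
# abstract; the actual LSENet criterion and alignment may differ.
# Assumes teacher/student expose per-head attention maps of shape (B, H, N, N).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveIntegrationBlock(nn.Module):
    """Selects the k most influential teacher heads and aligns the
    student's heads to them for distillation."""
    def __init__(self, n_student_heads: int, k: int):
        super().__init__()
        self.k = k
        # 1x1 projection over the head dimension resolves the head-count
        # mismatch between teacher and student.
        self.align = nn.Conv2d(n_student_heads, k, kernel_size=1)

    @staticmethod
    def head_importance(attn: torch.Tensor) -> torch.Tensor:
        # Proxy for a head's influence: its "confidence", i.e. the maximum
        # attention weight per query, averaged over batch and queries.
        return attn.max(dim=-1).values.mean(dim=(0, 2))   # -> (H,)

    def forward(self, t_attn: torch.Tensor, s_attn: torch.Tensor) -> torch.Tensor:
        scores = self.head_importance(t_attn)             # (H_teacher,)
        top = scores.topk(self.k).indices                 # most influential heads
        teacher_sel = t_attn[:, top].detach()             # (B, k, N, N)
        student_proj = self.align(s_attn)                 # (B, k, N, N)
        return F.mse_loss(student_proj, teacher_sel)      # distillation loss
```

In training, this loss would simply be added, with a weighting factor, to the usual density-map regression loss on the student's output.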
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Change history
27 September 2024
A Correction to this paper has been published: https://doi.org/10.1007/s00371-024-03662-2
Acknowledgements
The authors are grateful for collaborative funding support from the Humanities and Social Science Fund of the Ministry of Education of the People’s Republic of China (21YJAZH077) and the Natural Science Foundation of Shandong Province (ZR2022ME091).
Author information
Contributions
Ling-Xiao Qin was involved in conceptualization, methodology, writing—original draft, software, and writing—review & editing. Hong-Mei Sun took part in supervision, methodology, resources, writing—review & editing, and funding acquisition. Xiao-Meng Duan took part in data curation, investigation, software, and visualization. Cheng-Yue Che was involved in data curation, investigation, and visualization. Rui-Sheng Jia took part in supervision, methodology, funding acquisition, and writing—review & editing.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this article was revised: Figure 3 was not correct.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Qin, LX., Sun, HM., Duan, XM. et al. Adaptive learning-enhanced lightweight network for real-time vehicle density estimation. Vis Comput 41, 2857–2873 (2025). https://doi.org/10.1007/s00371-024-03572-3