
Adaptive learning-enhanced lightweight network for real-time vehicle density estimation

  • Research
  • Published in: The Visual Computer

A Correction to this article was published on 27 September 2024.

Abstract

Most existing density estimation methods maintain competitive performance by designing cumbersome network structures to extract and refine vehicle features. The resulting computational and storage burden during inference severely limits their deployment scope and makes them difficult to apply in practical scenarios. To solve these problems, we propose LSENet, a lightweight network for real-time vehicle density estimation. The network consists of three parts: a pre-trained heavy teacher network, an adaptive integration block, and a lightweight student network. First, a teacher network based on a deep single-column transformer is designed to provide effective global-dependency and vehicle-distribution knowledge for the student network to learn. Second, to address the intermediate-layer mismatch and dimensionality inconsistency between the teacher and student networks, an adaptive integration block is designed to guide student learning efficiently by dynamically assigning the self-attention heads that most influence the network's decisions as the source of distilled knowledge. Finally, to complement fine-grained features, CNN blocks are placed in parallel with the student network's transformer backbone to improve its ability to capture vehicle details. Extensive experiments on two vehicle benchmark datasets, TRANCOS and VisDrone2019, show that LSENet achieves an optimal trade-off between density estimation accuracy and inference speed compared with other state-of-the-art methods, and it is therefore suitable for deployment on resource-constrained edge devices. Our code will be available at https://github.com/goudaner1/LSENet.
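The abstract's central mechanism, dynamically selecting the most influential self-attention heads as distillation targets, can be illustrated with a minimal sketch. The paper's exact scoring rule is not given in this excerpt, so the sketch below assumes a Taylor-style |attention x gradient| sensitivity proxy; all function names and tensor shapes are illustrative, not the authors' implementation.

```python
import numpy as np

def head_importance(attn, grad):
    """Proxy importance score per self-attention head.

    attn, grad: arrays of shape (heads, tokens, tokens) holding one layer's
    attention maps and their gradients w.r.t. the training loss. The score
    is the mean |attention * gradient| per head, a Taylor-style sensitivity
    proxy (an assumption here, not the paper's stated criterion).
    """
    n_heads = attn.shape[0]
    return np.abs(attn * grad).reshape(n_heads, -1).mean(axis=1)

def select_distillation_heads(attn, grad, k):
    """Return indices of the k highest-scoring heads, best first."""
    scores = head_importance(attn, grad)
    return np.argsort(scores)[::-1][:k]
```

In a full pipeline, the selected teacher heads would then define the targets of the distillation loss for the student's corresponding layer, sidestepping the layer-count and dimensionality mismatch by matching only the chosen heads rather than whole feature maps.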


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Algorithm 1
Fig. 5
Fig. 6
Fig. 7


Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.



Acknowledgements

The authors are grateful for collaborative funding support from the Humanities and Social Science Fund of the Ministry of Education of the People’s Republic of China (21YJAZH077) and the Natural Science Foundation of Shandong Province (ZR2022ME091).

Author information

Authors and Affiliations

Authors

Contributions

Ling-Xiao Qin: conceptualization, methodology, writing (original draft), software, writing (review and editing). Hong-Mei Sun: supervision, methodology, resources, writing (review and editing), funding acquisition. Xiao-Meng Duan: data curation, investigation, software, visualization. Cheng-Yue Che: data curation, investigation, visualization. Rui-Sheng Jia: supervision, methodology, funding acquisition, writing (review and editing).

Corresponding authors

Correspondence to Hong-Mei Sun or Rui-Sheng Jia.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: Figure 3 was not correct.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Qin, LX., Sun, HM., Duan, XM. et al. Adaptive learning-enhanced lightweight network for real-time vehicle density estimation. Vis Comput 41, 2857–2873 (2025). https://doi.org/10.1007/s00371-024-03572-3

