PointerNet: Spatiotemporal Modeling for Crowd Counting in Videos

Authors:

Changsheng Liu,

Xiaoming Yu

ICDLT '21: Proceedings of the 2021 5th International Conference on Deep Learning Technologies

Pages 26 - 31

https://doi.org/10.1145/3480001.3480018

Published: 12 November 2021

Abstract

Existing deep learning methods for video crowd counting mainly focus on how to leverage temporal correlation to improve the model. Studies have shown that convolutional neural networks with spatiotemporal three-dimensional kernels (3D CNNs) are promising architectures for video crowd counting. However, existing 3D CNN-based methods cannot be made as deep as 2D CNNs, owing to their considerable number of parameters and the lack of labeled data; this leads to overfitting and unsatisfactory video crowd counting performance. To address this issue, a novel end-to-end video crowd counting framework named PointerNet (PseudO-3D (P3D) CNNs INtegrated with Temporal channEl-awaRe (TCA) block) is proposed. The P3D kernels give the framework greater structural diversity and allow it to go deeper while keeping computational cost and memory demand limited. In addition, the TCA block is integrated into the architecture to help exploit the temporal interdependencies among video sequences. Experiments on three benchmark datasets indicate that the proposed method delivers state-of-the-art performance.
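
As a rough illustration of why factorized kernels keep cost limited, the sketch below counts weights for one dense 3×3×3 convolution versus a pseudo-3D pair (a 1×3×3 spatial convolution followed by a 3×1×1 temporal convolution). The serial arrangement, the channel width of 64, and the function names are assumptions made for illustration only; the abstract does not specify PointerNet's actual layer configuration.

```python
def conv3d_params(c_in, c_out, kt, kh, kw, bias=False):
    """Count learnable weights in a 3D convolution with a kt x kh x kw kernel."""
    return c_in * c_out * kt * kh * kw + (c_out if bias else 0)


def full_3d_params(c_in, c_out):
    # One dense 3x3x3 spatiotemporal kernel.
    return conv3d_params(c_in, c_out, 3, 3, 3)


def pseudo_3d_params(c_in, c_out):
    # Assumed serial factorization: a 1x3x3 spatial convolution
    # followed by a 3x1x1 temporal convolution.
    spatial = conv3d_params(c_in, c_out, 1, 3, 3)
    temporal = conv3d_params(c_out, c_out, 3, 1, 1)
    return spatial + temporal


if __name__ == "__main__":
    channels = 64  # hypothetical layer width
    full = full_3d_params(channels, channels)
    pseudo = pseudo_3d_params(channels, channels)
    print(f"dense 3x3x3: {full} weights")
    print(f"pseudo-3D:   {pseudo} weights ({pseudo / full:.1%} of dense)")
```

With equal input and output widths, the factorized pair uses 12 kernel taps per channel pair instead of 27, which is where the saving in parameters comes from.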

Index Terms

PointerNet: Spatiotemporal Modeling for Crowd Counting in Videos
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems
  1. Information systems applications

Index terms have been assigned to the content through auto-classification.

Published In

ICDLT '21: Proceedings of the 2021 5th International Conference on Deep Learning Technologies

July 2021

131 pages

ISBN: 9781450390163

DOI: 10.1145/3480001

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICDLT 2021

ICDLT 2021: 2021 5th International Conference on Deep Learning Technologies

July 23 - 25, 2021

Qingdao, China
