DOI: 10.1145/3343031.3350898
research-article
Open access

Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting

Published: 15 October 2019

Abstract

Large variation in the apparent size of people and heads is a critical problem for crowd counting. To improve the scale invariance of feature representations, recent works extensively employ Convolutional Neural Networks with multi-column structures to handle different scales and resolutions. However, because the columns contain substantial redundant parameters, existing multi-column networks invariably learn almost the same scale features in different columns, which severely degrades counting accuracy and leads to overfitting. In this paper, we attack this problem by proposing a novel Multi-column Mutual Learning (McML) strategy. It has two main innovations: 1) A statistical network is incorporated into the multi-column framework to estimate the mutual information between columns, which approximately indicates the scale correlation between features from different columns. By minimizing the mutual information, each column is guided to learn features at a different image scale. 2) We devise a mutual learning scheme that alternately optimizes each column while keeping the other columns fixed on each mini-batch of training data. With this asynchronous parameter update process, each column is inclined to learn feature representations that differ from those of the other columns, which efficiently reduces parameter redundancy and improves generalization ability. More remarkably, McML can be applied to all existing multi-column networks and is end-to-end trainable. Extensive experiments on four challenging benchmarks show that McML significantly improves the original multi-column networks and outperforms other state-of-the-art approaches.
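
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which estimator the statistical network uses, so the sketch assumes a Donsker-Varadhan style lower bound, a common choice for neural mutual information estimators. The names `dv_lower_bound`, `alternating_step` and `critic`, and the toy model in which each column is a single scalar weight, are hypothetical and chosen only for illustration.

```python
import math
import random


def dv_lower_bound(critic, xs, zs, rng):
    """Donsker-Varadhan style lower bound on I(X; Z) from paired samples.

    critic(x, z) stands in for the statistical network (fixed here,
    trainable in practice). Shuffling zs breaks the pairing, which
    approximates samples from the product of the marginals.
    """
    n = len(xs)
    joint_term = sum(critic(x, z) for x, z in zip(xs, zs)) / n
    shuffled = list(zs)
    rng.shuffle(shuffled)
    marginal_term = sum(math.exp(critic(x, z)) for x, z in zip(xs, shuffled)) / n
    return joint_term - math.log(marginal_term)


def alternating_step(weights, batch, lr=0.1):
    """One mini-batch pass that updates each column in turn.

    Toy model (an assumption): column k is the scalar weights[k] and the
    prediction is sum_k weights[k] * x. While column k is updated, all
    other columns are held fixed.
    """
    for k in range(len(weights)):
        grad = 0.0
        for x, y in batch:
            pred = sum(w * x for w in weights)
            grad += 2.0 * (pred - y) * x  # d(squared error)/d(weights[k])
        weights[k] -= lr * grad / len(batch)
    return weights


rng = random.Random(0)
xs = [rng.choice([-1.0, 1.0]) for _ in range(1000)]

# Two "columns" with identical features: the estimate is clearly positive.
dependent = dv_lower_bound(lambda x, z: x * z, xs, list(xs), rng)

# Two "columns" with unrelated features: the estimate is much lower.
independent_zs = [rng.choice([-1.0, 1.0]) for _ in range(1000)]
independent = dv_lower_bound(lambda x, z: x * z, xs, independent_zs, rng)

# Alternating updates on a toy regression target y = 3x.
weights = [0.0, 0.0]
batch = [(x / 10.0, 3.0 * x / 10.0) for x in range(1, 11)]
for _ in range(200):
    alternating_step(weights, batch)
```

In this toy setting the estimate is high when two columns carry the same information and low when they do not, which is the signal a training objective could penalize. How McML combines that penalty with the counting loss is not specified in the abstract, so the sketch keeps the two pieces separate.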

      Information & Contributors

      Information

      Published In

      MM '19: Proceedings of the 27th ACM International Conference on Multimedia
      October 2019
      2794 pages
      ISBN:9781450368896
      DOI:10.1145/3343031
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. crowd counting
      2. multi-column network
      3. mutual learning strategy

      Qualifiers

      • Research-article

      Funding Sources

      • Foundation for Department of Transportation of Henan Province
      • Sichuan Science and Technology Innovation Seedling Fund
      • Excellent Doctoral Dissertation of Southwest Jiaotong University
      • U.S. Department of Commerce
      • National Institute of Standards and Technology and by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC)
      • China Scholarship Council
      • National Natural Science Foundation of China

      Conference

      MM '19

      Acceptance Rates

      MM '19 Paper Acceptance Rate 252 of 936 submissions, 27%;
      Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (last 12 months): 199
      • Downloads (last 6 weeks): 18
      Reflects downloads up to 02 Mar 2025

      Citations

      Cited By

      • (2025) A survey of deep learning methods for density estimation and crowd counting. Vicinagearth 2:1. DOI: 10.1007/s44336-024-00011-8. Online publication date: 6-Feb-2025
      • (2025) CrowdFPN: crowd counting via scale-enhanced and location-aware feature pyramid network. Applied Intelligence 55:5. DOI: 10.1007/s10489-025-06263-1. Online publication date: 21-Jan-2025
      • (2025) MSFFNet: multi-scale feature fusion network with semantic optimization for crowd counting. Pattern Analysis and Applications 28:1. DOI: 10.1007/s10044-024-01385-7. Online publication date: 9-Jan-2025
      • (2025) Global vision, local focus: the semantic enhancement transformer network for crowd counting. Soft Computing 29:2 (1035-1052). DOI: 10.1007/s00500-025-10506-1. Online publication date: 7-Feb-2025
      • (2024) A Weakly Supervised Crowd Counting Method via Combining CNN and Transformer. Electronics 13:24 (5053). DOI: 10.3390/electronics13245053. Online publication date: 23-Dec-2024
      • (2024) Domain-Agnostic Crowd Counting via Uncertainty-Guided Style Diversity Augmentation. Proceedings of the 32nd ACM International Conference on Multimedia (1642-1651). DOI: 10.1145/3664647.3681310. Online publication date: 28-Oct-2024
      • (2024) Context-Aware Gridding Non-local Block for Improving Object Detection in UAV Images. Proceedings of the 2024 16th International Conference on Machine Learning and Computing (257-263). DOI: 10.1145/3651671.3651745. Online publication date: 2-Feb-2024
      • (2024) Attention-injective scale aggregation network for crowd counting. Journal of Electronic Imaging 33:05. DOI: 10.1117/1.JEI.33.5.053008. Online publication date: 1-Sep-2024
      • (2024) Training-free Object Counting with Prompts. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (322-330). DOI: 10.1109/WACV57701.2024.00039. Online publication date: 3-Jan-2024
      • (2024) Confusion Region Mining for Crowd Counting. IEEE Transactions on Neural Networks and Learning Systems 35:12 (18039-18051). DOI: 10.1109/TNNLS.2023.3311020. Online publication date: Dec-2024
