Abstract
Estimating the number of people in Web images remains a challenging problem owing to perspective variation, differing viewpoints, and diverse backgrounds. Existing deep learning models still struggle with scenes in which a person appears either extremely large or extremely small. In this paper, we propose a novel perspective-aware architecture for estimating the number of people in a crowd in Web images. Specifically, we use a two-stage framework. We first learn a policy network that infers the perspective of the target scene and outputs a scale label for the subsequent perspective normalization. Next, given the aligned inputs, we further adjust the scale-specific counting network to regress the final count. Experiments on challenging datasets demonstrate that our approach can handle large perspective variation and achieves state-of-the-art results.
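The control flow of the two-stage framework can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the network architectures or the normalization procedure, so the policy and counting networks are passed in as plain callables, and `SCALE_FACTORS`, `resize_nearest`, `estimate_count`, and the toy stand-ins are hypothetical names. Nearest-neighbour resizing is used here as a simple placeholder for perspective normalization.

```python
from typing import Callable, Dict, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities

# Assumption: each scale label maps to one resize factor. The values are made up.
SCALE_FACTORS: Dict[int, float] = {0: 2.0, 1: 1.0, 2: 0.5}


def resize_nearest(image: Image, factor: float) -> Image:
    """Nearest-neighbour resize, a placeholder for perspective normalization."""
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * factor))
    dst_w = max(1, round(src_w * factor))
    return [
        [
            image[min(src_h - 1, int(r / factor))][min(src_w - 1, int(c / factor))]
            for c in range(dst_w)
        ]
        for r in range(dst_h)
    ]


def estimate_count(
    image: Image,
    policy: Callable[[Image], int],
    counters: Dict[int, Callable[[Image], float]],
) -> float:
    """Stage 1: infer a scale label. Stage 2: normalize, then regress the count."""
    label = policy(image)                                     # scale label from the policy
    normalized = resize_nearest(image, SCALE_FACTORS[label])  # align the input scale
    return counters[label](normalized)                        # scale-specific regressor


# Toy stand-ins (hypothetical) so the sketch runs without any trained model.
def toy_policy(image: Image) -> int:
    """Pretend that small images contain small people and need upsampling."""
    return 0 if len(image) < 4 else 1


toy_counters: Dict[int, Callable[[Image], float]] = {
    0: lambda img: sum(map(sum, img)) / 4.0,  # undo the 2x2 area gain from upsampling
    1: lambda img: float(sum(map(sum, img))),
    2: lambda img: sum(map(sum, img)) * 4.0,
}
```

Passing the networks in as callables keeps the sketch independent of any deep learning library; in practice each callable would wrap a trained model.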
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61521002).
Author information
Chong Shang received the BS degree in computer science and technology with the honor of the Outstanding Graduating Student from Northwestern Polytechnical University, China in 2013. He is currently pursuing his PhD degree at Tsinghua University, China. His research interests are computer vision and deep learning, with a current specific focus on object detection and crowd analysis.
Haizhou Ai received the BS, MS, and PhD degrees from Tsinghua University, China in 1985, 1988, and 1991, respectively. From 1994 to 1996, he was with the Flexible Production System Laboratory, University of Brussels, Belgium, as a Postdoctoral Researcher. He is currently a Professor with the Computer Science and Technology Department, Tsinghua University. His research interests lie in computer vision and pattern recognition, particularly object detection, tracking, and recognition. He has published more than 80 papers in refereed journals and conference proceedings. He supervised the Best PhD Dissertation of the Beijing Municipal City in computer science and technology in 2008 and the Best Student Paper of IEEE CVPR 2007.
Yi Yang received the BS degree in network engineering and PhD degree in pattern recognition from Sichuan University, China and the Institute of Automation, Chinese Academy of Sciences, China in 2010 and 2016, respectively. Since 2016, she has been with 2012 labs, Huawei Technologies Co., Ltd., China, where she is currently an algorithm engineer. Her research interests include computer vision, pattern recognition, deep learning, and object detection.
Cite this article
Shang, C., Ai, H. & Yang, Y. Crowd counting via learning perspective for multi-scale multi-view Web images. Front. Comput. Sci. 13, 579–587 (2019). https://doi.org/10.1007/s11704-017-6598-3
DOI: https://doi.org/10.1007/s11704-017-6598-3