Research Article

Deep Ensemble Learning for Human Action Recognition in Still Images

Published: 01 January 2020

Abstract

Numerous human actions, such as "Phoning," "PlayingGuitar," and "RidingHorse," can be inferred from static cues alone, even when motion in video is available, since a single still image may already explain a particular action sufficiently. In this work, we investigate human action recognition in still images and use deep ensemble learning to automatically decompose the body pose and perceive its background information. First, we construct an end-to-end NCNN-based model by attaching a nonsequential convolutional neural network (NCNN) module to the top of a pretrained model. The nonsequential topology of the NCNN module learns spatial- and channel-wise features separately through parallel branches, which helps improve model performance. Second, to further exploit the advantage of the nonsequential topology, we propose an end-to-end deep ensemble learning based on weight optimization (DELWO) model, which fuses the deep information derived from multiple models automatically from the data. Third, we design a deep ensemble learning based on voting strategy (DELVS) model that pools multiple deep models with weighted coefficients to obtain a better prediction. Moreover, model complexity is reduced by lessening the number of trainable parameters, which mitigates overfitting on small datasets. We conduct experiments on Li's action dataset and on the uncropped and 1.5x-cropped Willow action datasets; the results validate the effectiveness and robustness of the proposed models in mitigating overfitting on small datasets. Finally, our code is open-sourced on GitHub (https://github.com/yxchspring/deep_ensemble_learning) to share the models with the community.
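To make the branch structure concrete, the sketch below gives one plausible reading of the NCNN module in Keras: a depthwise convolution stands in for the spatial-wise branch and a 1x1 convolution for the channel-wise branch, attached to pretrained backbone features. The layer choices, feature shape, and class count are illustrative assumptions, not the paper's exact architecture (the GitHub repository holds the actual code).

```python
from tensorflow.keras import layers, Model

# Backbone features, e.g., the 7x7x512 conv output of a pretrained VGG16.
inputs = layers.Input(shape=(7, 7, 512))

# Spatial-wise branch: depthwise conv filters each channel independently.
spatial = layers.DepthwiseConv2D(3, padding="same", activation="relu")(inputs)

# Channel-wise branch: 1x1 conv mixes information across channels only.
channel = layers.Conv2D(512, 1, activation="relu")(inputs)

# Fuse the parallel branches and classify into 7 action categories (assumed).
merged = layers.Concatenate()([spatial, channel])
pooled = layers.GlobalAveragePooling2D()(merged)
outputs = layers.Dense(7, activation="softmax")(pooled)

ncnn_head = Model(inputs, outputs)
```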
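The two ensemble schemes can likewise be sketched in a few lines: DELVS fuses the base models' class probabilities with weighted coefficients, while DELWO instead learns the fusion weights end-to-end from the data. A minimal NumPy sketch of the weighted voting, with assumed model count, weights, and class count:

```python
import numpy as np

# Per-model class probabilities for a batch: (n_models, n_samples, n_classes).
# Random stand-ins for the outputs of three fine-tuned base models.
rng = np.random.default_rng(0)
probs = rng.random((3, 4, 7))
probs /= probs.sum(axis=-1, keepdims=True)  # normalize to probabilities

# Assumed voting weights for the three base models; in DELWO these
# coefficients would be trainable parameters optimized from the data.
weights = np.array([0.5, 0.3, 0.2])

# Weighted soft vote: fuse the model predictions, then take the argmax.
fused = np.tensordot(weights, probs, axes=1)  # -> (n_samples, n_classes)
labels = fused.argmax(axis=-1)
print(labels)
```

With uniform weights this reduces to plain soft voting; weighted coefficients let the stronger base models dominate the vote.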


Cited By

  • (2021) "Person Reidentification Model Based on Multiattention Modules and Multiscale Residuals," Complexity, vol. 2021. https://doi.org/10.1155/2021/6673461
  • (2021) "Transfer learning with fine tuning for human action recognition from still images," Multimedia Tools and Applications, vol. 80, no. 13, pp. 20547-20578. https://doi.org/10.1007/s11042-021-10753-y
  • (2020) "Prediction of Future Terrorist Activities Using Deep Neural Networks," Complexity, vol. 2020. https://doi.org/10.1155/2020/1373087
  • (2020) "Hybrid Ensemble Pruning Using Coevolution Binary Glowworm Swarm Optimization and Reduce-Error," Complexity, vol. 2020. https://doi.org/10.1155/2020/1329692


Published In

Complexity, Volume 2020 (17147 pages)

Publisher: John Wiley & Sons, Inc., United States

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
