An ensemble approach for still image-based human action recognition

Abstract

Still image-based human action recognition is a challenging task in computer vision because a single image carries limited information. It is therefore important to extract visual cues and structural information from the image efficiently during classification. To this end, we use a convolutional neural network based on the DenseNet-201 architecture for classification. To focus on informative regions of interest, a spatial attention module is trained as a feature extractor that emphasizes features from selected parts of the input image. We further leverage an ensemble approach based on fuzzy fusion through the Choquet integral, which harnesses the complementary uncertainty of the decision scores. This allows a robust decision to be made on the fly, based on coalitions of the inputs. Experimental results on three challenging datasets support the efficacy of the proposed method: PPMI and Stanford 40, known for their confusing action classes, and BU-101, known for its immense scale.
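
As an illustration of the fusion step named in the abstract, the following is a minimal sketch of a discrete Choquet integral applied to per-class decision scores from several classifiers. The abstract does not specify how the fuzzy measure over classifier coalitions is constructed, so the `measure` callable, the function names `choquet_integral` and `fuse`, and the example weights and scores are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def choquet_integral(scores, measure):
    """Discrete Choquet integral of per-classifier scores.

    `measure` is an assumed callable mapping a frozenset of classifier
    indices to a value in [0, 1]; it should return 0 for the empty set,
    1 for the full set, and be monotone under set inclusion.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)  # classifier indices, highest score first
    total, prev_g, coalition = 0.0, 0.0, set()
    for idx in order:
        coalition.add(int(idx))
        g = measure(frozenset(coalition))
        # Each score is weighted by the gain in measure from adding its classifier.
        total += scores[idx] * (g - prev_g)
        prev_g = g
    return total


def fuse(prob_matrix, measure):
    """Fuse a (n_classifiers, n_classes) score matrix class by class."""
    prob_matrix = np.asarray(prob_matrix, dtype=float)
    fused = np.array([
        choquet_integral(prob_matrix[:, c], measure)
        for c in range(prob_matrix.shape[1])
    ])
    return int(np.argmax(fused)), fused


# Hypothetical additive measure built from made-up classifier weights.
weights = [0.5, 0.3, 0.2]


def additive_measure(subset):
    return sum(weights[i] for i in subset)


# Made-up decision scores: 3 classifiers (rows) x 2 classes (columns).
example_scores = [[0.2, 0.8], [0.7, 0.3], [0.5, 0.5]]
predicted_class, fused_scores = fuse(example_scores, additive_measure)
```

With an additive measure the integral reduces to a weighted average of the scores. A measure that returns 1 for every non-empty coalition yields the maximum score, and one that returns 1 only for the full set yields the minimum, so the choice of measure controls how strongly agreement among classifiers is required.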

Availability of data and material

Not applicable.

Code availability

Not applicable.

Acknowledgements

The authors would like to thank the Center for Microprocessor Applications for Training Education and Research (CMATER) research laboratory of the Computer Science and Engineering Department, Jadavpur University, Kolkata, India, for providing infrastructural support.

Funding

Not applicable.

Author information

Authors and Affiliations

Department of Information Technology, Jadavpur University, Jadavpur University Second Campus, Plot No. 8, Salt Lake Bypass, LB Block, Sector III, Salt Lake City, Kolkata, West Bengal, 700106, India
Avinandan Banerjee, Sayantan Roy & Pawan Kumar Singh
Department of Electrical Engineering, Jadavpur University, 188, Raja S.C. Mallick Road, Kolkata, West Bengal, 700032, India
Rohit Kundu
Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial College of Engineering and Management, Faizabad Road, Lucknow, 226028, India
Vikrant Bhateja
Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Department of Computer Science and Engineering, Jadavpur University, 188, Raja S.C. Mallick Road, Kolkata, West Bengal, 700032, India
Ram Sarkar

Corresponding author

Correspondence to Ram Sarkar.

Ethics declarations

Conflict of interest

The authors declare there is no conflict of interest.

Ethics approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Banerjee, A., Roy, S., Kundu, R. et al. An ensemble approach for still image-based human action recognition. Neural Comput & Applic 34, 19269–19282 (2022). https://doi.org/10.1007/s00521-022-07514-9

Received: 12 July 2021
Accepted: 07 June 2022
Published: 05 July 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s00521-022-07514-9
