
An Intermediate Deep Feature Fusion Approach for Understanding Human Activities from Image Sequences

  • Original Research
  • Published in: SN Computer Science

Abstract

Human activity recognition (HAR) is challenging in real-world settings because of varying viewpoints, illumination, backgrounds, and colors. Deep learning (DL) algorithms are currently gaining attention over handcrafted machine learning (ML) methods because they extract features automatically. In this work, we exploit a data fusion approach for HAR and propose an intermediate feature fusion framework for vision-based HAR that employs a two-dimensional convolutional neural network (CNN2D) and transfer learning (TL) with a pretrained residual neural network (ResNet50) to extract local and global features, respectively. The extracted features are fused with a concatenation layer before the activities are classified. We focus on detecting two categories of activities: actions (single person) and interactions (human–human and human–object). The proposed approach detects human activities in both constrained and unconstrained environments with multiple viewpoints. It is evaluated on five benchmark vision datasets, namely KTH, Weizmann, IXMAS, the CASIA action database, and MSR Daily Activity 3D, in terms of accuracy and confusion matrices. The framework recognizes complex interaction activities with better accuracy than single-person actions, reaching its highest accuracies of 99.94% and 99.76% on the MSR Daily Activity 3D and CASIA datasets, respectively. A comparative analysis with existing state-of-the-art methods shows the superior accuracy of the proposed model.
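
The fusion idea described above can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch example of intermediate feature fusion, assuming a shallow CNN2D branch for local features, an ImageNet-pretrained ResNet50 backbone for global features, and concatenation before a dense classifier; the layer sizes, input resolution (224×224), and class count are illustrative assumptions and do not reproduce the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models


class IntermediateFusionHAR(nn.Module):
    """Sketch of intermediate feature fusion for HAR: a small CNN branch
    (local features) and a frozen pretrained ResNet50 branch (global
    features), concatenated before the classifier. Sizes are illustrative."""

    def __init__(self, num_classes: int):
        super().__init__()
        # Local-feature branch: a shallow 2D CNN (assumed design).
        self.local_branch = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # -> (N, 64)
        )
        # Global-feature branch: ImageNet-pretrained ResNet50 with its
        # classification head removed (transfer learning); weights download
        # on first use.
        resnet = models.resnet50(weights="IMAGENET1K_V1")
        self.global_branch = nn.Sequential(
            *list(resnet.children())[:-1], nn.Flatten() # -> (N, 2048)
        )
        for p in self.global_branch.parameters():
            p.requires_grad = False                     # freeze the backbone

        # Intermediate fusion by concatenation, then the activity classifier.
        self.classifier = nn.Sequential(
            nn.Linear(64 + 2048, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):                               # x: (N, 3, 224, 224)
        fused = torch.cat(
            [self.local_branch(x), self.global_branch(x)], dim=1
        )
        return self.classifier(fused)


if __name__ == "__main__":
    model = IntermediateFusionHAR(num_classes=6)        # e.g. 6 KTH actions
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)                                 # torch.Size([2, 6])
```

In this sketch the fusion happens at the feature level (after both branches, before the classifier), which is what distinguishes intermediate fusion from early fusion of raw inputs or late fusion of per-branch predictions.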



Data Availability

The data associated with this work will be provided on reasonable request.


Author information

Corresponding author

Correspondence to Rajiv Singh.

Ethics declarations

Conflict of Interest

The authors declare that there is no conflict of interest regarding this manuscript and that they received no funding for this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Garg, A., Nigam, S. & Singh, R. An Intermediate Deep Feature Fusion Approach for Understanding Human Activities from Image Sequences. SN COMPUT. SCI. 5, 1037 (2024). https://doi.org/10.1007/s42979-024-03345-8

