
An Intermediate Deep Feature Fusion Approach for Understanding Human Activities from Image Sequences

  • Original Research
  • Published in: SN Computer Science

Abstract

Human activity recognition (HAR) is challenging in real-world settings because of varying viewpoints, illumination, backgrounds, and colors. Deep learning (DL) algorithms are currently gaining attention over handcrafted machine learning (ML) methods because they extract features automatically. In this work, we exploit a data fusion approach for HAR and propose an intermediate feature fusion framework for vision-based HAR that employs a two-dimensional convolutional neural network (CNN2D) and transfer learning (TL) with a pretrained residual neural network (ResNet50) to extract local and global features, respectively. The extracted features are fused with a concatenation layer before the activities are classified. We focus on detecting two categories of activities: actions (single person) and interactions (human–human and human–object). The proposed approach detects human activities in both constrained and unconstrained environments with multiple viewpoints. It is evaluated on five benchmark vision datasets, namely KTH, Weizmann, IXMAS, the CASIA action database, and MSR Daily Activity 3D, in terms of accuracy and confusion matrices. The framework recognizes complex interaction activities with better accuracy than single-person actions, reaching its highest accuracies of 99.94% and 99.76% on the MSR Daily Activity 3D and CASIA datasets, respectively. A comparative analysis with existing state-of-the-art methods shows the superior accuracy of the proposed model.
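
The fusion idea described above can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch example of intermediate feature fusion, assuming a shallow CNN2D branch for local features, an ImageNet-pretrained ResNet50 backbone for global features, and concatenation before a dense classifier; the layer sizes, input resolution (224×224), and class count are illustrative assumptions and do not reproduce the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision import models


class IntermediateFusionHAR(nn.Module):
    """Sketch of intermediate feature fusion for HAR: a small CNN branch
    (local features) and a frozen pretrained ResNet50 branch (global
    features), concatenated before the classifier. Sizes are illustrative."""

    def __init__(self, num_classes: int):
        super().__init__()
        # Local-feature branch: a shallow 2D CNN (assumed design).
        self.local_branch = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # -> (N, 64)
        )
        # Global-feature branch: ImageNet-pretrained ResNet50 with its
        # classification head removed (transfer learning); weights download
        # on first use.
        resnet = models.resnet50(weights="IMAGENET1K_V1")
        self.global_branch = nn.Sequential(
            *list(resnet.children())[:-1], nn.Flatten() # -> (N, 2048)
        )
        for p in self.global_branch.parameters():
            p.requires_grad = False                     # freeze the backbone

        # Intermediate fusion by concatenation, then the activity classifier.
        self.classifier = nn.Sequential(
            nn.Linear(64 + 2048, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):                               # x: (N, 3, 224, 224)
        fused = torch.cat(
            [self.local_branch(x), self.global_branch(x)], dim=1
        )
        return self.classifier(fused)


if __name__ == "__main__":
    model = IntermediateFusionHAR(num_classes=6)        # e.g. 6 KTH actions
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)                                 # torch.Size([2, 6])
```

In this sketch the fusion happens at the feature level (after both branches, before the classifier), which is what distinguishes intermediate fusion from early fusion of raw inputs or late fusion of per-branch predictions.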



Data Availability

The data associated with this work will be provided on reasonable request.


Author information

Corresponding author

Correspondence to Rajiv Singh.

Ethics declarations

Conflict of Interest

The authors declare that there is no conflict of interest regarding this manuscript and that they received no funding for this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Garg, A., Nigam, S. & Singh, R. An Intermediate Deep Feature Fusion Approach for Understanding Human Activities from Image Sequences. SN COMPUT. SCI. 5, 1037 (2024). https://doi.org/10.1007/s42979-024-03345-8

