Deep Learning for Human Activity Recognition on 3D Human Skeleton: Survey and Comparative Study
Figure 1. Illustration of 3D human skeleton data from the KLHA3D-102 [14] and KLYOGA3D [14] datasets.
Figure 2. The left shows the proportion of HAR research based on vision information; the right shows the percentage of HAR input information taken from video or skeleton data.
Figure 3. Illustration of four deep learning-based methods for HAR based on 3D human skeleton data.
Figure 4. Illustration of the RNN-based approach for HAR based on the 3D human skeleton [6]. (a) The skeleton represented as a temporal-spatial function; (b) the skeleton represented as a tree structure.
Figure 5. Illustration of a new coordinate system and the representation of feature vectors based on the joints of human body parts.
Figure 6. Illustration of the improved Logsig-RNN model compared with the standard RNN [40].
Figure 7. Illustration of CNN-based HAR [6]. (a) The feature types in the image space; (b) the process of projecting 3D skeleton data into the image space.
Figure 8. Illustration of feature extraction in GCN-based methods [6].
Figure 9. Statistics on the number of notable GCN-based studies over the past four years.
Figure 10. Illustration of the HD-GCN architecture [76].
Figure 11. The InfoGCN framework [79].
Figure 12. The DSTA-Net framework [84]. (a) Attention maps computed frame by frame; (b) relations between two joints computed across all frames; (c) a compromise between the two.
Figure 13. Illustration of the KLHA3D-102 and KLYOGA3D datasets.
Figure 14. Training and testing results on the KLHA3D-102 and KLYOGA3D datasets. (a) Model accuracy on the training and test sets of Conf. 1 of KLHA3D-102; (b) Conf. 2 of KLHA3D-102; (c) Conf. 1 of KLYOGA3D; (d) Conf. 2 of KLYOGA3D.
Figure 15. 3D human skeleton illustration of the “drinking tea” and “drinking water” actions in the KLHA3D-102 dataset.
Figure 16. Illustration of the application of 3D human pose estimation, activity recognition, and the total distance traveled by joints on the human body.
Abstract
1. Introduction
- We present an overview of the HAR problem with the 3D human pose as the input, covering the four types of DNN used to perform the recognition: RNN-based, CNN-based, GCN-based, and Hybrid-DNN-based.
- A full survey of HAR based on the 3D human pose is elaborated in detail, covering methods, datasets, and recognition results. More specifically, our survey compiles about 250 HAR results from more than 70 notable studies published between 2019 and March 2023. The results are listed in ascending order of publication year and are all reported using the accuracy metric.
- An analysis of the challenges of HAR based on the 3D skeleton of the whole body is presented, focusing on two main issues: the dimensionality of the data and the insufficient information available to distinguish actions from a limited number of reference points.
2. Related Works
3. HAR Based on 3D Human Pose: Survey
3.1. Methods
3.1.1. RNN-Based
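As a minimal sketch of the RNN-based pipeline summarized in Figure 4, where a skeleton sequence is treated as a series of per-frame joint-coordinate vectors, the snippet below feeds flattened joint coordinates into an LSTM and classifies the final hidden state. The class name `SkeletonLSTM`, the joint count, and the layer sizes are illustrative assumptions and do not correspond to any specific surveyed model.

```python
# Minimal sketch (assumption, not a surveyed model): an LSTM classifier that
# consumes a 3D-skeleton sequence of shape (batch, frames, joints, 3).
import torch
import torch.nn as nn

class SkeletonLSTM(nn.Module):
    def __init__(self, num_joints=25, hidden=128, num_classes=60):
        super().__init__()
        # Each frame is flattened to a (num_joints * 3)-dim vector of x, y, z coordinates.
        self.lstm = nn.LSTM(input_size=num_joints * 3, hidden_size=hidden,
                            num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):            # x: (batch, frames, joints, 3)
        b, t, j, c = x.shape
        x = x.reshape(b, t, j * c)   # flatten joints per frame
        out, _ = self.lstm(x)        # out: (batch, frames, hidden)
        return self.fc(out[:, -1])   # classify from the last time step

# Toy usage: 4 sequences of 64 frames with 25 joints.
logits = SkeletonLSTM()(torch.randn(4, 64, 25, 3))
print(logits.shape)                  # torch.Size([4, 60])
```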
3.1.2. CNN-Based
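Following the projection idea illustrated in Figure 7, where 3D skeleton data are mapped into the image space, the hedged sketch below encodes a sequence as a 3-channel pseudo-image (joints as height, frames as width, and the x, y, z coordinates as channels) and classifies it with a small 2D CNN. All names and layer sizes are assumptions for illustration only.

```python
# Minimal sketch (assumption, not a surveyed model): encode a skeleton sequence
# as a pseudo-image with 3 channels (x, y, z), joints as height and frames as width,
# then classify it with a small 2D CNN.
import torch
import torch.nn as nn

class SkeletonCNN(nn.Module):
    def __init__(self, num_classes=60):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling keeps the head size-independent
        )
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):                     # x: (batch, frames, joints, 3)
        x = x.permute(0, 3, 2, 1)             # -> (batch, 3, joints, frames), channels first
        return self.fc(self.features(x).flatten(1))

logits = SkeletonCNN()(torch.randn(4, 64, 25, 3))
print(logits.shape)                           # torch.Size([4, 60])
```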
3.1.3. GCN-Based
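To make the feature-extraction idea of Figure 8 concrete, the following simplified sketch applies one spatial graph convolution over a fixed, symmetrically normalized skeleton adjacency matrix before temporal pooling and classification. It is a toy reduction of the ST-GCN family [16]; the function and class names, joint count, and layer sizes are assumptions for illustration.

```python
# Minimal sketch (assumption, simplified from the ST-GCN family): one spatial
# graph-convolution layer mixing joint features over a normalized skeleton
# adjacency, followed by temporal/spatial average pooling and a classifier.
import torch
import torch.nn as nn

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = torch.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d = a.sum(dim=1).pow(-0.5)
    return d.unsqueeze(1) * a * d.unsqueeze(0)

class SimpleSkeletonGCN(nn.Module):
    def __init__(self, adjacency, in_channels=3, hidden=64, num_classes=60):
        super().__init__()
        self.register_buffer("adj", adjacency)         # (joints, joints)
        self.proj = nn.Linear(in_channels, hidden)     # per-joint feature transform
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                               # x: (batch, frames, joints, 3)
        h = torch.relu(self.proj(x))                    # (batch, frames, joints, hidden)
        h = torch.einsum("vw,btwc->btvc", self.adj, h)  # aggregate neighboring joints
        h = h.mean(dim=(1, 2))                          # pool over frames and joints
        return self.fc(h)

# Toy 5-joint chain skeleton; real datasets such as NTU RGB+D use 25 joints.
adj = normalized_adjacency([(0, 1), (1, 2), (2, 3), (3, 4)], num_joints=5)
logits = SimpleSkeletonGCN(adj)(torch.randn(4, 64, 5, 3))
print(logits.shape)                                     # torch.Size([4, 60])
```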
3.1.4. Hybrid-DNN
3.2. Datasets
3.3. Evaluation Metrics
- Accuracy (Acc):
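Accuracy is the metric used for all literature comparisons in this survey. The standard multi-class formulation (assumed here) counts the fraction of test sequences whose predicted action label matches the ground truth:

```latex
% Multi-class accuracy over N test skeleton sequences (standard definition, assumed here):
% \hat{y}_i is the predicted action label and y_i the ground-truth label of sequence i.
\mathrm{Acc} \;=\; \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\!\left[\hat{y}_i = y_i\right]
```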
3.4. Literature Results
3.5. Challenges and Discussion
4. Comparative Study of HAR
4.1. Experiment
4.2. Results and Discussion
5. Conclusions and Future Works
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Gammulle, H.; Ahmedt-Aristizabal, D.; Denman, S.; Tychsen-Smith, L.; Petersson, L.; Fookes, C. Continuous Human Action Recognition for Human-Machine Interaction: A Review. In ACM Computing Surveys; 2022; pp. 1–31. Available online: https://arxiv.org/pdf/2202.13096.pdf (accessed on 27 April 2023).
- Niu, W.; Long, J.; Han, D.; Wang, Y.F. Human activity detection and recognition for video surveillance. In Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME), Taipei, China, 27–30 June 2004; Volume 1, pp. 719–722. [Google Scholar] [CrossRef]
- Wu, F.; Wang, Q.; Bian, J.; Ding, N.; Lu, F.; Cheng, J.; Dou, D.; Xiong, H. A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications. IEEE Trans. Multimed. 2022, 1–26. [Google Scholar] [CrossRef]
- Wen, J.; Guillen, L.; Abe, T.; Suganuma, T. A hierarchy-based system for recognizing customer activity in retail environments. Sensors 2021, 21, 4712. [Google Scholar] [CrossRef] [PubMed]
- Islam, M.M.; Nooruddin, S.; Karray, F.; Muhammad, G. Human activity recognition using tools of convolutional neural networks: A state of the art review, data sets, challenges, and future prospects. Comput. Biol. Med. 2022, 149, 106060. [Google Scholar] [CrossRef]
- Xing, Y.; Zhu, J. Deep learning-based action recognition with 3D skeleton: A survey. Caai Trans. Intell. Technol. 2021, 6, 80–92. [Google Scholar] [CrossRef]
- Ren, B.; Liu, M.; Ding, R.; Liu, H. A Survey on 3D Skeleton-Based Action Recognition Using Learning Method. arXiv 2020, arXiv:2002.05907. [Google Scholar]
- Arshad, M.H.; Bilal, M.; Gani, A. Human Activity Recognition: Review, Taxonomy and Open Challenges. Sensors 2022, 22, 6463. [Google Scholar] [CrossRef]
- Müller, M.; Röder, T.; Clausen, M.; Eberhardt, B.; Krüger, B.; Weber, A. Documentation Mocap Database HDM05; Technical Report CG-2007-2; Universität Bonn: Bonn, Germany, 2007. [Google Scholar]
- Barnachon, M.; Bouakaz, S.; Boufama, B.; Guillou, E. Ongoing human action recognition with motion capture. Pattern Recognit. 2014, 47, 238–247. [Google Scholar] [CrossRef]
- Shahroudy, A.; Liu, J.; Ng, T.T.; Wang, G. NTU RGB + D: A large scale dataset for 3D human activity analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1010–1019. [Google Scholar]
- Liu, J.; Shahroudy, A.; Perez, M.; Wang, G.; Duan, L.Y.; Kot, A.C. NTU RGB + D 120: A large-scale benchmark for 3D human activity understanding. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2684–2701. [Google Scholar] [CrossRef]
- Oreifej, O.; Liu, Z. HON4D: Histogram of oriented 4D normals for activity recognition from depth sequences. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 716–723. [Google Scholar] [CrossRef]
- Kumar, M.T.K.; Kishore, P.V.; Madhav, B.T.; Kumar, D.A.; Kala, N.S.; Rao, K.P.K.; Prasad, B. Can Skeletal Joint Positional Ordering Influence Action Recognition on Spectrally Graded CNNs: A Perspective on Achieving Joint Order Independent Learning. IEEE Access 2021, 9, 139611–139626. [Google Scholar] [CrossRef]
- Wang, Z.; Zheng, Y.; Yang, Y.; Li, Y.; Zhang, M. Deep Neural Networks in Video Human Action Recognition: A review. arXiv 2023, arXiv:2208.03775. [Google Scholar]
- Yan, S.; Xiong, Y.; Lin, D. Spatial temporal graph convolutional networks for skeleton-based action recognition. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, New Orleans, LA, USA, 2–7 February 2018; pp. 7444–7452. [Google Scholar]
- Mahmoodi, J.; Nezamabadi-pour, H.; Abbasi-Moghadam, D. Violence detection in videos using interest frame extraction and 3D convolutional neural network. Multimed. Tools Appl. 2022, 81, 20945–20961. [Google Scholar] [CrossRef]
- Diba, A.; Fayyaz, M.; Sharma, V.; Karami, A.H.; Arzani, M.M.; Yousefzadeh, R.; Van Gool, L. Temporal 3D ConvNets using temporal transition layer. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1230–1234. [Google Scholar]
- Morshed, M.G.; Sultana, T.; Alam, A.; Lee, Y.K. Human Action Recognition: A Taxonomy-Based Survey, Updates, and Opportunities. Sensors 2023, 23, 2182. [Google Scholar] [CrossRef] [PubMed]
- Yang, X.; Tian, Y. Effective 3D action recognition using EigenJoints. J. Vis. Commun. Image Represent. 2014, 25, 2–11. [Google Scholar] [CrossRef]
- Jobanputra, C.; Bavishi, J.; Doshi, N. Human Activity Recognition: A Survey. In Proceedings of the Procedia Computer Science; Elsevier: Halifax, NS, Canada, 2019; Volume 155, pp. 698–703. [Google Scholar] [CrossRef]
- Gupta, N.; Gupta, S.K.; Pathak, R.K.; Jain, V.; Rashidi, P.; Suri, J.S. Human activity recognition in artificial intelligence framework: A narrative review. Artif. Intell. Rev. 2022, 55, 4755–4808. [Google Scholar] [CrossRef] [PubMed]
- Carvalho, L.I.; Sofia, R.C. A Review on Scaling Mobile Sensing Platforms for Human Activity Recognition: Challenges and Recommendations for Future Research. IoT 2020, 1, 451–473. [Google Scholar] [CrossRef]
- Beddiar, D.R.; Nini, B.; Sabokrou, M.; Hadid, A. Vision-based human activity recognition: A survey. Multimed. Tools Appl. 2020, 79, 30509–30555. [Google Scholar] [CrossRef]
- Dayanand, K.; Atherton, O.E.; Tackett, J.L.; Ferrer, E.; Robins, R.W. Deep learning for RFID-based activity recognition. In Proceedings of the 14th ACM conference on Embedded Networked Sensor Systems SenSys, Stanford, CA, USA, 14–16 November 2016; Volume 176, pp. 139–148. [Google Scholar]
- Han, J.; Ding, H.; Qian, C.; Ma, D.; Xi, W.; Wang, Z.; Jiang, Z.; Shangguan, L. CBID: A customer behavior identification system using passive tags. In Proceedings of the International Conference on Network Protocols, ICNP, Raleigh, NC, USA, 21–24 October 2014; pp. 47–58. [Google Scholar] [CrossRef]
- Wang, J.; Chen, Y.; Hao, S.; Peng, X.; Hu, L. Deep learning for sensor-based activity recognition: A survey. Pattern Recognit. Lett. 2019, 119, 3–11. [Google Scholar] [CrossRef]
- Wang, F.; Feng, J.; Zhao, Y.; Zhang, X.; Zhang, S.; Han, J. Joint activity recognition and indoor localization with WiFi fingerprints. IEEE Access 2019, 7, 80058–80068. [Google Scholar] [CrossRef]
- Hussain, Z.; Sheng, Q.Z.; Zhang, W.E. A review and categorization of techniques on device-free human activity recognition. J. Netw. Comput. Appl. 2020, 167, 102738. [Google Scholar] [CrossRef]
- Le, V.H. Deep learning-based for human segmentation and tracking, 3D human pose estimation and action recognition on monocular video of MADS dataset. In Multimedia Tools and Applications; Springer: Berlin/Heidelberg, Germany, 2022. [Google Scholar] [CrossRef]
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
- Lee, D.; Lim, M.; Park, H.; Kang, Y.; Park, J.S.; Jang, G.J.; Kim, J.H. Long short-term memory recurrent neural network-based acoustic model using connectionist temporal classification on a large-scale training corpus. China Commun. 2017, 14, 23–31. [Google Scholar] [CrossRef]
- Guan, M.; Cho, S.; Petro, R.; Zhang, W.; Pasche, B.; Topaloglu, U. Natural language processing and recurrent network models for identifying genomic mutation-associated cancer treatment change from patient progress notes. JAMIA Open 2019, 2, 139–149. [Google Scholar] [CrossRef] [PubMed]
- Saha, B.N.; Senapati, A. Long Short Term Memory (LSTM) based Deep Learning for Sentiment Analysis of English and Spanish Data. In Proceedings of the 2020 International Conference on Computational Performance Evaluation, ComPE 2020, Shillong, India, 2–4 July 2020; pp. 442–446. [Google Scholar] [CrossRef]
- Oruh, J.; Viriri, S.; Adegun, A. Long Short-Term Memory Recurrent Neural Network for Automatic Speech Recognition. IEEE Access 2022, 10, 30069–30079. [Google Scholar] [CrossRef]
- Deep historical long short-term memory network for action recognition. Neurocomputing 2020, 407, 428–438. [CrossRef]
- Gaur, D.; Dubey, S.K. Development of Activity Recognition Model using LSTM-RNN Deep Learning Algorithm. J. Inf. Organ. Sci. 2022, 46, 277–291. [Google Scholar] [CrossRef]
- Ye, L.; Ye, S. Deep learning for skeleton-based action recognition. J. Phys. Conf. Ser. 2021, 1883, 012174. [Google Scholar] [CrossRef]
- Li, S.; Li, W.; Cook, C.; Gao, Y. Deep Independently Recurrent Neural Network (IndRNN). arXiv 2019, arXiv:1910.06251. [Google Scholar]
- Liao, S.; Lyons, T.; Yang, W.; Ni, H. Learning stochastic differential equations using RNN with log signature features. arXiv 2019, arXiv:1908.08286. [Google Scholar]
- Liao, S.; Lyons, T.; Yang, W.; Schlegel, K.; Ni, H. Logsig-RNN: A novel network for robust and efficient skeleton-based action recognition. In Proceedings of the British Machine Vision Conference, London, UK, 21–24 November 2021. [Google Scholar]
- Tasnim, N.; Islam, M.; Baek, J.H. Deep learning-based action recognition using 3D skeleton joints information. Inventions 2020, 5, 49. [Google Scholar] [CrossRef]
- Li, M.; Sun, Q. 3D Skeletal Human Action Recognition Using a CNN Fusion Model. Math. Probl. Eng. 2021, 2021, 6650632. [Google Scholar] [CrossRef]
- Duan, H.; Chen, K.; Lin, D.; Dai, B. Revisiting Skeleton-based Action Recognition. In Proceedings of the CVPR, New Orleans, LA, USA, 18–24 June 2022; pp. 2969–2978. [Google Scholar]
- Koniusz, P.; Wang, L.; Cherian, A. Tensor Representations for Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 648–665. [Google Scholar] [CrossRef] [PubMed]
- Simonyan, K.; Zisserman, A. Two-stream convolutional networks for action recognition in videos. In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014; Volume 1, pp. 568–576. [Google Scholar] [CrossRef]
- Shi, L.; Zhang, Y.; Cheng, J.; Lu, H.; Member, S. Skeleton-Based Action Recognition with Multi-Stream Adaptive Graph Convolutional Networks. In Proceedings of the IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), Long Beach, CA, USA, 16–20 June 2019; pp. 1–13. [Google Scholar]
- Peng, W.; Hong, X.; Chen, H.; Zhao, G. Learning graph convolutional network for skeleton-based human action recognition by neural searching. In Proceedings of the AAAI 2020-34th AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–14 February 2020; pp. 2669–2676. [Google Scholar] [CrossRef]
- Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Skeleton-based action recognition with directed graph neural networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7904–7913. [Google Scholar] [CrossRef]
- Li, M.; Chen, S.; Chen, X.; Zhang, Y.; Wang, Y.; Tian, Q. Actional-Structural Graph Convolutional Networks for Skeleton-based Action Recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
- Ding, X.; Yang, K.; Chen, W. An attention-enhanced recurrent graph convolutional network for skeleton-based action recognition. In Proceedings of the ACM International Conference Proceeding Series, Beijing, China, 25–28 October 2019; pp. 79–84. [Google Scholar] [CrossRef]
- Gao, J.; He, T.; Zhou, X.; Ge, S. Focusing and Diffusion: Bidirectional Attentive Graph Convolutional Networks. arXiv 2019, arXiv:1912.11521. [Google Scholar]
- Li, M.; Member, S.; Chen, S.; Chen, X.; Zhang, Y.; Wang, Y.; Fellow, Q.T. Symbiotic Graph Neural Networks for 3D Skeleton-based Human Action Recognition and Motion Prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3316–3333. [Google Scholar] [CrossRef]
- Wu, C. Spatial Residual Layer and Dense Connection Block Enhanced Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Non-Local Graph Convolutional Networks for Skeleton-Based Action Recognition. arXiv 2019, arXiv:1805.07694. [Google Scholar]
- Papadopoulos, K.; Ghorbel, E.; Aouada, D.; Ottersten, B. Vertex Feature Encoding and Hierarchical Temporal Modeling in a Spatial-Temporal Graph Convolutional Network for Action Recognition. arXiv 2019, arXiv:1912.09745. [Google Scholar]
- Kao, J.Y.; Ortega, A.; Tian, D.; Mansour, H.; Vetro, A. Graph Based Skeleton Modeling for Human Activity Analysis. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019. [Google Scholar]
- Wang, P.; Cao, Y.; Shen, C.; Liu, L.; Shen, H.T. Temporal Pyramid Pooling Based Convolutional Neural Network for Action Recognition. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 2613–2622. [Google Scholar] [CrossRef]
- Song, Y.F.; Zhang, Z.; Shan, C.; Wang, L. Stronger, Faster and More Explainable: A Graph Convolutional Baseline for Skeleton-based Action Recognition. In Proceedings of the MM 2020—Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1625–1633. [Google Scholar]
- Cheng, K.; Zhang, Y.; He, X.; Chen, W.; Cheng, J.; Lu, H. Skeleton-Based Action Recognition with Shift Graph Convolutional Network. In Proceedings of the The IEEE/CVF Computer Vision and Pattern Recognition Conference, Seattle, WA, USA, 13–19 June 2020; pp. 524–528. [Google Scholar] [CrossRef]
- Song, Y.f.; Zhang, Z. Richly Activated Graph Convolutional Network for Robust Skeleton-based Action Recognition. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 1915–1925. [Google Scholar] [CrossRef]
- Liu, Z.; Zhang, H.; Chen, Z.; Wang, Z.; Ouyang, W. Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition. In Proceedings of the The IEEE/CVF Computer Vision and Pattern Recognition Conference, Seattle, WA, USA, 13–19 June 2020; pp. 143–152. [Google Scholar]
- Ye, F.; Pu, S.; Zhong, Q.; Li, C.; Xie, D.; Tang, H. Dynamic GCN: Context-enriched Topology Learning for Skeleton-based Action Recognition. In Proceedings of the MM’20: Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020. [Google Scholar]
- Obinata, Y.; Yamamoto, T. Temporal Extension Module for Skeleton-Based Action Recognition. arXiv 2020, arXiv:2003.08951. [Google Scholar]
- Yang, H.; Gu, Y.; Zhu, J. PGCN-TCA: Pseudo Graph Convolutional Network With Temporal and Channel-Wise Attention for Skeleton-Based Action Recognition. IEEE Access 2020, 8, 10040–10047. [Google Scholar] [CrossRef]
- Ding, X.; Yang, K.; Chen, W. A Semantics-Guided Graph Convolutional Network for Skeleton-Based Action Recognition. In Proceedings of the 4th International Conference on Innovation in Artificial Intelligence, Xiamen, China, 8–11 May 2020; pp. 130–136. [Google Scholar]
- Yu, J.; Yoon, Y.; Jeon, M. Predictively Encoded Graph Convolutional Network for Noise-Robust Skeleton-based Action Recognition. Appl. Intell. 2020, 52, 2317–2331. [Google Scholar]
- Li, S.; Yi, J.; Farha, Y.A.; Gall, J. Pose Refinement Graph Convolutional Network for Skeleton-based Action Recognition. arXiv 2020, arXiv:2010.07367. [Google Scholar] [CrossRef]
- Chen, T.; Zhou, D.; Wang, J.; Wang, S.; Guan, Y.; He, X.; Ding, E. Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition. In Proceedings of the MM ’21: Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, 20–24 October 2021. [Google Scholar]
- Yang, D.; Li, M.M.; Fu, H.; Fan, J.; Zhang, Z.; Leung, H. Unifying Graph Embedding Features with Graph Convolutional Networks for Skeleton-based Action Recognition. arXiv 2020, arXiv:2003.03007. [Google Scholar]
- Chen, Y.; Zhang, Z.; Yuan, C.; Li, B.; Deng, Y.; Hu, W. Channel-wise Topology Refinement Graph Convolution for Skeleton-Based Action Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Nashville, TN, USA, 20–25 June 2021; pp. 13359–13368. [Google Scholar]
- Qin, Z.; Liu, Y.; Ji, P.; Kim, D.; Wang, L.; Member, S.; Mckay, R.I.; Anwar, S.; Gedeon, T.; Member, S. Fusing Higher-Order Features in Graph Neural Networks for Skeleton-Based Action Recognition. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–15. [Google Scholar] [CrossRef] [PubMed]
- Zeng, A.; Sun, X.; Yang, L.; Zhao, N.; Liu, M.; Xu, Q. Learning Skeletal Graph Neural Networks for Hard 3D Pose Estimation. In Proceedings of the International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 11436–11445. [Google Scholar]
- Song, Y.f.; Zhang, Z. Constructing Stronger and Faster Baselines for Skeleton-based Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 45, 1474–1488. [Google Scholar] [CrossRef]
- Yang, H.; Yan, D.; Zhang, L.; Li, D.; Sun, Y.; You, S.; Mar, C.V. Feedback Graph Convolutional Network for Skeleton-based Action Recognition. IEEE Trans. Image Process. 2021, 31, 164–175. [Google Scholar] [CrossRef]
- Lee, J.; Lee, M.; Lee, D.; Lee, S. Hierarchically Decomposed Graph Convolutional Networks for Skeleton-Based Action Recognition. arXiv 2022, arXiv:2208.10741. [Google Scholar]
- Hu, L.; Liu, S.; Feng, W. Spatial Temporal Graph Attention Network for Skeleton-Based Action Recognition. arXiv 2022, arXiv:2208.08599. [Google Scholar]
- Duan, H.; Wang, J.; Chen, K.; Lin, D. PYSKL: Towards Good Practices for Skeleton Action Recognition. arXiv 2022, arXiv:2205.09443. [Google Scholar]
- Chi, H.g.; Ha, M.H.; Chi, S.; Lee, S.W.; Huang, Q.; Ramani, K. InfoGCN: Representation Learning for Human Skeleton-based Action Recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 20186–20196. [Google Scholar]
- Wang, S.; Zhang, Y.; Zhao, M.; Qi, H.; Wang, K.; Wei, F.; Jiang, Y. Skeleton-based Action Recognition via Temporal-Channel Aggregation. arXiv 2022, arXiv:2205.15936. [Google Scholar]
- Si, C.; Chen, W.; Wang, W.; Wang, L.; Tan, T. An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 1227–1236. [Google Scholar]
- Zhao, R.; Wang, K.; Su, H.; Ji, Q. Bayesian Graph Convolution LSTM for Skeleton Based Action Recognition. In Proceedings of the International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6882–6892. [Google Scholar]
- Liu, J.; Shahroudy, A.; Wang, G.; Duan, L.Y. Skeleton-Based Online Action Prediction Using Scale Selection Network. arXiv 2019, arXiv:1902.03084. [Google Scholar] [CrossRef]
- Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Decoupled Spatial-Temporal Attention Network for Skeleton-Based Action-Gesture Recognition. In Proceedings of the Asian Conference on Computer Vision (ACCV), Kyoto, Japan, 30 November–4 December 2020. [Google Scholar]
- Zhang, P.; Lan, C.; Zeng, W.; Xing, J.; Xue, J.; Zheng, N. Semantics-guided neural networks for efficient skeleton-based human action recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1109–1118. [Google Scholar]
- Plizzaria, C.; Cannicia, M.; Matteuccia, M. Skeleton-based Action Recognition via Spatial and Temporal Transformer Networks. Comput. Vis. Image Underst. 2021, 208, 103219. [Google Scholar] [CrossRef]
- Xiang, W.; Li, C.; Zhou, Y.; Wang, B.; Zhang, L. Language Supervised Training for Skeleton-based Action Recognition. arXiv 2022, arXiv:2208.05318. [Google Scholar]
- Trivedi, N.; Kiran, R.S. PSUMNet: Unified Modality Part Streams are All You Need for Efficient Pose-based Action Recognition. arXiv 2022, arXiv:2208.05775. [Google Scholar]
- Zhou, Y.; Cheng, Z.q.; Li, C.; Fang, Y.; Geng, Y.; Xie, X.; Keuper, M. Hypergraph Transformer for Skeleton-based Action Recognition. arXiv 2023, arXiv:2211.09590. [Google Scholar]
- Bavil, A.F.; Damirchi, H.; Taghirad, H.D.; Member, S. Action Capsules: Human Skeleton Action Recognition. arXiv 2023, arXiv:2301.13090. [Google Scholar] [CrossRef]
- Xia, L.; Chen, C.; Aggarwal, J. View invariant human action recognition using histograms of 3D joints. In Proceedings of the Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE Computer Society Conference, Providence, RI, USA, 16–21 June 2012; pp. 20–27. [Google Scholar]
- Yun, K.; Honorio, J.; Chattopadhyay, D.; Berg, T.L.; Samaras, D.; Brook, S. Two-person Interaction Detection Using Body-Pose Features and Multiple Instance Learning. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June 2012. [Google Scholar]
- Seidenari, L.; Varano, V.; Berretti, S.; Bimbo, A.D.; Pala, P. Recognizing Actions from Depth Cameras as Weakly Aligned Multi-Part. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Portland, OR, USA, 23–28 June 2013. [Google Scholar] [CrossRef]
- Jhuang, H.; Gall, J.; Zuffi, S.; Schmid, C.; Black, M.J. Towards understanding action recognition. In Proceedings of the International Conf. on Computer Vision (ICCV), Sydney, Australia, 1–8 December 2013; pp. 3192–3199. [Google Scholar]
- Kuehne, H.; Jhuang, H.; Garrote, E.; Poggio, T.; Serre, T. HMDB: A Large Video Database for Human Motion Recognition. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2556–2563. [Google Scholar]
- Wang, J.; Nie, X. Cross-view Action Modeling, Learning and Recognition. arXiv 2014, arXiv:1405.2941. [Google Scholar]
- Hu, J.F.; Zheng, W.S.; Lai, J.; Zhang, J. Jointly Learning Heterogeneous Features for RGB-D Activity Recognition. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P.; et al. The Kinetics Human Action Video Dataset. arXiv 2017, arXiv:1705.06950. [Google Scholar]
- Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. In Proceedings of the CVPR, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Kishore, P.V.V.; Member, S.; Kumar, D.A.; Sastry, A.S.C.S.; Kumar, E.K. Motionlets Matching with Adaptive Kernels for 3D Indian Sign Language Recognition. IEEE Sens. J. 2018, 1748, 1–11. [Google Scholar] [CrossRef]
- Wen, Y.H.; Gao, L.; Fu, H.; Zhang, F.L.; Xia, S. Graph CNNs with Motif and Variable Temporal Block for Skeleton-Based Action Recognition. In Proceedings of the AAAI, Hilton, HI, USA, 27 January–1 February 2019. [Google Scholar]
- Shi, L.; Zhang, Y.; Cheng, J.; Lu, H. Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition. In Proceedings of the CVPR, Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
- Liang, D.; Fan, G.; Lin, G.; Chen, W.; Pan, X.; Zhu, H. Three-stream convolutional neural network with multi-task and ensemble learning for 3d action recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–17 June 2019; pp. 934–940. [Google Scholar] [CrossRef]
- Cho, S.; Maqbool, M.H.; Liu, F.; Foroosh, H. Self-Attention Network for Skeleton-based Human Action Recognition. arXiv 2019, arXiv:1912.08435. [Google Scholar]
- Li, T.; Fan, L.; Zhao, M.; Liu, Y.; Katabi, D. Making the Invisible Visible: Action Recognition Through Walls and Occlusions. In Proceedings of the ICCV, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 872–881. [Google Scholar]
- Wang, L.; Huynh, D.Q.; Member, S.; Koniusz, P. A Comparative Review of Recent Kinect-based Action Recognition Algorithms. IEEE Trans. Image Process. 2019, 29, 15–28. [Google Scholar] [CrossRef]
- Zhang, P.; Xue, J.; Lan, C.; Zeng, W.; Zheng, N. EleAtt-RNN: Adding Attentiveness to Neurons in Recurrent Neural Networks. IEEE Trans. Image Process. 2019, 29, 1061–1073. [Google Scholar] [CrossRef]
- Caetano, C.; Sena, J.; Brmond, F.; Schwartz, W.R.; Antipolis, S. SkeleMotion: A New Representation of Skeleton Joint Sequences Based on Motion Information for 3D Action Recognition. arXiv 2019, arXiv:1907.13025. [Google Scholar]
- Caetano, C.; Schwartz, W.R. Skeleton Image Representation for 3D Action Recognition based on Tree Structure and Reference Joints. arXiv 2019, arXiv:1909.05704. [Google Scholar]
- Wang, M.; Ni, B.; Yang, X. Learning Multi-View Interactional Skeleton Graph for Action Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 45, 6940–6954. [Google Scholar] [CrossRef] [PubMed]
- Cheng, K.; Zhang, Y.; Cao, C.; Shi, L.; Cheng, J.; Lu, H. Decoupling GCN with DropGraph Module for Skeleton-Based Action Recognition. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020. [Google Scholar]
- Fan, Y.; Weng, S.; Zhang, Y.; Shi, B.; Zhang, Y. Context-Aware Cross-Attention for Skeleton- Based Human Action Recognition. IEEE Access 2020, 8, 15280–15290. [Google Scholar] [CrossRef]
- Memmesheimer, R.; Theisen, N.; Paulus, D. Gimme Signals: Discriminative signal encoding for multimodal activity recognition. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October–24 January 2020; IEEE: Las Vegas, NV, USA, 2020. [Google Scholar] [CrossRef]
- Xu, K.; Ye, F.; Zhong, Q.; Xie, D. Topology-Aware Convolutional Neural Network for Efficient Skeleton-Based Action Recognition. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022, Vancouver, BC, Canada, 8 August 2022; Volume 36, pp. 2866–2874. [Google Scholar] [CrossRef]
- Li, B.; Li, X.; Zhang, Z.; Wu, F. Spatio-temporal graph routing for skeleton-based action recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 29–31 January 2019; Volume 33, pp. 8561–8568. [Google Scholar] [CrossRef]
- Hachiuma, R.; Sato, F.; Sekii, T. Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling. arXiv 2023, arXiv:2303.15270. [Google Scholar]
- Davoodikakhki, M.; Yin, K.K. Hierarchical Action Classification with Network Pruning. Lect. Notes Comput. Sci. Incl. Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinform. 2020, 12509, 291–305. [Google Scholar] [CrossRef]
- Yan, A.; Wang, Y.; Li, Z.; Qiao, Y. PA3D: Pose-action 3D machine for video recognition. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7914–7923. [Google Scholar] [CrossRef]
- McNally, W.; Wong, A.; McPhee, J. STAR-Net: Action recognition using spatio-temporal activation reprojection. In Proceedings of the 2019 16th Conference on Computer and Robot Vision, CRV 2019, Kingston, ON, Canada, 29–31 May 2019; pp. 49–56. [Google Scholar] [CrossRef]
- Yang, F.; Sakti, S.; Wu, Y.; Nakamura, S. Make Skeleton-based Action Recognition Model Smaller, Faster and Better. In Proceedings of the ACM International Conference on Multimedia in Asia, Nice, France, 21–25 October 2019. [Google Scholar]
- Ludl, D.; Gulde, T.; Curio, C. Simple yet efficient real-time pose-based action recognition. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019, Auckland, New Zealand, 27–30 October 2019; pp. 581–588. [Google Scholar] [CrossRef]
- Ke, Q.; Bennamoun, M.; Rahmani, H.; An, S.; Sohel, F.; Boussaid, F. Learning Latent Global Network for Skeleton-Based Action Prediction. IEEE Trans. Image Process. 2020, 29, 959–970. [Google Scholar] [CrossRef]
- Li, W.; Zhang, Z.; Liu, Z. Action recognition based on a bag of 3D points. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition—Workshops, CVPRW 2010, San Francisco, CA, USA, 13–18 June 2010; Volume 2010, pp. 9–14. [Google Scholar] [CrossRef]
- Paoletti, G.; Cavazza, J.; Beyan, C.; Del Bue, A. Subspace clustering for action recognition with covariance representations and temporal pruning. In Proceedings of the International Conference on Pattern Recognition, Milan, Italy, 10–15 January 2021; pp. 6035–6042. [Google Scholar] [CrossRef]
- Mazari, A.; Sahbi, H. MLGCN: Multi-laplacian graph convolutional networks for human action recognition. In Proceedings of the 30th British Machine Vision Conference 2019, BMVC 2019, Cardiff, UK, 9–12 September 2019; pp. 1–16. [Google Scholar]
- Bianchi, F.M.; Grattarola, D.; Livi, L.; Alippi, C. Graph Neural Networks With Convolutional ARMA Filters. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3496–3507. [Google Scholar] [CrossRef]
- Wu, F.; Zhang, T.; de Souza, A.H.; Fifty, C.; Yu, T.; Weinberger, K.Q. Simplifying graph convolutional networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 10–15 June 2019; pp. 11884–11894. [Google Scholar]
- Kumar, M.T.K. CNN-LSTM Hybrid model based human action recognition with skeletal representation using joint movements based energy maps. Int. J. Emerg. Trends Eng. Res. 2020, 8, 3502–3508. [Google Scholar] [CrossRef]
- Kishore, P.V.; Perera, D.G.; Kumar, M.T.K.; Kumar, D.A.; Kumar, E.K. A quad joint relational feature for 3D skeletal action recognition with circular CNNs. In Proceedings of the IEEE International Symposium on Circuits and Systems, Virtual, 10–21 October 2020. [Google Scholar] [CrossRef]
- Maddala, T.K.K.; Kishore, P.V.; Eepuri, K.K.; Dande, A.K. YogaNet: 3-D Yoga Asana Recognition Using Joint Angular Displacement Maps with ConvNets. IEEE Trans. Multimed. 2019, 21, 2492–2503. [Google Scholar] [CrossRef]
- Li, C.; Hou, Y.; Wang, P.; Li, W. Joint Distance Maps Based Action Recognition with Convolutional Neural Networks. IEEE Signal Process. Lett. 2017, 24, 624–628. [Google Scholar] [CrossRef]
Authors | Years | Models | Cross-Subject Acc (%) | Cross-View Acc (%) | Type of DLMs |
---|---|---|---|---|---|
Wen et al. [101] | 2019 | Motif-STGCN | 84.2 | 90.2 | CNN |
Song et al. [61] | 2019 | RA-GCN | 85.9 | 93.5 | GCN |
Li et al. [39] | 2019 | DenseIndRNN | 86.7 | 93.7 | RNN |
Li et al. [50] | 2019 | AS-GCN | 86.8 | 94.2 | GCN |
Si et al. [81] | 2019 | AGC-LSTM (Joint & Part) | 89.2 | 95 | Hybrid-DNN |
Lei et al. [102] | 2019 | 2s-AGCN | 88.5 | 95.1 | GCN |
Wu et al. [54] | 2019 | 2s-SDGCN | 89.6 | 95.7 | GCN |
Shi et al. [49] | 2019 | DGNN | 89.9 | 96.1 | GNN |
Peng et al. [48] | 2019 | GCN-NAS | 89.4 | 95.7 | GCN |
Gao et al. [52] | 2019 | BAGCN | 90.3 | 96.3 | GCN |
Li et al. [53] | 2019 | Sym-GNN | 90.1 | 96.4 | GNN |
Shi et al. [47] | 2019 | MS-AAGCN | 90 | 96.2 | GCN |
Shi et al. [47] | 2019 | JB-AAGCN | 89.4 | 96 | GCN |
Si et al. [81] | 2019 | AGC-LSTM | 89.2 | 95 | Hybrid-DNN |
Liang et al. [103] | 2019 | 3SCNN | 88.6 | 93.7 | CNN |
Shi et al. [55] | 2019 | 2s-NLGCN | 88.5 | 95.1 | GCN |
Cho et al. [104] | 2019 | TS-SAN | 87.2 | 92.7 | Hybrid-DNN |
Li et al. [105] | 2019 | RF-Action | 86.8 | 91.6 | Hybrid-DNN |
Song et al. [61] | 2019 | 3s RA-GCN | 85.9 | 93.5 | GCN |
Song et al. [61] | 2019 | 2s RA-GCN | 85.8 | 93 | GCN |
Papadopoulos et al. [56] | 2019 | GVFE + AS-GCN with DH-TCN | 85.3 | 92.8 | GCN |
Ding et al. [51] | 2019 | AR-GCN | 85.1 | 93.2 | GCN |
Wang et al. [106] | 2019 | ST-GCN-jpd | 83.36 | 88.84 | GCN |
Zhao et al. [82] | 2019 | Bayesian GC-LSTM | 81.8 | 89 | Hybrid-DNN |
Zhang et al. [107] | 2019 | EleAtt-GRU | 80.7 | 88.4 | RNN |
Caetano et al. [108] | 2019 | Skelemotion + Yang et al. | 76.5 | 84.7 | CNN |
Caetano et al. [109] | 2019 | TSRJI | 73.3 | 80.3 | CNN |
Zhang et al. [85] | 2020 | SGN | 89 | 94.5 | Hybrid-DNN |
Wang et al. [110] | 2020 | MV-IGNET | 89.2 | 96.3 | Hybrid-DNN |
Cheng et al. [60] | 2020 | 4s Shift-GCN | 90.7 | 96.5 | GCN |
Cheng et al. [111] | 2020 | DecoupleGCN-DropGraph | 90.8 | 96.6 | GCN |
Song et al. [59] | 2020 | PA-ResGCN-B19 | 90.9 | 96 | GCN |
Liu et al. [62] | 2020 | MS-G3D | 91.5 | 96.2 | GCN |
Koniusz et al. [45] | 2020 | SCK⊕ | 91.56 | 94.75 | CNN |
Shi et al. [84] | 2020 | DSTA-Net | 91.5 | 96.4 | Hybrid-DNN |
Ye et al. [63] | 2020 | Dynamic GCN | 91.5 | 96 | GCN |
Obinata et al. [64] | 2020 | MS-AAGCN + TEM | 91 | 96.5 | GCN |
Yang et al. [70] | 2020 | CGCN | 90.3 | 96.4 | GCN |
Yang et al. [75] | 2020 | FGCN-spatial + FGCN-motion | 90.2 | 96.3 | GCN |
Plizzaria et al. [86] | 2020 | ST-TR-agcn | 89.9 | 96.1 | Hybrid-DNN |
Peng et al. [48] | 2020 | Mix-Dimension | 89.7 | 96 | GCN |
Yang et al. [65] | 2020 | PGCN-TCA | 88 | 93.6 | GCN |
Song et al. [61] | 2020 | 3s RA-GCN | 87.3 | 93.6 | GCN |
Ding et al. [66] | 2020 | Sem-GCN | 86.2 | 94.2 | GCN |
Yu et al. [67] | 2020 | PeGCN | 85.6 | 93.4 | GCN |
Li et al. [68] | 2020 | PR-GCN | 85.2 | 91.7 | GCN |
Fan et al. [112] | 2020 | RGB+Skeleton | 84.23 | 89.27 | GCN |
Song et al. [74] | 2021 | EfficientGCN-B4 | 91.7 | 95.7 | GCN |
Chen et al. [71] | 2021 | CTR-GCN | 92.4 | 96.8 | GCN |
Chi et al. [79] | 2021 | InfoGCN | 93 | 97.1 | GCN |
Chen et al. [71] | 2021 | CTR-GCN | 92.4 | 96.8 | GCN |
Chen et al. [69] | 2021 | DualHead-Net | 92 | 96.6 | GCN |
Qin et al. [72] | 2021 | AngNet-JA + BA + JBA + VJBA | 91.7 | 96.4 | GCN/GNN |
Zeng et al. [73] | 2021 | Skeletal GNN | 91.6 | 96.7 | GNN |
Song et al. [74] | 2021 | EfficientGCN-B2 | 90.9 | 95.5 | GCN |
Song et al. [74] | 2021 | EfficientGCN-B0 | 89.9 | 94.7 | GCN |
Duan et al. [44] | 2022 | PoseC3D | 94.1 | 97.1 | CNN |
Trivedi et al. [88] | 2022 | PSUMNet | 92.9 | 96.7 | Hybrid-DNN |
Lee et al. [76] | 2022 | HD-GCN | 93.4 | 97.2 | GCN |
Duan et al. [44] | 2022 | DG-STGCN | 93.2 | 97.5 | GCN |
Xiang et al. [87] | 2022 | LST | 92.9 | 97 | Hybrid-DNN |
Hu et al. [77] | 2022 | STGAT | 92.8 | 97.3 | GCN |
Wang et al. [80] | 2022 | TCA-GCN | 92.8 | 97 | GCN |
Duan et al. [78] | 2022 | ST-GCN++ [PYSKL, 3D Skeleton] | 92.6 | 97.4 | GCN |
Duan et al. [78] | 2022 | ST-GCN [PYSKL, 2D Skeleton] | 92.4 | 98.3 | GCN |
Zhou et al. [89] | 2023 | Hyperformer | 92.9 | 96.5 | Hybrid-DNN |
Bavil et al. [90] | 2023 | Action Capsules | 90 | 96.3 | Hybrid-DNN |
Authors | Years | Models | Cross-Subject Acc (%) | Cross-Setting Acc (%) | Type of DLMs |
---|---|---|---|---|---|
Caetano et al. [108] | 2019 | SkeleMotion (Magnitude-Orientation) | 62.9 | 63 | CNN |
Caetano et al. [108] | 2019 | SkeleMotion + Yang et al | 67.7 | 66.9 | CNN |
Caetano et al. [109] | 2019 | TSRJI | 67.9 | 59.7 | CNN |
Song et al. [61] | 2019 | 3s RA-GCN | 81.10 | 82.70 | GCN |
Papadopoulos et al. [56] | 2019 | GVFE + AS-GCN with DH-TCN | 78.30 | 79.80 | GCN |
Liao et al. [40] | 2019 | Logsig-RNN | 68.30 | 67.20 | RNN |
Liu et al. [83] | 2019 | FSNet | 59.90 | 62.40 | Hybrid-DNN |
Zhang et al. [85] | 2020 | SGN | 79.2 | 81.5 | Hybrid-DNN |
Cheng et al. [60] | 2020 | 4s Shift-GCN | 85.9 | 87.6 | GCN |
Cheng et al. [111] | 2020 | DecoupleGCN-DropGraph | 86.5 | 88.1 | GCN |
Liu et al. [62] | 2020 | MS-G3D | 86.9 | 88.4 | GCN |
Song et al. [59] | 2020 | PA-ResGCN-B19 | 87.3 | - | GCN
Shi et al. [84] | 2020 | DSTA-Net | 86.6 | 89.0 | Hybrid-DNN |
Yang et al. [75] | 2020 | FGCN-spatial + FGCN-motion | 85.4 | 87.4 | GCN |
Plizzaria et al. [86] | 2020 | ST-TR-agcn | 82.70 | 84.70 | Hybrid-DNN |
Peng et al. [48] | 2020 | Mix-Dimension | 80.50 | 83.20 | GCN |
Memme et al. [113] | 2020 | Gimme Signals | 70.80 | 71.60 | CNN |
Song et al. [74] | 2021 | EfficientGCN-B4 | 88.3 | 89.1 | GCN |
Chen et al. [71] | 2021 | CTR-GCN | 88.9 | 90.6 | GCN |
Chi et al. [79] | 2021 | InfoGCN | 89.8 | 91.2 | GCN
Chen et al. [69] | 2021 | DualHead-Net | 88.2 | 89.3 | GCN |
Qin et al. [72] | 2021 | AngNet-JA + BA + JBA + VJBA | 88.2 | 89.2 | GCN/GNN |
Song et al. [74] | 2021 | EfficientGCN-B2 | 87.90 | 88.00 | GCN |
Zeng et al. [73] | 2021 | Skeletal GNN | 87.5 | 89.2 | GNN |
Song et al. [74] | 2021 | EfficientGCN-B0 | 85.90 | 84.30 | GCN |
Duan et al. [44] | 2022 | PoseC3D | 86.9 | 90.3 | CNN |
Trivedi et al. [88] | 2022 | PSUMNet | 89.4 | 90.6 | Hybrid-DNN |
Lee et al. [76] | 2022 | HD-GCN | 90.1 | 91.6 | GCN |
Xiang et al. [87] | 2022 | LST | 89.9 | 91.1 | Hybrid-DNN |
Duan et al. [44] | 2022 | DG-STGCN | 89.6 | 91.3 | GCN |
Wang et al. [80] | 2022 | TCA-GCN | 89.4 | 90.8 | GCN |
Hu et al. [77] | 2022 | STGAT | 88.7 | 90.4 | GCN |
Duan et al. [78] | 2022 | ST-GCN++ [PYSKL, 3D Skeleton] | 88.6 | 90.8 | GCN |
Zhou et al. [89] | 2023 | Hyperformer | 89.9 | 91.3 | Hybrid-DNN |
Models | FLOPs (G) | Type of DLMs |
---|---|---|
TaCNN+ [114] | 1.0 | GCN/GNN |
ST-GCN [16] | 16.3 | GCN/GNN |
RA-GCN [61] | 32.8 | GCN/GNN |
2s-AGCN [102] | 37.3 | GCN/GNN |
PA-ResGCN [59] | 18.5 | GCN/GNN |
4s-ShiftGCN [60] | 10.0 | GCN/GNN |
DC-GCN+ADG [111] | 25.7 | GCN/GNN |
CTR-GCN [71] | 7.6 | GCN/GNN |
DSTA-Net [84] | 64.7 | Hybrid-DNN |
ST-TR [86] | 259.4 | Hybrid-DNN |
PSUMNet [88] | 2.7 | Hybrid-DNN |
Authors | Years | Models | Activity Recognition Acc (%) | Type of DLMs |
---|---|---|---|---|
Lei et al. [102] | 2019 | 2s-AGCN | 38.6 | GCN |
Shi et al. [47] | 2019 | MS-AAGCN | 37.8 | GCN |
Shi et al. [47] | 2019 | JB-AAGCN | 37.4 | GCN |
Peng et al. [48] | 2019 | GCN-NAS | 37.1 | GCN |
Shi et al. [49] | 2019 | DGNN | 36.9 | GNN |
Li et al. [50] | 2019 | AS-GCN | 34.8 | GCN |
Li et al. [115] | 2019 | ST-GR | 33.6 | GCN |
Ding et al. [51] | 2019 | AR-GCN | 33.5 | GCN |
Liu et al. [62] | 2020 | MS-G3D | 38 | GCN |
Ye et al. [63] | 2020 | Dynamic GCN | 37.9 | GCN |
Yang et al. [70] | 2020 | CGCN | 37.5 | GCN |
Plizzaria et al. [86] | 2020 | ST-TR-agcn | 37.4 | Hybrid-DNN |
Yu et al. [67] | 2020 | PeGCN | 34.8 | GCN |
Li et al. [68] | 2020 | PR-GCN | 33.7 | GCN |
Chen et al. [69] | 2021 | DualHead-Net | 38.4 | GCN |
Duan et al. [44] | 2022 | PoseC3D | 49.1 | CNN |
Hachiuma et al. [116] | 2023 | Structured Keypoint Pooling | 52.3 | CNN |
Authors | Years | Models | Activity Recognition Acc (%) | Type of DLMs |
---|---|---|---|---|
Zhang et al. [107] | 2019 | EleAtt-GRU | 90.7 | RNN |
Davoodikakhki et al. [117] | 2020 | Hierarchical Action Classification | 93.99 | CNN |
Zhang et al. [85] | 2020 | SGN | 92.5 | Hybrid-DNN |
Chi et al. [79] | 2021 | InfoGCN | 97 | GCN |
Chen et al. [71] | 2021 | CTR-GCN | 96.5 | GCN |
Xiang et al. [87] | 2022 | LST | 97.2 | Hybrid-DNN |
Lee et al. [76] | 2022 | HD-GCN | 97.2 | GCN |
Wang et al. [80] | 2022 | TCA-GCN | 97 | GCN |
Bavil et al. [90] | 2023 | Action Capsules | 97.3 | Hybrid-DNN |
Authors | Years | Models | Activity Recognition Accuracy (%) | Type of DLMs |
---|---|---|---|---|
Yan et al. [118] | 2019 | PA3D+RPAN | 86.1 | CNN |
McNally et al. [119] | 2019 | STAR-Net | 64.3 | CNN
Yang et al. [120] | 2019 | DD-Net | 77.2 | Hybrid-DNN |
Ludl et al. [121] | 2019 | EHPI | 65.5 | CNN |
Authors | Years | Models | Activity Recognition Acc (%) | Type of DLMs |
---|---|---|---|---|
Zhang et al. [107] | 2019 | EleAtt-GRU | 85.7 | RNN |
Ke et al. [122] | 2019 | Local + LGN | 83.14 | Hybrid-DNN |
Zhang et al. [85] | 2020 | SGN | 86.9 | Hybrid-DNN |
Authors | Years | Models | Activity Recognition Acc (%) | Type of DLMs |
---|---|---|---|---|
Kao et al. [57] | 2019 | GFT | 96.0 | GCN |
Paoletti et al. [124] | 2020 | Temporal Subspace Clustering | 99.5 | Hybrid-DNN |
Koniusz et al. [45] | 2020 | SCK ⊕ DCK | 99.2 | CNN |
Authors | Years | Models | Activity Recognition Acc (%) | Type of DLMs |
---|---|---|---|---|
Koniusz et al. [45] | 2020 | SCK ⊕ + DCK⊕ | 97.45 | CNN |
Paoletti et al. [124] | 2020 | Temporal Spectral Clustering + Temporal Subspace Clustering | 95.81 | Hybrid-DNN |
Koniusz et al. [45] | 2020 | SCK + DCK | 95.23 | CNN |
Authors | Years | Models | Activity Recognition Acc (%) | Type of DLMs |
---|---|---|---|---|
Mazari et al. [125] | 2019 | MLGCN | 98.6 | GCN |
Bianchi et al. [126] | 2019 | ArmaConv | 96 | GCN |
WuFelix et al. [127] | 2019 | SGCConv | 94 | GCN |
Datasets | Configurations | DD-Net [120] Acc (%) | PA-ResGCN [59] Acc (%) | CNN-LSTM [128] Acc (%) | SgCNN [14] Acc (%) | CCNN [129] Acc (%)
---|---|---|---|---|---|---
KLHA3D-102 | KLHA3D-102_Conf. 1 | 52.94 | 40.02 | 92.63 (Cross-Subject) | 93.82 | 98.12 (Cross-Subject)
KLHA3D-102 | KLHA3D-102_Conf. 2 | 45.18 | 52.94 | 92.46 (Cross-View) | - | 96.15 (Cross-View)
KLHA3D-102 | KLHA3D-102_Conf. 3 | 52.94 | 48.04 | - | - | -
KLHA3D-102 | KLHA3D-102_Conf. 4 | 2.55 | 10.22 | - | - | -
KLHA3D-102 | KLHA3D-102_Conf. 5 | 1.96 | 8.56 | - | - | -
KLYOGA3D | KLYOGA3D_Conf. 1 | 20.51 | 33.33 | - | 95.48 | -
KLYOGA3D | KLYOGA3D_Conf. 2 | 25.64 | 53.85 | - | - | -