Abstract
Most existing action quality assessment (AQA) methods focus on scoring simple actions in short sports videos. Recently, a few studies have addressed AQA for long-duration activities by extracting dynamic or static information directly from RGB video. However, these methods may ignore the specific postures defined by dynamic changes in human body joints, which can make their results inaccurate and hard to explain. In this work, we propose a novel graph convolutional network based on multiple skeleton structure modelling, which learns effective pose features to improve AQA performance on complex activities. Specifically, three kinds of skeleton structures, namely the joints’ self-connection, the intra-part connection, and the inter-part connection, are defined to model the motion patterns of joints and body parts. Moreover, a temporal attention learning module is designed to extract temporal relations between skeleton subsequences. We evaluate the proposed method on two benchmark datasets, the MIT-skate dataset and the Rhythmic Gymnastics dataset. Extensive experiments verify the effectiveness of the proposed method and show that it achieves state-of-the-art performance.
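To make the three skeleton structures and the temporal attention step more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the six-joint skeleton, the grouping of joints into parts, and the reading of "intra-part" as "same part" and "inter-part" as "different parts" are all assumptions made for illustration. In the actual method the attention scores would be learned; here they are passed in by hand.

```python
import math

# Hypothetical toy skeleton: joint names and part grouping are illustrative only.
JOINTS = ["head", "torso", "l_hand", "r_hand", "l_foot", "r_foot"]
PARTS = {"trunk": [0, 1], "arms": [2, 3], "legs": [4, 5]}


def build_structures(num_joints, parts):
    """Return (self_conn, intra_part, inter_part) as 0/1 adjacency matrices."""
    part_of = {j: name for name, joints in parts.items() for j in joints}
    self_conn = [[0] * num_joints for _ in range(num_joints)]
    intra = [[0] * num_joints for _ in range(num_joints)]
    inter = [[0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        for j in range(num_joints):
            if i == j:
                self_conn[i][j] = 1      # a joint connected to itself
            elif part_of[i] == part_of[j]:
                intra[i][j] = 1          # assumed: two joints of the same part
            else:
                inter[i][j] = 1          # assumed: joints of different parts
    return self_conn, intra, inter


def temporal_attention(features, scores):
    """Softmax-weighted sum of per-subsequence feature vectors."""
    peak = max(scores)                   # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled


self_conn, intra, inter = build_structures(len(JOINTS), PARTS)
# Three hypothetical subsequence features with equal (hand-set) attention scores.
weights, pooled = temporal_attention(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]
)
```

Under these assumptions the three matrices partition all joint pairs, so each could drive its own graph convolution branch before the subsequence features are pooled by attention.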
Data Availability
The data generated and/or analyzed during the current study are available from the first author or the corresponding author on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 62001176, 61871196); the Natural Science Foundation of Fujian Province, China (No. 2020J01085, 2019J01082); the National Key Research and Development Program of China (No. 2019YFC1604700); the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (No. ZQN-YX601); and the Japan Society for the Promotion of Science (JSPS) KAKENHI under Grant JP22H03643.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Lei, Q., Li, H., Zhang, H. et al. Multi-skeleton structures graph convolutional network for action quality assessment in long videos. Appl Intell 53, 21692–21705 (2023). https://doi.org/10.1007/s10489-023-04613-5
DOI: https://doi.org/10.1007/s10489-023-04613-5