Multi-skeleton structures graph convolutional network for action quality assessment in long videos

Qing Lei^1,2 (ORCID: orcid.org/0000-0003-3573-4226),
Huiying Li^1,
Hongbo Zhang^1,2,
Jixiang Du^1,3 &
Shangce Gao^4

Abstract

Most existing action quality assessment (AQA) methods focus on scoring simple actions in short sports videos. Recently, a few studies have attempted to solve the AQA problem for long-duration activities by extracting dynamic or static information directly from RGB video. However, these methods may ignore specific postures defined by dynamic changes in human body joints, which makes the results inaccurate and hard to explain. In this work, we propose a novel graph convolutional network based on multiple skeleton structure modelling to learn effective pose features and improve AQA performance on complex activities. Specifically, three kinds of skeleton structures, namely the joints’ self-connection, the intra-part connection, and the inter-part connection, are defined to model the motion patterns of joints and body parts. Moreover, a temporal attention learning module is designed to extract temporal relations between skeleton subsequences. We evaluate the proposed method on two benchmark datasets, the MIT-skate dataset and the Rhythmic Gymnastics dataset. Extensive experiments verify its effectiveness, and the results show that our method achieves state-of-the-art performance.
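
The three skeleton structures and the temporal attention step named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the 5-joint toy skeleton, the `PARTS` grouping, the `PART_LINKS` pairs, the weight-free averaging, and the summation used to fuse the three structures are all assumptions chosen for illustration, and every function name is hypothetical. A real model would add learnable weights and would learn the attention scores.

```python
import math

# Hypothetical 5-joint toy skeleton (NOT the paper's joint layout):
# 0 = head, 1 = left hand, 2 = right hand, 3 = left foot, 4 = right foot
NUM_JOINTS = 5
PARTS = {"upper": [0, 1, 2], "lower": [3, 4]}  # assumed body-part grouping
PART_LINKS = [("upper", "lower")]              # assumed links between parts


def self_adjacency(n):
    """Self-connection: every joint is linked only to itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def intra_part_adjacency(n, parts):
    """Intra-part connection: link distinct joints within the same part."""
    adj = [[0.0] * n for _ in range(n)]
    for joints in parts.values():
        for i in joints:
            for j in joints:
                if i != j:
                    adj[i][j] = 1.0
    return adj


def inter_part_adjacency(n, parts, links):
    """Inter-part connection: link joints of one part to a linked part."""
    adj = [[0.0] * n for _ in range(n)]
    for a, b in links:
        for i in parts[a]:
            for j in parts[b]:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def row_normalize(adj):
    """Divide each row by its sum so aggregation becomes an average."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def aggregate(adj, feats):
    """One weight-free graph aggregation step, i.e. adj @ feats."""
    dim = len(feats[0])
    return [[sum(a * f[c] for a, f in zip(row, feats)) for c in range(dim)]
            for row in adj]


def multi_structure_features(feats):
    """Sum the aggregations from the three skeleton structures (assumed fusion)."""
    n = len(feats)
    structures = [
        self_adjacency(n),
        intra_part_adjacency(n, PARTS),
        inter_part_adjacency(n, PARTS, PART_LINKS),
    ]
    outs = [aggregate(row_normalize(a), feats) for a in structures]
    dim = len(feats[0])
    return [[sum(o[i][c] for o in outs) for c in range(dim)] for i in range(n)]


def temporal_attention_pool(segment_feats, segment_scores):
    """Softmax-weighted average of per-subsequence feature vectors."""
    m = max(segment_scores)                        # subtract max for stability
    exps = [math.exp(s - m) for s in segment_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(segment_feats[0])
    pooled = [sum(w * f[c] for w, f in zip(weights, segment_feats))
              for c in range(dim)]
    return pooled, weights
```

In this sketch, `multi_structure_features` takes one frame of per-joint feature vectors, and `temporal_attention_pool` takes one feature vector and one relevance score per skeleton subsequence; the scores are supplied by hand here purely to show the weighting.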

Data Availability

The data generated and/or analyzed during the current study are available from the first author or the corresponding author on reasonable request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62001176, 61871196); the Natural Science Foundation of Fujian Province, China (No. 2020J01085, 2019J01082); the National Key Research and Development Program of China (No. 2019YFC1604700); the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (No. ZQN-YX601); and the Japan Society for the Promotion of Science (JSPS) KAKENHI under Grant JP22H03643.

Author information

Authors and Affiliations

Department of Computer Science and Technology, Huaqiao University, 361000, Xiamen, Fujian, China
Qing Lei, Huiying Li, Hongbo Zhang & Jixiang Du
Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, 361000, Xiamen, Fujian, China
Qing Lei & Hongbo Zhang
Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, 361000, Xiamen, Fujian, China
Jixiang Du
Faculty of Engineering, University of Toyama, 930-8555, Toyama-shi, Japan
Shangce Gao

Corresponding author

Correspondence to Shangce Gao.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lei, Q., Li, H., Zhang, H. et al. Multi-skeleton structures graph convolutional network for action quality assessment in long videos. Appl Intell 53, 21692–21705 (2023). https://doi.org/10.1007/s10489-023-04613-5

Accepted: 05 April 2023
Published: 09 June 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10489-023-04613-5
