Multi-skeleton structures graph convolutional network for action quality assessment in long videos

Applied Intelligence

Abstract

Most existing action quality assessment (AQA) methods focus on scoring simple actions in short sports videos. Recently, a few studies have attempted to solve the AQA problem for long-duration activities by extracting dynamic or static information directly from RGB video. However, these methods may ignore specific postures defined by dynamic changes in human body joints, which makes the results inaccurate and hard to explain. In this work, we propose a novel graph convolutional network based on multiple skeleton structure modelling to learn effective pose features and improve AQA performance on complex activities. Specifically, three kinds of skeleton structures, namely the joints’ self-connection, the intra-part connection, and the inter-part connection, are defined to model the motion patterns of joints and body parts. Moreover, a temporal attention learning module is designed to extract temporal relations between skeleton subsequences. Extensive experiments on two benchmark datasets, the MIT-skate dataset and the Rhythmic Gymnastics dataset, show that our method achieves state-of-the-art performance.
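The three skeleton structures and the temporal attention step named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the six-joint toy skeleton, the part grouping in `PARTS`, the linked parts in `PART_LINKS`, the unweighted neighbour averaging in `aggregate`, and the sum-based scoring in `temporal_attention` are all assumptions chosen for illustration, since the preview does not give the actual definitions.

```python
import math

# Hypothetical toy skeleton: 6 joints grouped into 3 body parts.
# Joint indices and part grouping are illustrative assumptions.
PARTS = {"torso": [0, 1], "left_arm": [2, 3], "right_arm": [4, 5]}
# Assumed pairs of body parts treated as related.
PART_LINKS = [("torso", "left_arm"), ("torso", "right_arm")]
NUM_JOINTS = 6


def self_adjacency(n):
    """Self-connection structure: each joint is linked only to itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def intra_part_adjacency(n, parts):
    """Intra-part structure: joints within the same body part are linked."""
    adj = [[0.0] * n for _ in range(n)]
    for joints in parts.values():
        for i in joints:
            for j in joints:
                if i != j:
                    adj[i][j] = 1.0
    return adj


def inter_part_adjacency(n, parts, links):
    """Inter-part structure: joints of linked body parts are connected."""
    adj = [[0.0] * n for _ in range(n)]
    for a, b in links:
        for i in parts[a]:
            for j in parts[b]:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def row_normalize(adj):
    """Divide each row by its sum so aggregation averages over neighbours."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in adj]


def aggregate(features, adjacencies):
    """Sum neighbour-averaged joint features over all structures.

    A real graph convolution would also apply learned weights per
    structure; they are omitted here to keep the sketch minimal.
    """
    n, dim = len(features), len(features[0])
    out = [[0.0] * dim for _ in range(n)]
    for adj in adjacencies:
        norm = row_normalize(adj)
        for i in range(n):
            for j in range(n):
                for d in range(dim):
                    out[i][d] += norm[i][j] * features[j][d]
    return out


def temporal_attention(segment_features):
    """Softmax-weighted pooling over per-subsequence feature vectors.

    The score of a subsequence is the plain sum of its features, a
    stand-in (assumption) for a learned scoring function.
    """
    scores = [sum(f) for f in segment_features]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]
    dim = len(segment_features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, segment_features))
              for d in range(dim)]
    return weights, pooled


# Usage: one scalar feature per joint, aggregated over the three structures.
structures = [
    self_adjacency(NUM_JOINTS),
    intra_part_adjacency(NUM_JOINTS, PARTS),
    inter_part_adjacency(NUM_JOINTS, PARTS, PART_LINKS),
]
joint_features = [[float(i)] for i in range(NUM_JOINTS)]
aggregated = aggregate(joint_features, structures)
weights, pooled = temporal_attention([[0.0, 1.0], [2.0, 3.0]])
```

Keeping the three adjacency matrices separate, rather than merging them into one graph, lets each structure contribute its own neighbourhood average, which is the general idea behind modelling joints and body parts at different scales.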

Data Availability

The data generated and/or analyzed during the current study is available from the first author or the corresponding author on reasonable request.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (No. 62001176, 61871196); the Natural Science Foundation of Fujian Province, China (No. 2020J01085, 2019J01082); the National Key Research and Development Program of China (No. 2019YFC1604700); the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University (No. ZQN-YX601); and the Japan Society for the Promotion of Science (JSPS) KAKENHI under Grant JP22H03643.

Author information

Corresponding author

Correspondence to Shangce Gao.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Lei, Q., Li, H., Zhang, H. et al. Multi-skeleton structures graph convolutional network for action quality assessment in long videos. Appl Intell 53, 21692–21705 (2023). https://doi.org/10.1007/s10489-023-04613-5

  • DOI: https://doi.org/10.1007/s10489-023-04613-5