Abstract
Automatically scoring athletes' performance in skilled sports has drawn increasing attention from the academic community. However, extracting effective features from long sports videos and predicting reasonable scores remain challenging. In this paper, we introduce ScoringNet, a novel network consisting of key fragment segmentation (KFS) and score prediction (SP), to address these two problems. To obtain effective features, KFS uses semantic video segmentation to extract key fragments and discard irrelevant ones; a 3D convolutional neural network then extracts features from each key fragment. In score prediction, we fuse a ranking loss into the traditional regression loss so that the predictions are reasonable in terms of both score values and ranking. Through deep learning, we narrow the gap between predictions and ground-truth scores while making the predictions satisfy the ranking constraint. Extensive experiments show that our method achieves state-of-the-art results on three datasets.
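To make the fused objective concrete, the following is a minimal sketch of how a regression loss and a pairwise ranking loss might be combined during score prediction. The exact formulation used by ScoringNet is not reproduced here; the hinge-style pairwise term, the `margin`, and the `rank_weight` trade-off are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class ScoreWithRankingLoss(nn.Module):
    """Hypothetical combined loss: regression term plus pairwise ranking term.

    The regression term pulls predicted scores toward the judges' scores,
    while the ranking term penalises pairs of samples whose predicted
    ordering contradicts the ground-truth ordering.
    """

    def __init__(self, margin: float = 0.1, rank_weight: float = 0.5):
        super().__init__()
        self.margin = margin          # assumed hinge margin
        self.rank_weight = rank_weight  # assumed weight of the ranking term
        self.mse = nn.MSELoss()

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # pred, target: 1-D tensors of shape (batch_size,)
        # Regression term: keep predicted values close to ground-truth scores.
        loss_value = self.mse(pred, target)

        # Pairwise ranking term: for every pair whose ground-truth scores are
        # ordered, penalise predictions that violate the same ordering by
        # more than `margin` (a hinge-style ranking loss).
        diff_pred = pred.unsqueeze(0) - pred.unsqueeze(1)      # pred[j] - pred[i]
        diff_true = target.unsqueeze(0) - target.unsqueeze(1)  # target[j] - target[i]
        pair_mask = (diff_true > 0).float()                    # pairs with target[j] > target[i]
        rank_violation = torch.clamp(self.margin - diff_pred, min=0.0)
        loss_rank = (pair_mask * rank_violation).sum() / pair_mask.sum().clamp(min=1.0)

        return loss_value + self.rank_weight * loss_rank


# Usage sketch: scores predicted for a batch of videos vs. judges' scores.
criterion = ScoreWithRankingLoss(margin=0.1, rank_weight=0.5)
loss = criterion(torch.tensor([7.2, 8.5, 6.1]), torch.tensor([7.0, 9.0, 6.5]))
```

The ranking term only adds a constraint on relative order, so it can be weighted independently of the regression term without changing the scale of the predicted scores.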
This work was partially supported by the 973 Program under contract No. 2015CB351802 and the Natural Science Foundation of China under contracts Nos. 61390511, 61472398, and 61532018.