Research article · DOI: 10.1145/3474085.3475438

TSA-Net: Tube Self-Attention Network for Action Quality Assessment

Published: 17 October 2021

Abstract

In recent years, assessing action quality from videos has attracted growing attention in the computer vision and human-computer interaction communities. Most existing approaches tackle this problem by directly migrating models from action recognition tasks, which ignores intrinsic differences within the feature map, such as foreground and background information. To address this issue, we propose a Tube Self-Attention Network (TSA-Net) for action quality assessment (AQA). Specifically, we introduce a single-object tracker into AQA and propose the Tube Self-Attention (TSA) module, which efficiently generates rich spatio-temporal contextual information by adopting sparse feature interactions. The TSA module can be embedded in existing video networks to form TSA-Net. Overall, TSA-Net has three merits: 1) high computational efficiency, 2) high flexibility, and 3) state-of-the-art performance. Extensive experiments are conducted on the popular action quality assessment datasets AQA-7 and MTL-AQA. In addition, a dataset named Fall Recognition in Figure Skating (FR-FS) is proposed to explore basic action assessment in the figure-skating scene. TSA-Net achieves Spearman's rank correlations of 0.8476 on AQA-7 and 0.9393 on MTL-AQA, both new state-of-the-art results. The results on FR-FS also verify the effectiveness of TSA-Net. The code and the FR-FS dataset are publicly available at https://github.com/Shunli-Wang/TSA-Net.
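The core idea in the abstract, self-attention restricted to a tracker-derived "tube" of foreground positions rather than the whole T×H×W feature map, can be sketched in NumPy as follows. This is a minimal illustration, not the paper's implementation: the shapes, the toy box mask, and the residual non-local formulation are all illustrative assumptions.

```python
import numpy as np

# Sketch of tube self-attention: non-local (dot-product) attention
# computed only among feature positions inside a tube mask produced
# by a single-object tracker. Background positions pass through.
rng = np.random.default_rng(0)
T, H, W, C = 4, 8, 8, 16                    # frames, height, width, channels
feats = rng.standard_normal((T, H, W, C))

# Binary tube mask: True = foreground. Here a toy box covering the athlete.
tube = np.zeros((T, H, W), dtype=bool)
tube[:, 2:6, 3:7] = True

def tube_self_attention(x, mask):
    """Self-attention among masked positions only; others are untouched."""
    out = x.copy()
    sel = x[mask]                            # (N, C): only in-tube features interact
    attn = sel @ sel.T / np.sqrt(x.shape[-1])
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)  # row-wise softmax
    out[mask] = x[mask] + attn @ sel         # residual, as in non-local blocks
    return out

y = tube_self_attention(feats, tube)
print(np.allclose(y[~tube], feats[~tube]))  # True: background unchanged
print(np.allclose(y[tube], feats[tube]))    # False: tube positions updated
```

Because attention is computed over only the N in-tube positions instead of all T·H·W positions, the pairwise interaction cost drops from O((THW)²) to O(N²), which is the computational-efficiency claim the abstract makes.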

Supplementary Material

ZIP File (mfp1510aux.zip)
In this supplementary material, we analyze the computational complexity of TSA-Net in detail and present additional visualization cases.
MP4 File (Presentation video of TSA-Net.mp4)
Presentation video of the Tube Self-Attention Network (TSA-Net).
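The evaluation metric quoted in the abstract, Spearman's rank correlation between predicted and judge-assigned scores, depends only on ranks. A minimal pure-Python sketch, with illustrative score values (not from the paper):

```python
# Spearman's rank correlation: Pearson correlation of the rank vectors.
# Score values below are made up for illustration.

def ranks(xs):
    # Average 1-based ranks; ties receive the mean of their positions.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(pred, true):
    rp, rt = ranks(pred), ranks(true)
    n = len(rp)
    mp, mt = sum(rp) / n, sum(rt) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    sp = sum((a - mp) ** 2 for a in rp) ** 0.5
    st = sum((b - mt) ** 2 for b in rt) ** 0.5
    return cov / (sp * st)

preds = [71.2, 85.5, 60.3, 92.0, 78.8]   # model scores (illustrative)
truth = [70.0, 88.0, 62.5, 95.0, 90.0]   # judge scores (illustrative)
print(round(spearman(preds, truth), 4))  # prints 0.9
```

A correlation of 1.0 means the model orders every performance exactly as the judges do; the 0.8476 and 0.9393 reported in the abstract indicate near-judge-level ranking on AQA-7 and MTL-AQA, respectively.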





Published In

cover image ACM Conferences
MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN:9781450386517
DOI:10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. action quality assessment
  2. self-attention mechanism
  3. video action analysis

Qualifiers

  • Research-article


Conference

MM '21
Sponsor:
MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)74
  • Downloads (Last 6 weeks)10
Reflects downloads up to 10 Dec 2024

Cited By
  • (2025) Dual-referenced assistive network for action quality assessment. Neurocomputing, vol. 614, article 128786. DOI: 10.1016/j.neucom.2024.128786. Online publication date: Jan-2025.
  • (2025) Vision-based human action quality assessment: A systematic review. Expert Systems with Applications, vol. 263, article 125642. DOI: 10.1016/j.eswa.2024.125642. Online publication date: Mar-2025.
  • (2024) PECoP: Parameter Efficient Continual Pretraining for Action Quality Assessment. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 42-52. DOI: 10.1109/WACV57701.2024.00012. Online publication date: 3-Jan-2024.
  • (2024) Self-Supervised Sub-Action Parsing Network for Semi-Supervised Action Quality Assessment. IEEE Transactions on Image Processing, vol. 33, 6057-6070. DOI: 10.1109/TIP.2024.3468870. Online publication date: 2024.
  • (2024) Multimodal Action Quality Assessment. IEEE Transactions on Image Processing, vol. 33, 1600-1613. DOI: 10.1109/TIP.2024.3362135. Online publication date: 1-Jan-2024.
  • (2024) Learning Sparse Temporal Video Mapping for Action Quality Assessment in Floor Gymnastics. IEEE Transactions on Instrumentation and Measurement, vol. 73, 1-11. DOI: 10.1109/TIM.2024.3398072. Online publication date: 2024.
  • (2024) Continual Action Assessment via Task-Consistent Score-Discriminative Feature Distribution Modeling. IEEE Transactions on Circuits and Systems for Video Technology, 34(10), 9112-9124. DOI: 10.1109/TCSVT.2024.3396692. Online publication date: Oct-2024.
  • (2024) Spectral-Wise Implicit Neural Representation for Hyperspectral Image Reconstruction. IEEE Transactions on Circuits and Systems for Video Technology, 34(5), 3714-3727. DOI: 10.1109/TCSVT.2023.3318366. Online publication date: May-2024.
  • (2024) CPR-CLIP: Multimodal Pre-Training for Composite Error Recognition in CPR Training. IEEE Signal Processing Letters, vol. 31, 211-215. DOI: 10.1109/LSP.2023.3346207. Online publication date: 2024.
  • (2024) CPR-Coach: Recognizing Composite Error Actions Based on Single-Class Training. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 18782-18792. DOI: 10.1109/CVPR52733.2024.01777. Online publication date: 16-Jun-2024.
