DOI: 10.1145/3444685.3446296
research-article

RICAPS: residual inception and cascaded capsule network for broadcast sports video classification

Published: 03 May 2021

Abstract

Broadcast sports video analysis needs more attention from the research community. Identifying the semantic actions within a broadcast sports video supports better video analysis and highlight generation. A key challenge for sports video analysis is the limited availability of relevant datasets. In this paper, we introduce SP-2, a new broadcast sports video dataset (available at https://github.com/abdkhanstd/Sports2). SP-2 is a large dataset with several annotations, such as sports category (class), playfield scenario, and game action. Alongside the dataset, we focus on accurately classifying the category of a broadcast sports video and propose a simple yet elegant classification method. Category classification plays an important role in sports video analysis because different sports follow different rules and situations. Our method exploits the potential of the recently introduced capsule network with dynamic routing. First, we extract features using a residual convolutional neural network and build temporal feature sequences. Then, a cascaded capsule network is trained on the extracted feature sequences. The residual inception cascaded capsule network (RICAPS) significantly improves broadcast sports video classification, as the cascaded capsule network captures deeper features. We conduct extensive experiments on the SP-2 dataset, and the results show that RICAPS outperforms previously proposed methods.
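
As a rough illustration of the pipeline the abstract outlines (per-frame features, a temporal feature sequence, then capsules with dynamic routing), the following is a minimal NumPy sketch of routing-by-agreement in the style of Sabour et al. (2017). It is not the authors' RICAPS implementation: the random array `features` stands in for residual CNN features, each frame is treated as one input capsule, only a single capsule layer is shown rather than a cascade, and all sizes and names (`T`, `feat_dim`, `num_classes`, `caps_dim`, `W`) are assumptions made for the example.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-8):
    """Squashing non-linearity: keeps the direction, maps the norm into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement between two capsule layers.

    u_hat: prediction vectors, shape (num_in, num_out, dim_out).
    Returns the output capsules, shape (num_out, dim_out).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, start uniform
    for _ in range(num_iters):
        c = softmax(b, axis=1)  # coupling coefficients per input capsule
        s = np.einsum("io,iod->od", c, u_hat)  # weighted sum of predictions
        v = squash(s)  # output capsules
        b = b + np.einsum("iod,od->io", u_hat, v)  # reward agreement
    return v


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
T, feat_dim = 16, 64  # frames per clip, per-frame feature size
num_classes, caps_dim = 5, 8  # sports categories, class-capsule size

# Stand-in for a temporal sequence of CNN features (one row per frame).
features = rng.normal(size=(T, feat_dim))

# One transformation matrix per (frame, class) pair; untrained random weights.
W = rng.normal(scale=0.1, size=(T, num_classes, caps_dim, feat_dim))
u_hat = np.einsum("iodf,if->iod", W, features)

class_caps = dynamic_routing(u_hat)
scores = np.linalg.norm(class_caps, axis=-1)  # capsule length acts as class score
predicted = int(np.argmax(scores))
```

In this sketch the length of each class capsule plays the role of a class score, so the predicted category is the capsule with the largest norm. With untrained random weights the prediction is meaningless; the point is only the shape of the computation.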

Published In

MMAsia '20: Proceedings of the 2nd ACM International Conference on Multimedia in Asia
March 2021
512 pages
ISBN:9781450383080
DOI:10.1145/3444685

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. capsule network
  2. sports video analysis
  3. video classification

Qualifiers

  • Research-article

Conference

MMAsia '20: ACM Multimedia Asia
March 7, 2021
Virtual Event, Singapore

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%
