A Distilled 2D CNN-LSTM Framework with Temporal Attention Mechanism for Action Recognition

Shi-Jie Zhu¹²,
Cheng-Rong Lin¹²,
Wei-Ting Lin¹² &
Ju-Chin Chen¹²

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1863)

Included in the following conference series:

Asian Conference on Intelligent Information and Database Systems

Abstract

Action recognition has been studied for many years. In recent years, methods based on 3D CNNs (C3D, I3D, R(2+1)D) have achieved high accuracy, but they are hard to train and time-consuming, owing both to network architectures that extract spatiotemporal features and to the size of action datasets. Because 2D CNNs offer pre-trained models with high accuracy and speed in object recognition, other methods fine-tune a 2D CNN together with a recurrent neural network (RNN), a Long Short-Term Memory (LSTM) network, or another network that extracts temporal features. However, fine-tuning performs poorly, so these methods gain speed at the cost of a significant drop in accuracy. This research therefore uses the high accuracy of a 3D CNN to distill a 2D CNN into a strong pre-trained model for action recognition, and combines it with an attention-mechanism LSTM, so that fine-tuning on other action datasets is faster and reaches accuracy approaching that of the 3D CNN.
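The abstract names two components without giving formulas: distilling a 3D CNN teacher into a 2D CNN student, and an LSTM with temporal attention. The pure-Python sketch below illustrates one common way to realize each, under stated assumptions. The distillation objective is assumed to be Hinton-style soft-target knowledge distillation (softening temperature `temperature`, weighting `alpha`), and the attention is assumed to be a single learned scoring vector `w` applied to each frame's LSTM hidden state, normalized over time with a softmax. The function names and all numbers are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      label: int,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> float:
    """Soft-target knowledge distillation loss (Hinton et al. style).

    ASSUMPTION: the abstract does not state the paper's distillation objective;
    this combines T^2 * KL(teacher || student) on temperature-softened outputs
    with ordinary cross-entropy on the ground-truth label.
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    # KL(p_t || p_s) = sum_i p_t[i] * (log p_t[i] - log p_s[i])
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Hard-label cross-entropy at temperature 1.
    ce = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * ce


def temporal_attention_pool(hidden_states: Sequence[Sequence[float]],
                            w: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Attention-weighted average of per-frame LSTM hidden states.

    ASSUMPTION: score_t = w . h_t with one learned vector w, normalized over
    time with a softmax; the paper's exact attention form may differ.
    """
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in hidden_states]
    weights = [math.exp(s) for s in log_softmax(scores)]  # sums to 1 over time
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(weights, hidden_states))
               for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    # Toy numbers for illustration only; not results from the paper.
    teacher = [2.0, 0.5, -1.0]  # hypothetical logits from a 3D CNN teacher
    student = [1.5, 0.8, -0.5]  # hypothetical logits from a 2D CNN student
    print(distillation_loss(student, teacher, label=0))

    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 time steps, hidden size 2
    weights, context = temporal_attention_pool(frames, w=[1.0, 1.0])
    print(weights, context)
```

In a real pipeline the student logits would presumably come from the 2D CNN-LSTM and the teacher logits from a frozen 3D CNN applied to the same clip, with both operations written in a framework that provides automatic differentiation; plain Python is used here only to keep the arithmetic visible.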

References

Donahue, J., et al.: Long-term recurrent convolutional networks for visual recognition and description. In: CVPR (2017)
Yue-Hei Ng, J., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici, G.: Beyond short snippets: deep networks for video classification. In: CVPR (2015)
Qiu, Z., Yao, T., Mei, T.: Learning spatio-temporal representation with pseudo-3D residual networks. In: ICCV, pp. 5534–5542 (2017)
Thung, G., Jiang, H.: A torch library for action recognition and detection using CNNs and LSTMs (2016)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Carreira, J., Zisserman, A.: Quo vadis, action recognition? A new model and the Kinetics dataset. In: CVPR, pp. 4724–4733 (2017)
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: ICCV, pp. 4489–4497 (2015)
Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. TPAMI 35(1), 221–231 (2012)
Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., Paluri, M.: A closer look at spatiotemporal convolutions for action recognition. In: CVPR, pp. 6450–6459 (2018)
Simonyan, K., Zisserman, A.: Two-stream convolutional networks for action recognition in videos. In: NIPS (2014)
Feichtenhofer, C., Pinz, A., Zisserman, A.: Convolutional two-stream network fusion for video action recognition. In: CVPR, pp. 1933–1941 (2016)
Wang, L., et al.: Temporal segment networks: towards good practices for deep action recognition. In: ECCV, pp. 20–36 (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778 (2016)
Soomro, K., Zamir, A.R., Shah, M.: UCF101: a dataset of 101 human actions classes from videos in the wild. arXiv:1212.0402 (2012)
Kuehne, H., Jhuang, H., Garrote, E., Poggio, T., Serre, T.: HMDB: a large video database for human motion recognition. In: ICCV (2011)

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, National Kaohsiung University of Science and Technology, Kaohsiung, Republic of China
Shi-Jie Zhu, Cheng-Rong Lin, Wei-Ting Lin & Ju-Chin Chen

Corresponding author

Correspondence to Ju-Chin Chen.

Editor information

Editors and Affiliations

Wrocław University of Technology, Wrocław, Poland
Ngoc Thanh Nguyen
King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Siridech Boonsang
Iwate Prefectural University, Iwate, Japan
Hamido Fujita
Wrocław University of Science and Technology, Wrocław, Poland
Bogumiła Hnatkowska
National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
King Mongkut's Institute of Technology, Ladkrabang, Thailand
Kitsuchart Pasupa
Malaysia Japan International Institute of Technology, Kuala Lumpur, Malaysia
Ali Selamat

About this paper

Cite this paper

Zhu, SJ., Lin, CR., Lin, WT., Chen, JC. (2023). A Distilled 2D CNN-LSTM Framework with Temporal Attention Mechanism for Action Recognition. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol 1863. Springer, Cham. https://doi.org/10.1007/978-3-031-42430-4_26

DOI: https://doi.org/10.1007/978-3-031-42430-4_26
Published: 29 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42429-8
Online ISBN: 978-3-031-42430-4
eBook Packages: Computer Science, Computer Science (R0)