A Distilled 2D CNN-LSTM Framework with Temporal Attention Mechanism for Action Recognition

  • Conference paper
Recent Challenges in Intelligent Information and Database Systems (ACIIDS 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1863)

Abstract

Action recognition has been studied for many years. Recent methods based on 3D-CNNs (C3D, I3D, R(2+1)D) achieve high accuracy, but they are hard to train and time-consuming, both because the network architecture must extract spatio-temporal features and because action datasets are very large. Because pre-trained 2D-CNN models are fast and accurate at object recognition, another approach fine-tunes a 2D-CNN together with a recurrent neural network (RNN), a Long Short-Term Memory (LSTM) network, or another network that can extract temporal features. This improves speed, but because fine-tuning performs poorly, accuracy drops significantly. This research therefore uses a high-accuracy 3D-CNN to distill a 2D-CNN, producing a strong pre-trained model for action recognition, and combines it with an LSTM with an attention mechanism, so that fine-tuning on other action datasets is faster and reaches accuracy close to that of the 3D-CNN.
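
The abstract names two ingredients, distillation from a 3D-CNN teacher into a 2D-CNN student and temporal attention over LSTM outputs, without giving their exact form. As a minimal sketch, assuming the common soft-target formulation of distillation (KL divergence on temperature-softened class scores) and a simple dot-product attention over per-frame hidden states, the two ideas might look like the following. The function names, the temperature value, and the `query` vector are illustrative assumptions and are not taken from the paper.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened class distributions.

    Assumption: logit-level soft targets; the paper may distill
    intermediate features instead.
    """
    p = softmax(teacher_logits, temperature)  # teacher (3D-CNN) distribution
    q = softmax(student_logits, temperature)  # student (2D-CNN) distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


def temporal_attention_pool(hidden_states, query):
    """Pool per-frame hidden states with softmax(dot(h_t, query)) weights.

    hidden_states: list of T vectors (one per frame, e.g. LSTM outputs).
    query: hypothetical learned scoring vector of the same dimension.
    Returns the pooled vector and the attention weights over time.
    """
    scores = [sum(h * w for h, w in zip(h_t, query)) for h_t in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h_t[d] for w, h_t in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights
```

In this sketch the loss is zero when the student reproduces the teacher's scores exactly, and a zero `query` reduces attention pooling to a plain average over frames, which gives a convenient sanity check for either piece.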

Author information

Corresponding author

Correspondence to Ju-Chin Chen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhu, SJ., Lin, CR., Lin, WT., Chen, JC. (2023). A Distilled 2D CNN-LSTM Framework with Temporal Attention Mechanism for Action Recognition. In: Nguyen, N.T., et al. Recent Challenges in Intelligent Information and Database Systems. ACIIDS 2023. Communications in Computer and Information Science, vol 1863. Springer, Cham. https://doi.org/10.1007/978-3-031-42430-4_26

  • DOI: https://doi.org/10.1007/978-3-031-42430-4_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42429-8

  • Online ISBN: 978-3-031-42430-4

  • eBook Packages: Computer Science (R0)
