Egocentric Hand Gesture Recognition on Untrimmed Videos Using State Activation Gate LSTMs

Published: 30 July 2023

Abstract

Deep Neural Networks have been used extensively for recognising ego-hand gestures in trimmed videos. However, recognising ego-hand gestures from untrimmed videos remains largely unexplored. In this work, we propose the concept of a State Activation Gate (StAG) to extend the standard LSTM framework and successfully apply it to recognising ego-hand gestures from untrimmed videos. We explore the use of StAG LSTMs combined with 3D convolutional neural networks and compare their performance to the state of the art on two publicly available datasets. In addition, we present an intra-gesture (IG) loss function and a metric that favours continuity of gesture labels, the Continuity Favouring Jaccard Index (CFJI). StAG LSTM reduces the need for the heuristics currently employed in ego-hand gesture recognition on untrimmed videos. Training with the proposed IG loss function achieves better performance on metrics such as the Jaccard Index (JI) and AUC compared to the state of the art.
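The exact CFJI definition is given in the paper itself; the frame-wise Jaccard Index it builds on, the standard evaluation for continuous gesture recognition on untrimmed video, can be sketched as follows (the function and variable names are illustrative, not the authors' code):

```python
def jaccard_index(pred, gt, num_classes):
    """Mean per-class frame-wise Jaccard Index (IoU) for a label sequence.

    pred, gt: per-frame integer labels of equal length, with 0 meaning
    "no gesture". Classes absent from both sequences are skipped, so the
    average is taken only over gesture classes that actually occur.
    """
    scores = []
    for c in range(1, num_classes + 1):
        # Frame indices predicted / annotated as gesture class c.
        p = {i for i, label in enumerate(pred) if label == c}
        g = {i for i, label in enumerate(gt) if label == c}
        if not p and not g:
            continue  # class c never appears; do not count it
        scores.append(len(p & g) / len(p | g))
    return sum(scores) / len(scores) if scores else 0.0
```

Because this metric scores each frame independently, a prediction that fragments one gesture into several short segments can score the same as a temporally continuous one; a continuity-favouring variant such as the proposed CFJI addresses exactly that gap.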


Published In

Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges: Montreal, QC, Canada, August 21–25, 2022, Proceedings, Part I
Aug 2022
722 pages
ISBN: 978-3-031-37659-7
DOI: 10.1007/978-3-031-37660-3

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Egocentric Vision
  2. HCI
  3. Hand Gesture Recognition
