Egocentric Hand Gesture Recognition on Untrimmed Videos Using State Activation Gate LSTMs

Published: 30 July 2023

Abstract

Deep Neural Networks have been used extensively for recognising ego-hand gestures in trimmed videos. However, recognising ego-hand gestures from untrimmed videos remains largely unexplored. In this work, we propose the concept of a State Activation Gate (StAG) to extend the standard LSTM framework and successfully apply it to recognising ego-hand gestures from untrimmed videos. We explore the use of StAG LSTMs combined with 3D convolutional neural networks and compare their performance to the state of the art on two publicly available datasets. In addition, we present an intra-gesture (IG) loss function and a metric that favours continuity of gesture labels, the Continuity Favouring Jaccard Index (CFJI). StAG LSTM reduces the need for the heuristics currently employed in ego-hand gesture recognition on untrimmed videos. Training with the proposed IG loss function achieves better performance on metrics such as the Jaccard Index (JI) and AUC compared to the state of the art.
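The exact CFJI definition is given in the paper itself; the frame-wise Jaccard Index it builds on, the standard evaluation for continuous gesture recognition on untrimmed video, can be sketched as follows (the function and variable names are illustrative, not the authors' code):

```python
def jaccard_index(pred, gt, num_classes):
    """Mean per-class frame-wise Jaccard Index (IoU) for a label sequence.

    pred, gt: per-frame integer labels of equal length, with 0 meaning
    "no gesture". Classes absent from both sequences are skipped, so the
    average is taken only over gesture classes that actually occur.
    """
    scores = []
    for c in range(1, num_classes + 1):
        # Frame indices predicted / annotated as gesture class c.
        p = {i for i, label in enumerate(pred) if label == c}
        g = {i for i, label in enumerate(gt) if label == c}
        if not p and not g:
            continue  # class c never appears; do not count it
        scores.append(len(p & g) / len(p | g))
    return sum(scores) / len(scores) if scores else 0.0
```

Because this metric scores each frame independently, a prediction that fragments one gesture into several short segments can score the same as a temporally continuous one; a continuity-favouring variant such as the proposed CFJI addresses exactly that gap.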


Published In

Pattern Recognition, Computer Vision, and Image Processing. ICPR 2022 International Workshops and Challenges: Montreal, QC, Canada, August 21–25, 2022, Proceedings, Part I
Aug 2022
722 pages
ISBN: 978-3-031-37659-7
DOI: 10.1007/978-3-031-37660-3

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Egocentric Vision
  2. HCI
  3. Hand Gesture Recognition
