Research article · DOI: 10.1145/3301275.3302296

MyoSign: enabling end-to-end sign language recognition with wearables

Published: 17 March 2019

Abstract

Automatic sign language recognition is an important milestone in facilitating communication between the deaf community and hearing people. Existing approaches are either intrusive or susceptible to ambient environments and user diversity. Moreover, most of them perform only isolated word recognition, not sentence-level sequence translation. In this paper, we present MyoSign, a deep learning based system that enables end-to-end American Sign Language (ASL) recognition at both word and sentence levels. We leverage a lightweight wearable device which provides inertial and electromyography signals to capture signs non-intrusively. First, we propose a multimodal Convolutional Neural Network (CNN) to abstract representations from inputs of different sensory modalities. Then, a bidirectional Long Short Term Memory (LSTM) is exploited to model temporal dependencies. On top of these networks, we employ Connectionist Temporal Classification (CTC) to sidestep explicit temporal segmentation and achieve end-to-end continuous sign language recognition. We evaluate MyoSign on 70 commonly used ASL words and 100 ASL sentences from 15 volunteers. Our system achieves an average accuracy of 93.7% at the word level and 93.1% at the sentence level in user-independent settings. In addition, MyoSign can recognize sentences unseen in the training set with 92.4% accuracy. These encouraging results indicate that MyoSign can be a meaningful step in the advancement of sign language recognition.
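The CTC step in the pipeline above maps frame-wise network outputs to a label sequence without explicit temporal segmentation. The following sketch is illustrative only, not the authors' implementation: it shows greedy best-path CTC decoding, where the most likely label is chosen at each frame, consecutive repeats are merged, and blanks are dropped. The toy alphabet and probability table are assumptions for the example.

```python
# Minimal sketch of CTC greedy (best-path) decoding (illustrative, not from the paper):
# pick the most likely label at each time step, then collapse repeats and remove blanks.

BLANK = "-"

def ctc_collapse(path, blank=BLANK):
    """Collapse a frame-wise label path: merge consecutive repeats, then drop blanks."""
    out, prev = [], None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

def greedy_decode(frame_probs, alphabet):
    """Choose the argmax label per frame, then apply the CTC collapse rule."""
    path = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    return "".join(ctc_collapse(path))

if __name__ == "__main__":
    alphabet = ["-", "h", "i"]  # blank symbol first, then the labels
    # Five frames of per-label probabilities (toy values)
    frame_probs = [
        [0.1, 0.8, 0.1],  # h
        [0.1, 0.7, 0.2],  # h (consecutive repeat, merged)
        [0.8, 0.1, 0.1],  # blank (separates labels)
        [0.1, 0.1, 0.8],  # i
        [0.2, 0.1, 0.7],  # i (consecutive repeat, merged)
    ]
    print(greedy_decode(frame_probs, alphabet))  # prints "hi"
```

The collapse rule is why CTC needs the blank symbol: without it, a genuinely repeated label (e.g. "ll" in "hello") could not be distinguished from one label held across several frames.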

Supplementary Material

MP4 File (p650-zhang.mp4)




Published In

IUI '19: Proceedings of the 24th International Conference on Intelligent User Interfaces
March 2019
713 pages
ISBN:9781450362726
DOI:10.1145/3301275
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. deep learning network
  2. end-to-end translation
  3. sign language recognition
  4. wearable computing

Qualifiers

  • Research-article

Conference

IUI '19

Acceptance Rates

IUI '19 paper acceptance rate: 71 of 282 submissions, 25%.
Overall acceptance rate: 746 of 2,811 submissions, 27%.


Article Metrics

  • Downloads (last 12 months): 65
  • Downloads (last 6 weeks): 7
Reflects downloads up to 01 Jan 2025.

Cited By
  • American Sign Language Recognition and Translation Using Perception Neuron Wearable Inertial Motion Capture System. Sensors 24, 2 (2024), 453. DOI: 10.3390/s24020453
  • Progression Learning Convolution Neural Model-Based Sign Language Recognition Using Wearable Glove Devices. Computation 12, 4 (2024), 72. DOI: 10.3390/computation12040072
  • Exploring Sign Language Detection on Smartphones. Advances in Human-Computer Interaction 2024 (2024). DOI: 10.1155/2024/1487500
  • A Sign Language Recognition Framework Based on Cross-Modal Complementary Information Fusion. IEEE Transactions on Multimedia 26 (2024), 8131-8144. DOI: 10.1109/TMM.2024.3377095
  • Generalizations of Wearable Device Placements and Sentences in Sign Language Recognition With Transformer-Based Model. IEEE Transactions on Mobile Computing 23, 10 (2024), 10046-10059. DOI: 10.1109/TMC.2024.3373472
  • Enhancing the Applicability of Sign Language Translation. IEEE Transactions on Mobile Computing 23, 9 (2024), 8634-8648. DOI: 10.1109/TMC.2024.3350111
  • HDTSLR: A Framework Based on Hierarchical Dynamic Positional Encoding for Sign Language Recognition. IEEE Transactions on Mobile Computing (2024), 1-13. DOI: 10.1109/TMC.2023.3310712
  • LD-Recognition: Classroom Action Recognition Based on Passive RFID. IEEE Transactions on Computational Social Systems 11, 1 (2024), 1182-1191. DOI: 10.1109/TCSS.2023.3234423
  • Ultra Write: A Lightweight Continuous Gesture Input System with Ultrasonic Signals on COTS Devices. In 2024 IEEE International Conference on Pervasive Computing and Communications (PerCom), 174-183. DOI: 10.1109/PerCom59722.2024.10494485
  • ASLRing: American Sign Language Recognition with Meta-Learning on Wearables. In 2024 IEEE/ACM Ninth International Conference on Internet-of-Things Design and Implementation (IoTDI), 203-214. DOI: 10.1109/IoTDI61053.2024.00022
