
Enabling Real-time Sign Language Translation on Mobile Platforms with On-board Depth Cameras

Published: 24 June 2021

Abstract

In this work we present SUGO, a depth-video-based system for translating sign language to text using a smartphone's front camera. While exploiting depth-only videos offers benefits such as being less privacy-invasive than RGB videos, it introduces new challenges, including low video resolutions and the sensor's sensitivity to user motion. We overcome these challenges by diversifying our sign language video dataset via data augmentation so that it is robust to various usage scenarios, and by designing a set of schemes that emphasize human gestures in the input images for effective sign detection. The inference engine of SUGO is based on a 3-dimensional convolutional neural network (3DCNN) that classifies a sequence of video frames as one of the pre-trained sign words. Furthermore, the overall operations are designed to be lightweight so that sign language translation takes place in real time using only the resources available on a smartphone, with no help from cloud servers or external sensing components. Specifically, to train and test SUGO, we collect sign language data from 20 individuals for 50 Korean Sign Language words, yielding a dataset of ~5,000 sign gestures, and we collect additional in-the-wild data to evaluate the performance of SUGO in real-world usage scenarios with different lighting conditions and daily activities. Our extensive evaluations show that SUGO classifies sign words with an accuracy of up to 91% and suggest that the system is suitable, in terms of resource usage, latency, and environmental robustness, to enable a fully mobile solution for sign language translation.
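The pipeline the abstract outlines (depth-only input, preprocessing that emphasizes the signer's gestures, and 3DCNN classification of a frame sequence into one of 50 words) can be made concrete with a short example. The following is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the 0.2-1.2 m depth band, the layer sizes, the 16-frame clip length, and the names preprocess_depth_clip and Sign3DCNN are all illustrative.

```python
# A minimal sketch of the pipeline the abstract describes, NOT the authors'
# implementation: (1) normalize a raw depth clip so near-field hand/arm
# gestures stand out, (2) classify the frame sequence with a small 3DCNN.
# The 0.2-1.2 m depth band, layer sizes, 16-frame clip length, and all
# names here are illustrative assumptions.
import torch
import torch.nn as nn


def preprocess_depth_clip(clip_mm: torch.Tensor,
                          near_mm: float = 200.0,
                          far_mm: float = 1200.0) -> torch.Tensor:
    """Map raw depth in millimeters to [0, 1] so that nearer pixels (the
    signer's hands) are brighter, while out-of-band pixels (background,
    sensor dropouts reported as 0 mm) are zeroed out."""
    clip = clip_mm.clamp(near_mm, far_mm)
    clip = (far_mm - clip) / (far_mm - near_mm)  # nearer -> closer to 1
    clip[clip_mm < near_mm] = 0.0                # suppress dropouts/too-close
    return clip


class Sign3DCNN(nn.Module):
    """Small 3DCNN word classifier: (B, 1, T, H, W) depth clip -> logits
    over a fixed sign-word vocabulary (50 words, as in the paper)."""

    def __init__(self, num_words: int = 50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm3d(16), nn.ReLU(inplace=True),
            nn.MaxPool3d((1, 2, 2)),           # pool space first, keep time
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32), nn.ReLU(inplace=True),
            nn.MaxPool3d(2),                   # pool time and space
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),           # global pool -> (B, 64, 1, 1, 1)
        )
        self.classifier = nn.Linear(64, num_words)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))


if __name__ == "__main__":
    # One fake 16-frame, 112x112 depth clip with raw values in millimeters.
    raw = torch.randint(0, 4000, (1, 1, 16, 112, 112)).float()
    logits = Sign3DCNN()(preprocess_depth_clip(raw))
    print(logits.shape)  # torch.Size([1, 50])
```

The design choice mirrored here reflects the abstract's emphasis on lightweight, on-device operation: global average pooling followed by a single linear layer keeps the parameter count small, which is what makes real-time smartphone inference plausible.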




    Published In

    Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 5, Issue 2
    June 2021, 932 pages
    EISSN: 2474-9567
    DOI: 10.1145/3472726

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 June 2021
    Published in IMWUT Volume 5, Issue 2


    Author Tags

    1. Depth image processing
    2. Mobile applications and services
    3. Sign language translation

    Qualifiers

    • Research-article
    • Research
    • Refereed
