
Enabling Real-time Sign Language Translation on Mobile Platforms with On-board Depth Cameras

Published: 24 June 2021

Abstract

In this work we present SUGO, a depth-video-based system for translating sign language to text using a smartphone's front camera. While exploiting depth-only videos offers benefits such as being less privacy-invasive than RGB videos, it introduces new challenges, including low video resolutions and the sensor's sensitivity to user motion. We overcome these challenges by diversifying our sign language video dataset via data augmentation so that it is robust to various usage scenarios, and by designing a set of schemes that emphasize human gestures in the input images for effective sign detection. The inference engine of SUGO is based on a 3-dimensional convolutional neural network (3DCNN) that classifies a sequence of video frames as one of the pre-trained sign words. Furthermore, the overall operations are designed to be lightweight so that sign language translation takes place in real time using only the resources available on a smartphone, with no help from cloud servers or external sensing components. Specifically, to train and test SUGO, we collect sign language data from 20 individuals for 50 Korean Sign Language words, yielding a dataset of ~5,000 sign gestures, and we collect additional in-the-wild data to evaluate the performance of SUGO in real-world usage scenarios with different lighting conditions and daily activities. Our extensive evaluations show that SUGO classifies sign words with an accuracy of up to 91% and suggest that the system is suitable, in terms of resource usage, latency, and environmental robustness, to enable a fully mobile solution for sign language translation.
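The pipeline the abstract outlines (depth-only input, preprocessing that emphasizes the signer's gestures, and 3DCNN classification of a frame sequence into one of 50 words) can be made concrete with a short example. The following is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: the 0.2-1.2 m depth band, the layer sizes, the 16-frame clip length, and the names preprocess_depth_clip and Sign3DCNN are all illustrative.

```python
# A minimal sketch of the pipeline the abstract describes, NOT the authors'
# implementation: (1) normalize a raw depth clip so near-field hand/arm
# gestures stand out, (2) classify the frame sequence with a small 3DCNN.
# The 0.2-1.2 m depth band, layer sizes, 16-frame clip length, and all
# names here are illustrative assumptions.
import torch
import torch.nn as nn


def preprocess_depth_clip(clip_mm: torch.Tensor,
                          near_mm: float = 200.0,
                          far_mm: float = 1200.0) -> torch.Tensor:
    """Map raw depth in millimeters to [0, 1] so that nearer pixels (the
    signer's hands) are brighter, while out-of-band pixels (background,
    sensor dropouts reported as 0 mm) are zeroed out."""
    clip = clip_mm.clamp(near_mm, far_mm)
    clip = (far_mm - clip) / (far_mm - near_mm)  # nearer -> closer to 1
    clip[clip_mm < near_mm] = 0.0                # suppress dropouts/too-close
    return clip


class Sign3DCNN(nn.Module):
    """Small 3DCNN word classifier: (B, 1, T, H, W) depth clip -> logits
    over a fixed sign-word vocabulary (50 words, as in the paper)."""

    def __init__(self, num_words: int = 50):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm3d(16), nn.ReLU(inplace=True),
            nn.MaxPool3d((1, 2, 2)),           # pool space first, keep time
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32), nn.ReLU(inplace=True),
            nn.MaxPool3d(2),                   # pool time and space
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),           # global pool -> (B, 64, 1, 1, 1)
        )
        self.classifier = nn.Linear(64, num_words)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))


if __name__ == "__main__":
    # One fake 16-frame, 112x112 depth clip with raw values in millimeters.
    raw = torch.randint(0, 4000, (1, 1, 16, 112, 112)).float()
    logits = Sign3DCNN()(preprocess_depth_clip(raw))
    print(logits.shape)  # torch.Size([1, 50])
```

The design choice mirrored here reflects the abstract's emphasis on lightweight, on-device operation: global average pooling followed by a single linear layer keeps the parameter count small, which is what makes real-time smartphone inference plausible.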




    Published In

    Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 5, Issue 2
    June 2021, 932 pages
    EISSN: 2474-9567
    DOI: 10.1145/3472726

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 24 June 2021
    Published in IMWUT Volume 5, Issue 2


    Author Tags

    1. Depth image processing
    2. Mobile applications and services
    3. Sign language translation

    Qualifiers

    • Research-article
    • Research
    • Refereed
