UltrasonicGS: A Highly Robust Gesture and Sign Language Recognition Method Based on Ultrasonic Signals
<p>Overview of UltrasonicGS (In the output result module, “我是教师” is a Chinese sentence, which means “I am a teacher” in English. Where “我”“是”“教师” correspond to “I”, “am” and “teacher” respectively).</p> "> Figure 2
<p>Single gesture spectrogram.</p> "> Figure 3
<p>Continuous gesture spectrogram.</p> "> Figure 4
<p>Sign language gesture spectrogram.</p> "> Figure 5
<p>Single gesture action data processing process. (<b>a</b>) Bandpass filtering data; (<b>b</b>) Gaussian smoothing data.</p> "> Figure 6
<p>Overview of the GAN.</p> "> Figure 7
<p>Overview of the ResNet34 and Bi-LSTM.</p> "> Figure 8
<p>Experimental equipment and environment. (<b>a</b>) Data collection equipment; (<b>b</b>) Laboratory environment; (<b>c</b>) Corridor environment.</p> "> Figure 9
<p>Impact of different distance, angles and speeds. (<b>a</b>) Impact of different distance; (<b>b</b>) impact of different angles; and (<b>c</b>) impact of different speeds.</p> "> Figure 10
<p>Impact of different environments and interference states. (<b>a</b>) Impact of different environments and (<b>b</b>) impact of different interference states.</p> "> Figure 11
<p>Impact on recognition performance of single gesture, continuous gesture and sign language gesture when data augmentation is used or not. (<b>a</b>) Single gesture; (<b>b</b>) continuous gesture; and (<b>c</b>) sign language gesture.</p> "> Figure 12
<p>Overall performance of single gestures.</p> "> Figure 13
<p>Impact of classification model on continuous gesture performance. (<b>a</b>) Spread and Pinch; (<b>b</b>) Push and Pull; (<b>c</b>) Hover and OK; (<b>d</b>) Around Left and Around Right; (<b>e</b>) One, Two, and Three; and (<b>f</b>) Slide Up, Slide Down, Slide Left, and Slide Right.</p> "> Figure 14
<p>Impact of different models on accuracy of continuous gestures.</p> "> Figure 15
<p>Impact of classification model on sign language gesture performance. (<b>a</b>) I am a teacher. (<b>b</b>) I am fine, thanks. (<b>c</b>) What day is today? (<b>d</b>) Sorry, I am late. (<b>e</b>) What do you do? (<b>f</b>) What is your name?</p> "> Figure 16
<p>Impact of different models on accuracy of sign language gestures.</p> ">
Abstract
:1. Introduction
- 1.
- We propose a data augmentation method based on GAN. Due to the randomness of GAN itself, it makes the generated samples more diverse and can cover more real situations, while it can reduce the classification model error and improve the performance of the model.
- 2.
- We feed the multi-scale semantic features extracted by the residual neural network into the Bi-LSTM algorithm. The algorithm enables the classification network to fuse the information of feature dimension and temporal dimension to achieve high-precision gesture recognition. Meanwhile, in order to fill the gap of acoustic perception recognition of continuous gestures and Chinese sign language gestures and solve the problem of inconsistent length and difficult alignment of continuous gesture and sign language gesture input and output sequences, we add the CTC algorithm after the Bi-LSTM network. It enables the model to achieve good results for continuous gesture recognition and sign-language-recognition problems as well.
- 3.
- In this paper, we obtain real data on gestures from multiple groups of volunteers and form an open-source database. Through two real scene tests, it is verified that the proposed method has high robustness, the accuracy of single gesture recognition reaches 98.8%, and the recognition distance is 0.5 m. At the same time, the sign language data collected can provide data support for education professionals to study the daily interaction behavior of certain groups, such as the deaf.
2. Related Work
3. System Design
3.1. Overview
3.2. Data Collection and Pre-Processing
3.3. Data Augmentation
3.4. Feature Extraction and Gesture Classification
3.4.1. Feature Extraction
3.4.2. Gesture Classification
Algorithm 1 Steps of CTC | |
Input: Sequence of strings L, Number of nodes in each expansion W | |
Output: The sequence Q with the maximum probability at time T | |
1: | for to T do |
2: | Set = the W most probable sequences in B (L when ) |
3: | Set B={ } |
4: | for do |
5: | if then |
6: | |
7: | if then |
8: | |
9: | |
10: | add p to B |
11: | for to K do |
12: | |
13: | |
14: | add to B |
15: | return |
4. Experimentation and Evaluation
4.1. Experiment Setting
4.2. Ablation Study
4.2.1. Impact of Different Influencing Factors
4.2.2. Impact of Noise and Personnel Interference
4.2.3. Impact of Dataset Size
4.3. Comparison with the State-Of-The-Art Methods
4.4. Overall Performance
4.4.1. Overall Accuracy of Single Gestures
4.4.2. Performance Evaluation of Continuous Gesture
4.4.3. Performance Evaluation of Sign Language Gesture
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- World Health Organization. Considerations for Quarantine of Contacts of COVID-19 Cases: Interim Guidance, 25 June 2021; Technical Report; World Health Organization: Geneva, Switzerland, 2021.
- Savoie, P.; Cameron, J.A.; Kaye, M.E.; Scheme, E.J. Automation of the timed-up-and-go test using a conventional video camera. IEEE J. Biomed. Health Inform. 2019, 24, 1196–1205. [Google Scholar] [CrossRef] [PubMed]
- Wang, Y.; Ma, J.; Li, X.; Zhong, A. Hierarchical multi-classification for sensor-based badminton activity recognition. In Proceedings of the 2020 15th IEEE International Conference on Signal Processing (ICSP), Beijing, China, 6–9 December 2020; Volume 1, pp. 371–375. [Google Scholar]
- Li, J.; Yin, K.; Tang, C. SlideAugment: A Simple Data Processing Method to Enhance Human Activity Recognition Accuracy Based on WiFi. Sensors 2021, 21, 2181. [Google Scholar] [CrossRef]
- Zhou, S.; Zhang, W.; Peng, D.; Liu, Y.; Liao, X.; Jiang, H. Adversarial WiFi sensing for privacy preservation of human behaviors. IEEE Commun. Lett. 2019, 24, 259–263. [Google Scholar] [CrossRef]
- Wang, W.; Li, J.; He, Y.; Guo, X.; Liu, Y. MotorBeat: Acoustic Communication for Home Appliances via Variable Pulse Width Modulation. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022, 6, 1–24. [Google Scholar] [CrossRef]
- Zhuang, Y.; Wang, Y.; Yan, Y.; Xu, X.; Shi, Y. ReflecTrack: Enabling 3D Acoustic Position Tracking Using Commodity Dual-Microphone Smartphones. In Proceedings of the 34th Annual ACM Symposium on User Interface Software and Technology, Virtual, 10–14 October 2021; pp. 1050–1062. [Google Scholar]
- Xu, X.; Gong, J.; Brum, C.; Liang, L.; Suh, B.; Gupta, S.K.; Agarwal, Y.; Lindsey, L.; Kang, R.; Shahsavari, B.; et al. Enabling hand gesture customization on wrist-worn devices. In Proceedings of the CHI Conference on Human Factors in Computing Systems, New Orleans, LA, USA, 29 April–5 May 2022; pp. 1–19. [Google Scholar]
- Xu, X.; Shi, H.; Yi, X.; Liu, W.; Yan, Y.; Shi, Y.; Mariakakis, A.; Mankoff, J.; Dey, A.K. Earbuddy: Enabling on-face interaction via wireless earbuds. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 25–30 April 2020; pp. 1–14. [Google Scholar]
- Gao, Y.; Jin, Y.; Li, J.; Choi, S.; Jin, Z. EchoWhisper: Exploring an Acoustic-based Silent Speech Interface for Smartphone Users. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2020, 4, 1–27. [Google Scholar] [CrossRef]
- Wang, W.; Liu, A.X.; Sun, K. Device-free gesture tracking using acoustic signals. In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking, New York, NY, USA, 3–7 October 2016; pp. 82–94. [Google Scholar]
- Yun, S.; Chen, Y.C.; Zheng, H.; Qiu, L.; Mao, W. Strata: Fine-grained acoustic-based device-free tracking. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, Niagara Falls, NY, USA, 19–23 June 2017; pp. 15–28. [Google Scholar]
- Wang, P.; Jiang, R.; Liu, C. Amaging: Acoustic Hand Imaging for Self-adaptive Gesture Recognition. In Proceedings of the IEEE INFOCOM 2022-IEEE Conference on Computer Communications, London, UK, 2–5 May 2022; pp. 80–89. [Google Scholar]
- Hao, Z.; Duan, Y.; Dang, X.; Liu, Y.; Zhang, D. Wi-SL: Contactless fine-grained gesture recognition uses channel state information. Sensors 2020, 20, 4025. [Google Scholar] [CrossRef] [PubMed]
- Nguyen-Trong, K.; Vu, H.N.; Trung, N.N.; Pham, C. Gesture recognition using wearable sensors with bi-long short-term memory convolutional neural networks. IEEE Sens. J. 2021, 21, 15065–15079. [Google Scholar] [CrossRef]
- Rinalduzzi, M.; De Angelis, A.; Santoni, F.; Buchicchio, E.; Moschitta, A.; Carbone, P.; Bellitti, P.; Serpelloni, M. Gesture Recognition of Sign Language Alphabet Using a Magnetic Positioning System. Appl. Sci. 2021, 11, 5594. [Google Scholar] [CrossRef]
- Hou, J.; Li, X.Y.; Zhu, P.; Wang, Z.; Wang, Y.; Qian, J.; Yang, P. Signspeaker: A real-time, high-precision smartwatch-based sign language translator. In Proceedings of the 25th Annual International Conference on Mobile Computing and Networking, Los Cabos, Mexico, 21–25 October 2019; pp. 1–15. [Google Scholar]
- Liu, Z.; Pan, C.; Wang, H. Continuous Gesture Sequences Recognition Based on Few-Shot Learning. Int. J. Aerosp. Eng. 2022, 2022, 7868142. [Google Scholar] [CrossRef]
- Mahmoud, R.; Belgacem, S.; Omri, M.N. Towards an end-to-end isolated and continuous deep gesture recognition process. Neural Comput. Appl. 2022, 34, 13713–13732. [Google Scholar] [CrossRef]
- Guo, D.; Zhou, W.; Li, H.; Wang, M. Hierarchical lstm for sign language translation. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
- Tang, S.; Guo, D.; Hong, R.; Wang, M. Graph-based multimodal sequential embedding for sign language translation. IEEE Trans. Multimed. 2021, 24, 4433–4445. [Google Scholar] [CrossRef]
- Tang, S.; Hong, R.; Guo, D.; Wang, M. Gloss Semantic-Enhanced Network with Online Back-Translation for Sign Language Production. In Proceedings of the 30th ACM International Conference on Multimedia, New York, NY, USA, 10–14 October 2022; pp. 5630–5638. [Google Scholar]
- Mao, W.; He, J.; Qiu, L. Cat: High-precision acoustic motion tracking. In Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking, New York, NY, USA, 3–7 October 2016; pp. 69–81. [Google Scholar]
- Wang, Y.; Shen, J.; Zheng, Y. Push the limit of acoustic gesture recognition. IEEE Trans. Mob. Comput. 2020, 21, 1798–1811. [Google Scholar] [CrossRef]
- Nandakumar, R.; Iyer, V.; Tan, D.; Gollakota, S. Fingerio: Using active sonar for fine-grained finger tracking. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, 7–12 May 2016; pp. 1515–1525. [Google Scholar]
- Jin, Y.; Gao, Y.; Zhu, Y.; Wang, W.; Li, J.; Choi, S.; Li, Z.; Chauhan, J.; Dey, A.K.; Jin, Z. Sonicasl: An acoustic-based sign language gesture recognizer using earphones. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2021, 5, 1–30. [Google Scholar] [CrossRef]
- Basner, M.; Babisch, W.; Davis, A.; Brink, M.; Clark, C.; Janssen, S.; Stansfeld, S. Auditory and non-auditory effects of noise on health. Lancet 2014, 383, 1325–1332. [Google Scholar] [CrossRef] [PubMed]
- Cai, C.; Pu, H.; Hu, M.; Zheng, R.; Luo, J. Acoustic software defined platform: A versatile sensing and general benchmarking platform. IEEE Trans. Mob. Comput. 2021, 22, 647–660. [Google Scholar] [CrossRef]
- Perez, L.; Wang, J. The effectiveness of data augmentation in image classification using deep learning. arXiv 2017, arXiv:1712.04621. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Kawakami, K. Supervised Sequence Labelling with Recurrent Neural Networks. Ph.D. Thesis, Technical University of Munich, Munich, Germany, 2008. [Google Scholar]
- Graves, A.; Fernández, S.; Gomez, F.; Schmidhuber, J. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 369–376. [Google Scholar]
- Ruan, W.; Sheng, Q.Z.; Yang, L.; Gu, T.; Xu, P.; Shangguan, L. AudioGest: Enabling fine-grained hand gesture detection by decoding echo signal. In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, Heidelberg, Germany, 12–16 September 2016; pp. 474–485. [Google Scholar]
- Gupta, S.; Morris, D.; Patel, S.; Tan, D. Soundwave: Using the doppler effect to sense gestures. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Austin, TX, USA, 5–10 May 2012; pp. 1911–1914. [Google Scholar]
- Ling, K.; Dai, H.; Liu, Y.; Liu, A.X.; Wang, W.; Gu, Q. Ultragesture: Fine-grained gesture sensing and recognition. IEEE Trans. Mob. Comput. 2020, 21, 2620–2636. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
Project | Signal | Device Free | Application | Algorithm | Feature | Accuracy |
---|---|---|---|---|---|---|
AudioGest [34] | Ultrasound | Yes | Whole-hand Gesture | / | Doppler Effect | 89.1% |
SoundWave [35] | Ultrasound | Yes | Whole-hand Gesture | CNN | Doppler Effect | 88.6% |
UltraGesture [36] | Ultrasound | Yes | Finger-level Gesture | CNN | CIR | 93.5% |
Push [24] | Ultrasound | Yes | Finger-level Gesture | CNN+LSTM | CIR | 95.3% |
Ours | Ultrasound | Yes | Finger-level Gesture | CNN+Bi-LSTM | Doppler Effect | 98.8% |
Project | Signal | Application | Algorithm | Single | Continuous | Sign Language |
---|---|---|---|---|---|---|
SonicASL [26] | Ultrasound | Word and Sentence | CNN+LSTM+CTC | 93.8% | / | 90.6% |
Ours | Ultrasound | Word and Sentence | CNN+Bi-LSTM+CTC | 98.8% | 92.4% | 86.3% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Y.; Hao, Z.; Dang, X.; Zhang, Z.; Li, M. UltrasonicGS: A Highly Robust Gesture and Sign Language Recognition Method Based on Ultrasonic Signals. Sensors 2023, 23, 1790. https://doi.org/10.3390/s23041790
Wang Y, Hao Z, Dang X, Zhang Z, Li M. UltrasonicGS: A Highly Robust Gesture and Sign Language Recognition Method Based on Ultrasonic Signals. Sensors. 2023; 23(4):1790. https://doi.org/10.3390/s23041790
Chicago/Turabian StyleWang, Yuejiao, Zhanjun Hao, Xiaochao Dang, Zhenyi Zhang, and Mengqiao Li. 2023. "UltrasonicGS: A Highly Robust Gesture and Sign Language Recognition Method Based on Ultrasonic Signals" Sensors 23, no. 4: 1790. https://doi.org/10.3390/s23041790
APA StyleWang, Y., Hao, Z., Dang, X., Zhang, Z., & Li, M. (2023). UltrasonicGS: A Highly Robust Gesture and Sign Language Recognition Method Based on Ultrasonic Signals. Sensors, 23(4), 1790. https://doi.org/10.3390/s23041790