DOI: 10.1145/3379299.3379301

Ultrasound Tongue Image Classification using Transfer Learning

Published: 20 March 2020

Abstract

Ultrasound images of the tongue contain high levels of speckle noise, so an efficient approach to interpreting the image sequences is desirable. Automatic ultrasound tongue image classification is of great interest to clinical linguists, as hand labeling is costly. In this paper, we explore the classification of midsagittal tongue gestures using transfer learning, which can be effective when the amount of labeled data is limited. Within the transfer-learning framework, four state-of-the-art convolutional neural network (CNN) architectures are compared quantitatively. Classification experiments are conducted on data from two female subjects. The experimental results show that knowledge learned from one subject can be transferred to improve the classification accuracy for another subject.
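The transfer-learning setup the abstract describes — reuse a network trained on one subject and retrain only a small classifier for the other — can be sketched in miniature. The sketch below is purely illustrative: a frozen random projection stands in for the pretrained CNN backbone, and the toy data, feature dimensions, and class counts are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a CNN backbone "pretrained" on subject A: a fixed
# (frozen) random projection from 64-dim inputs to 16-dim features.
# In the paper's setting this would be a deep CNN trained on one
# speaker's ultrasound frames.
W_backbone = rng.normal(size=(64, 16))

def extract_features(x):
    """Frozen feature extractor; only the head below is trained."""
    return np.maximum(x @ W_backbone, 0.0)  # ReLU

# Synthetic "subject B" data: 200 samples, 3 gesture classes.
# Each class gets a random mean shift so the classes are separable.
n, n_classes = 200, 3
X = rng.normal(size=(n, 64))
y = rng.integers(0, n_classes, size=n)
X += np.eye(n_classes)[y] @ rng.normal(size=(n_classes, 64))

# Transfer step: extract frozen features, standardize them, and
# train only a new softmax head by plain gradient descent.
F = extract_features(X)
F = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-8)
Y = np.eye(n_classes)[y]
W_head = np.zeros((F.shape[1], n_classes))
for _ in range(500):
    logits = F @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W_head -= 0.1 * F.T @ (p - Y) / n

acc = (np.argmax(F @ W_head, axis=1) == y).mean()
print(f"head-only training accuracy: {acc:.2f}")
```

Because the backbone stays frozen, only the small head is fit to the second subject's data, which is why the approach remains usable at limited labeled data sizes.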


Cited By

  • (2023) TOFI: Designing Intraoral Computer Interfaces for Gamified Myofunctional Therapy. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 1-8. DOI: 10.1145/3544549.3573848. Online publication date: 19-Apr-2023.


Published In

cover image ACM Other conferences
DMIP '19: Proceedings of the 2019 2nd International Conference on Digital Medicine and Image Processing
November 2019
59 pages
ISBN:9781450376983
DOI:10.1145/3379299
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • East China Normal University
  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. B-mode ultrasound imaging
  2. Convolutional neural network
  3. Tongue image classification
  4. Transfer learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DMIP '19
