DOI: 10.1145/3379299.3379301

Ultrasound Tongue Image Classification using Transfer Learning

Published: 20 March 2020

Abstract

Ultrasound images of the tongue contain high levels of speckle noise, so an efficient approach to interpreting the image sequences is desirable. Automatic ultrasound tongue image classification is of great interest to clinical linguists, as hand labeling is costly. In this paper, we explore the classification of midsagittal tongue gestures using transfer learning, which can be effective when the amount of labeled data is limited. Within the transfer-learning framework, four state-of-the-art convolutional neural network (CNN) architectures are compared quantitatively. Classification experiments are conducted on data from two female subjects. The experimental results show that knowledge learned from one subject can be transferred to improve the classification accuracy for another subject.
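The transfer-learning setup the abstract describes — reuse a network trained on one subject and retrain only a small classifier for the other — can be sketched in miniature. The sketch below is purely illustrative: a frozen random projection stands in for the pretrained CNN backbone, and the toy data, feature dimensions, and class counts are assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a CNN backbone "pretrained" on subject A: a fixed
# (frozen) random projection from 64-dim inputs to 16-dim features.
# In the paper's setting this would be a deep CNN trained on one
# speaker's ultrasound frames.
W_backbone = rng.normal(size=(64, 16))

def extract_features(x):
    """Frozen feature extractor; only the head below is trained."""
    return np.maximum(x @ W_backbone, 0.0)  # ReLU

# Synthetic "subject B" data: 200 samples, 3 gesture classes.
# Each class gets a random mean shift so the classes are separable.
n, n_classes = 200, 3
X = rng.normal(size=(n, 64))
y = rng.integers(0, n_classes, size=n)
X += np.eye(n_classes)[y] @ rng.normal(size=(n_classes, 64))

# Transfer step: extract frozen features, standardize them, and
# train only a new softmax head by plain gradient descent.
F = extract_features(X)
F = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-8)
Y = np.eye(n_classes)[y]
W_head = np.zeros((F.shape[1], n_classes))
for _ in range(500):
    logits = F @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W_head -= 0.1 * F.T @ (p - Y) / n

acc = (np.argmax(F @ W_head, axis=1) == y).mean()
print(f"head-only training accuracy: {acc:.2f}")
```

Because the backbone stays frozen, only the small head is fit to the second subject's data, which is why the approach remains usable at limited labeled data sizes.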


Cited By

  • (2023) TOFI: Designing Intraoral Computer Interfaces for Gamified Myofunctional Therapy. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 1-8. DOI: 10.1145/3544549.3573848. Online publication date: 19-Apr-2023.


Published In

cover image ACM Other conferences
DMIP '19: Proceedings of the 2019 2nd International Conference on Digital Medicine and Image Processing
November 2019
59 pages
ISBN:9781450376983
DOI:10.1145/3379299
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • East China Normal University
  • University of Tsukuba

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. B-mode ultrasound imaging
  2. Convolutional neural network
  3. Tongue image classification
  4. Transfer learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

DMIP '19
