poster

Silent Speech and Emotion Recognition from Vocal Tract Shape Dynamics in Real-Time MRI

Authors:

Laxmi Pandey,

Ahmed Sabbir Arif

SIGGRAPH '21: ACM SIGGRAPH 2021 Posters

Article No.: 27, Pages 1 - 2

https://doi.org/10.1145/3450618.3469176

Published: 06 August 2021


Abstract

We propose a novel deep neural network-based learning framework that interprets the acoustic information in variable-length sequences of vocal tract shaping during speech production, captured by real-time magnetic resonance imaging (rtMRI), and translates it into text. In an experiment, it achieved a 40.6% phoneme error rate (PER) at the sentence level, substantially lower than that of existing models. We also analyzed how the geometry of articulation in each sub-region of the vocal tract varies with emotion and gender. Results suggest that the distortion of each sub-region is affected by both emotion and gender.
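The abstract reports its result as a PER. Assuming this denotes the phoneme error rate in its conventional form (edit distance between the predicted and reference phoneme sequences, divided by the reference length), the metric can be sketched as follows. The function names and the example phoneme labels are illustrative and are not taken from the poster; the authors' exact scoring procedure is not described here.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the first i-1 reference
    # phonemes and the first j hypothesis phonemes.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Assumed definition: (substitutions + deletions + insertions) / len(ref)."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical phoneme labels, for illustration only.
    reference = ["HH", "AH", "L", "OW"]
    hypothesis = ["HH", "AA", "L"]
    # One substitution and one deletion over four reference phonemes.
    print(f"{phoneme_error_rate(reference, hypothesis):.3f}")
```

Under this definition, a 40.6% PER would correspond to roughly four phoneme errors for every ten reference phonemes. Note that the value can exceed 100% when the hypothesis contains many insertions.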

Supplementary Material

VTT File (3450618.3469176.vtt), 6.38 KB

a27-pandey-supplement (a27-pandey-poster.pdf): Poster, 1.58 MB

MP4 File (3450618.3469176.mp4): Presentation, 97.07 MB


Information

Published In

SIGGRAPH '21: ACM SIGGRAPH 2021 Posters

August 2021

90 pages

ISBN: 9781450383714

DOI: 10.1145/3450618

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 August 2021

Qualifiers

Poster
Research
Refereed limited

Conference

SIGGRAPH '21

Sponsor:

SIGGRAPH

SIGGRAPH '21: Special Interest Group on Computer Graphics and Interactive Techniques Conference

August 9 - 13, 2021

Virtual Event, USA

Acceptance Rates

Overall Acceptance Rate 1,822 of 8,601 submissions, 21%

Bibliometrics

Total Citations: 4
Total Downloads: 148
Downloads (Last 12 months): 29
Downloads (Last 6 weeks): 3

Reflects downloads up to 12 Dec 2024

Cited By

van Rensburg E, Botha R, Haskins B (2024). Research Agenda for Speaker Authentication. Human Aspects of Information Security and Assurance, 278-291. Online publication date: 28-Nov-2024.
https://doi.org/10.1007/978-3-031-72559-3_19
Belyk M, Carignan C, McGettigan C (2023). An open-source toolbox for measuring vocal tract shape from real-time magnetic resonance images. Behavior Research Methods 56:3, 2623-2635. Online publication date: 28-Jul-2023.
https://doi.org/10.3758/s13428-023-02171-9
Zhang R, Li K, Hao Y, Wang Y, Lai Z, Guimbretière F, Zhang C (2023). EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-18. Online publication date: 19-Apr-2023.
https://dl.acm.org/doi/10.1145/3544548.3580801
Pandey L, Arif A (2022). Design and Evaluation of a Silent Speech-Based Selection Method for Eye-Gaze Pointing. Proceedings of the ACM on Human-Computer Interaction 6:ISS, 328-353. Online publication date: 14-Nov-2022.
https://dl.acm.org/doi/10.1145/3567723
