research-article
DOI: 10.1145/2049536.2049574

On the intelligibility of fast synthesized speech for individuals with early-onset blindness

Published: 24 October 2011

Abstract

People with visual disabilities increasingly use text-to-speech synthesis as a primary output modality for interaction with computers. Surprisingly, there have been no systematic comparisons of the performance of different text-to-speech systems for this user population. In this paper we report the results of a pilot experiment on the intelligibility of fast synthesized speech for individuals with early-onset blindness. Using an open-response recall task, we collected data on four synthesis systems representing two major approaches to text-to-speech synthesis: formant-based synthesis and concatenative unit selection synthesis. We found a significant effect of speaking rate on intelligibility of synthesized speech, and a trend towards significance for synthesizer type. In post-hoc analyses, we found that participant-related factors, including age and familiarity with a synthesizer and voice, also affect intelligibility of fast synthesized speech.
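The open-response recall task above implies scoring how much of each stimulus sentence a listener correctly reported. As a minimal, hypothetical sketch (the abstract does not specify the scoring procedure; the function name and matching rule are illustrative, not the authors' method), per-sentence intelligibility can be approximated as the fraction of stimulus words recovered in the response:

```python
# Illustrative sketch, not the paper's published procedure: score an
# open-response recall trial as the proportion of stimulus words that
# appear in the listener's transcription.

def intelligibility_score(stimulus: str, response: str) -> float:
    """Fraction of stimulus words recalled in the response.

    A deliberately simple bag-of-words match; real studies typically use
    stricter alignment (e.g. edit distance) or phonetic matching to
    credit near-miss spellings.
    """
    stimulus_words = stimulus.lower().split()
    response_words = response.lower().split()
    hits = 0
    for word in stimulus_words:
        if word in response_words:
            hits += 1
            response_words.remove(word)  # each response word credits once
    return hits / len(stimulus_words) if stimulus_words else 0.0

# Three of four stimulus words recalled -> 0.75
print(intelligibility_score("a dog barked loudly", "a dog barked"))
```

Averaging such scores per synthesizer and speaking rate would yield the kind of intelligibility measure compared across conditions in the study.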




Published In

ASSETS '11: Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility
October 2011, 348 pages
ISBN: 9781450309202
DOI: 10.1145/2049536
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tag

  1. text-to-speech

Qualifiers

  • Research-article

Conference

ASSETS '11

Acceptance Rates

Overall Acceptance Rate 436 of 1,556 submissions, 28%


Article Metrics

  • Downloads (Last 12 months): 25
  • Downloads (Last 6 weeks): 4

Reflects downloads up to 30 Dec 2024


Cited By

  • (2024) Toward Effective Communication of AI-Based Decisions in Assistive Tools: Conveying Confidence and Doubt to People with Visual Impairments at Accelerated Speech. Proceedings of the 21st International Web for All Conference, pp. 177-189. DOI: 10.1145/3677846.3677862. Online publication date: 13-May-2024.
  • (2024) Voicing Uncertainty: How Speech, Text, and Visualizations Influence Decisions with Data Uncertainty. 2024 IEEE Workshop on Uncertainty Visualization: Applications, Techniques, Software, and Decision Frameworks, pp. 17-27. DOI: 10.1109/UncertaintyVisualization63963.2024.00007. Online publication date: 14-Oct-2024.
  • (2022) Accessibility-Related Publication Distribution in HCI Based on a Meta-Analysis. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, pp. 1-28. DOI: 10.1145/3491101.3519701. Online publication date: 27-Apr-2022.
  • (2022) Design and Evaluation of Accessible Collaborative Writing Techniques for People with Vision Impairments. ACM Transactions on Computer-Human Interaction, 29(2), pp. 1-42. DOI: 10.1145/3480169. Online publication date: 16-Jan-2022.
  • (2021) Expanding a Large Inclusive Study of Human Listening Rates. ACM Transactions on Accessible Computing, 14(3), pp. 1-26. DOI: 10.1145/3461700. Online publication date: 21-Jul-2021.
  • (2020) Commanding and Re-Dictation. ACM Transactions on Computer-Human Interaction, 27(4), pp. 1-31. DOI: 10.1145/3390889. Online publication date: 3-Aug-2020.
  • (2020) Reviewing Speech Input with Audio. ACM Transactions on Accessible Computing, 13(1), pp. 1-28. DOI: 10.1145/3382039. Online publication date: 21-Apr-2020.
  • (2020) "Nobody Speaks that Fast!" An Empirical Study of Speech Rate in Conversational Agents for People with Vision Impairments. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1-13. DOI: 10.1145/3313831.3376569. Online publication date: 21-Apr-2020.
  • (2019) Typing Slowly but Screen-Free. Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility, pp. 427-439. DOI: 10.1145/3308561.3353789. Online publication date: 24-Oct-2019.
  • (2019) Perception of sonified representations of complex systems by people who are blind. Assistive Technology, 34(1), pp. 11-19. DOI: 10.1080/10400435.2019.1666930. Online publication date: 2-Oct-2019.
