research-article
DOI: 10.1145/2049536.2049574

On the intelligibility of fast synthesized speech for individuals with early-onset blindness

Published: 24 October 2011

Abstract

People with visual disabilities increasingly use text-to-speech synthesis as a primary output modality for interaction with computers. Surprisingly, there have been no systematic comparisons of the performance of different text-to-speech systems for this user population. In this paper we report the results of a pilot experiment on the intelligibility of fast synthesized speech for individuals with early-onset blindness. Using an open-response recall task, we collected data on four synthesis systems representing two major approaches to text-to-speech synthesis: formant-based synthesis and concatenative unit selection synthesis. We found a significant effect of speaking rate on intelligibility of synthesized speech, and a trend towards significance for synthesizer type. In post-hoc analyses, we found that participant-related factors, including age and familiarity with a synthesizer and voice, also affect intelligibility of fast synthesized speech.
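The open-response recall task above implies scoring how much of each stimulus sentence a listener correctly reported. As a minimal, hypothetical sketch (the abstract does not specify the scoring procedure; the function name and matching rule are illustrative, not the authors' method), per-sentence intelligibility can be approximated as the fraction of stimulus words recovered in the response:

```python
# Illustrative sketch, not the paper's published procedure: score an
# open-response recall trial as the proportion of stimulus words that
# appear in the listener's transcription.

def intelligibility_score(stimulus: str, response: str) -> float:
    """Fraction of stimulus words recalled in the response.

    A deliberately simple bag-of-words match; real studies typically use
    stricter alignment (e.g. edit distance) or phonetic matching to
    credit near-miss spellings.
    """
    stimulus_words = stimulus.lower().split()
    response_words = response.lower().split()
    hits = 0
    for word in stimulus_words:
        if word in response_words:
            hits += 1
            response_words.remove(word)  # each response word credits once
    return hits / len(stimulus_words) if stimulus_words else 0.0

# Three of four stimulus words recalled -> 0.75
print(intelligibility_score("a dog barked loudly", "a dog barked"))
```

Averaging such scores per synthesizer and speaking rate would yield the kind of intelligibility measure compared across conditions in the study.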




Published In

ASSETS '11: Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility
October 2011, 348 pages
ISBN: 9781450309202
DOI: 10.1145/2049536
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tag

  1. text-to-speech

Qualifiers

  • Research-article

Conference

ASSETS '11

Acceptance Rates

Overall Acceptance Rate 436 of 1,556 submissions, 28%


Article Metrics

  • Downloads (Last 12 months): 25
  • Downloads (Last 6 weeks): 4

Reflects downloads up to 30 Dec 2024


Cited By

  • (2024) Toward Effective Communication of AI-Based Decisions in Assistive Tools: Conveying Confidence and Doubt to People with Visual Impairments at Accelerated Speech. Proceedings of the 21st International Web for All Conference, pp. 177-189. DOI: 10.1145/3677846.3677862. Online publication date: 13-May-2024.
  • (2024) Voicing Uncertainty: How Speech, Text, and Visualizations Influence Decisions with Data Uncertainty. 2024 IEEE Workshop on Uncertainty Visualization: Applications, Techniques, Software, and Decision Frameworks, pp. 17-27. DOI: 10.1109/UncertaintyVisualization63963.2024.00007. Online publication date: 14-Oct-2024.
  • (2022) Accessibility-Related Publication Distribution in HCI Based on a Meta-Analysis. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, pp. 1-28. DOI: 10.1145/3491101.3519701. Online publication date: 27-Apr-2022.
  • (2022) Design and Evaluation of Accessible Collaborative Writing Techniques for People with Vision Impairments. ACM Transactions on Computer-Human Interaction, 29(2), pp. 1-42. DOI: 10.1145/3480169. Online publication date: 16-Jan-2022.
  • (2021) Expanding a Large Inclusive Study of Human Listening Rates. ACM Transactions on Accessible Computing, 14(3), pp. 1-26. DOI: 10.1145/3461700. Online publication date: 21-Jul-2021.
  • (2020) Commanding and Re-Dictation. ACM Transactions on Computer-Human Interaction, 27(4), pp. 1-31. DOI: 10.1145/3390889. Online publication date: 3-Aug-2020.
  • (2020) Reviewing Speech Input with Audio. ACM Transactions on Accessible Computing, 13(1), pp. 1-28. DOI: 10.1145/3382039. Online publication date: 21-Apr-2020.
  • (2020) "Nobody Speaks that Fast!" An Empirical Study of Speech Rate in Conversational Agents for People with Vision Impairments. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1-13. DOI: 10.1145/3313831.3376569. Online publication date: 21-Apr-2020.
  • (2019) Typing Slowly but Screen-Free. Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility, pp. 427-439. DOI: 10.1145/3308561.3353789. Online publication date: 24-Oct-2019.
  • (2019) Perception of sonified representations of complex systems by people who are blind. Assistive Technology, 34(1), pp. 11-19. DOI: 10.1080/10400435.2019.1666930. Online publication date: 2-Oct-2019.
