Research Article
DOI: 10.1145/2513383.2517032

Audio-visual speech understanding in simulated telephony applications by individuals with hearing loss

Published: 21 October 2013

Abstract

We present a study of the effects of adding a video channel, of video frame rate, and of audio-video synchrony on the ability of people with hearing loss to understand spoken language during video telephone conversations. Analysis indicates that higher frame rates yield a significant improvement in speech understanding, even when audio and video are not perfectly synchronized. At lower frame rates, audio-video synchrony is critical: if the audio is perceived 100 ms ahead of the video, understanding drops significantly; if, on the other hand, the audio is perceived 100 ms behind the video, understanding does not degrade relative to perfect audio-video synchrony. These findings are validated through extensive statistical analysis over two within-subjects experiments with 24 and 22 participants, respectively.
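The asymmetric tolerance reported in the abstract can be sketched as a simple decision rule. This is an illustrative sketch only, not the authors' method: the function name and the treatment of the 100 ms figures as hard thresholds are our assumptions, derived from the abstract's reported findings.

```python
# Illustrative sketch of the abstract's findings on audio-video skew.
# Convention (our assumption): positive skew_ms means audio is perceived
# AHEAD of video; negative means audio lags behind video.

def av_skew_acceptable(skew_ms: float, high_frame_rate: bool) -> bool:
    """Return True if speech understanding is unlikely to degrade,
    per the thresholds reported in the abstract."""
    if high_frame_rate:
        # Higher frame rates improved understanding even without
        # perfect audio-video synchrony.
        return True
    # At lower frame rates, audio leading video by 100 ms hurt
    # understanding, while audio trailing video by 100 ms did not.
    return skew_ms < 100

print(av_skew_acceptable(-100, high_frame_rate=False))  # audio lags video: True
print(av_skew_acceptable(100, high_frame_rate=False))   # audio leads video: False
```

The asymmetry matches everyday experience of dubbed video: viewers tolerate late audio far better than early audio.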




Information

Published In

ASSETS '13: Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility
October 2013
343 pages
ISBN:9781450324052
DOI:10.1145/2513383
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 October 2013


Author Tags

  1. audio-video synchronization
  2. frame rate
  3. hard of hearing
  4. hearing loss
  5. lipreading
  6. speech understanding
  7. telecommunications accessibility

Qualifiers

  • Research-article

Conference

ASSETS '13
Acceptance Rates

ASSETS '13 Paper Acceptance Rate: 28 of 98 submissions, 29%
Overall Acceptance Rate: 436 of 1,556 submissions, 28%

Article Metrics

  • Downloads (Last 12 months)12
  • Downloads (Last 6 weeks)1
Reflects downloads up to 02 Feb 2025

Cited By
  • (2025) Lipreading as a communication strategy to enhance speech recognition in individuals with hearing impairment: a scoping review. Disability and Rehabilitation: Assistive Technology, 1-12. DOI: 10.1080/17483107.2025.2449984. Online publication date: 24-Jan-2025.
  • (2023) Understanding and Enhancing The Role of Speechreading in Online d/DHH Communication Accessibility. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-17. DOI: 10.1145/3544548.3580810. Online publication date: 19-Apr-2023.
  • (2022) Accessibility-Related Publication Distribution in HCI Based on a Meta-Analysis. Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, 1-28. DOI: 10.1145/3491101.3519701. Online publication date: 27-Apr-2022.
  • (2021) Factors Affecting the Accessibility of Voice Telephony for People with Hearing Loss: Audio Encoding, Network Impairments, Video and Environmental Noise. ACM Transactions on Accessible Computing, 14(4), 1-35. DOI: 10.1145/3479160. Online publication date: 15-Oct-2021.
  • (2019) Voice Telephony for Individuals with Hearing Loss. Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility, 3-15. DOI: 10.1145/3308561.3353796. Online publication date: 24-Oct-2019.
  • (2019) Social, Cultural and Systematic Frustrations Motivating the Formation of a DIY Hearing Loss Hacking Community. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1-14. DOI: 10.1145/3290605.3300531. Online publication date: 2-May-2019.
