Behavioral Changes in Speakers who are Automatically Captioned in Meetings with Deaf or Hard-of-Hearing Peers

Published: 08 October 2018

Abstract

Deaf and hard of hearing (DHH) individuals face barriers to communication in small-group meetings with hearing peers; we examine the generation of captions on mobile devices by automatic speech recognition (ASR). Although ASR output contains errors, we study whether such tools benefit users and influence conversational behaviors. An experiment was conducted in which DHH and hearing individuals collaborated in discussions under three conditions (without an ASR-based application, with the application, and with a version indicating words for which the ASR has low confidence). An analysis of audio recordings from each participant across conditions revealed significant differences in speech features. When using the ASR-based automatic captioning application, hearing individuals spoke more loudly, with improved voice quality (harmonics-to-noise ratio), with non-standard articulation (changes in F1 and F2 formants), and at a faster rate. Identifying non-standard speech in this setting has implications for the composition of data used for ASR training/testing, which should be representative of its usage context. Understanding these behavioral influences may also enable designers of ASR captioning systems to leverage these effects to promote communication success.
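
The speech features named above are standard acoustic measures: intensity (loudness), harmonics-to-noise ratio (voice quality), F1/F2 formants (articulation), and speaking rate. As an illustration only, and not the authors' actual pipeline, the sketch below shows one plausible way to extract such per-recording measures in Python with parselmouth, an interface to the Praat phonetics toolkit; the file name, syllable count, and analysis parameters are hypothetical assumptions.

```python
# Minimal sketch (not the paper's pipeline) of extracting the speech
# features discussed in the abstract: intensity, harmonics-to-noise
# ratio (HNR), F1/F2 formants, and speaking rate.
import numpy as np
import parselmouth  # pip install praat-parselmouth

snd = parselmouth.Sound("speaker_recording.wav")  # hypothetical file name

# Loudness: crude mean of the intensity contour, in dB.
intensity = snd.to_intensity()
mean_db = float(np.nanmean(intensity.values))

# Voice quality: harmonics-to-noise ratio. Praat marks unvoiced frames
# with -200 dB, so exclude them before averaging.
harmonicity = snd.to_harmonicity_cc()
voiced = harmonicity.values[harmonicity.values != -200]
mean_hnr = float(voiced.mean())

# Articulation: F1/F2 formant tracks (Burg method), sampled every 10 ms;
# get_value_at_time() returns NaN where a formant is undefined.
formants = snd.to_formant_burg(max_number_of_formants=5,
                               maximum_formant=5500.0)
times = np.arange(0.0, snd.duration, 0.01)
f1 = np.array([formants.get_value_at_time(1, t) for t in times])
f2 = np.array([formants.get_value_at_time(2, t) for t in times])
mean_f1, mean_f2 = np.nanmean(f1), np.nanmean(f2)

# Speaking rate: syllables per second. The syllable count would come from
# a transcript or a syllable-nuclei detector (assumed here, not shown).
n_syllables = 240  # hypothetical count for this recording
rate = n_syllables / snd.duration

print(f"intensity={mean_db:.1f} dB  HNR={mean_hnr:.1f} dB  "
      f"F1={mean_f1:.0f} Hz  F2={mean_f2:.0f} Hz  rate={rate:.2f} syl/s")
```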

        Published In

        ASSETS '18: Proceedings of the 20th International ACM SIGACCESS Conference on Computers and Accessibility
        October 2018
        508 pages
ISBN: 9781450356503
DOI: 10.1145/3234695

        Publisher

Association for Computing Machinery, New York, NY, United States


        Author Tags

        1. accessibility
        2. automatic speech recognition
        3. communication
        4. deaf and hard of hearing
        5. speaking behavior

        Qualifiers

        • Research-article

        Conference

        ASSETS '18

        Acceptance Rates

ASSETS '18 paper acceptance rate: 28 of 108 submissions, 26%
Overall acceptance rate: 436 of 1,556 submissions, 28%


        Article Metrics

• Downloads (last 12 months): 152
• Downloads (last 6 weeks): 16
Reflects downloads up to 30 Dec 2024.


Cited By

• (2024) Envisioning Collective Communication Access: A Theoretically-Grounded Review of Captioning Literature from 2013-2023. Proceedings of the 26th International ACM SIGACCESS Conference on Computers and Accessibility, 1-18. DOI: 10.1145/3663548.3675649. Online publication date: 27-Oct-2024.
• (2024) Measuring the Accuracy of Automatic Speech Recognition Solutions. ACM Transactions on Accessible Computing 16(4), 1-23. DOI: 10.1145/3636513. Online publication date: 9-Jan-2024.
• (2024) Towards Inclusive Video Commenting: Introducing Signmaku for the Deaf and Hard-of-Hearing. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-18. DOI: 10.1145/3613904.3642287. Online publication date: 11-May-2024.
• (2024) Towards Co-Creating Access and Inclusion: A Group Autoethnography on a Hearing Individual's Journey Towards Effective Communication in Mixed-Hearing Ability Higher Education Settings. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-14. DOI: 10.1145/3613904.3642017. Online publication date: 11-May-2024.
• (2023) Jod: Examining Design and Implementation of a Videoconferencing Platform for Mixed Hearing Groups. Proceedings of the 25th International ACM SIGACCESS Conference on Computers and Accessibility, 1-18. DOI: 10.1145/3597638.3608382. Online publication date: 22-Oct-2023.
• (2023) Understanding Social and Environmental Factors to Enable Collective Access Approaches to the Design of Captioning Technology. ACM SIGACCESS Accessibility and Computing, 1-1. DOI: 10.1145/3584732.3584735. Online publication date: 15-Feb-2023.
• (2023) Visualization of Speech Prosody and Emotion in Captions: Accessibility for Deaf and Hard-of-Hearing Users. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-15. DOI: 10.1145/3544548.3581511. Online publication date: 19-Apr-2023.
• (2023) "We Speak Visually": User-Generated Icons for Better Video-Mediated Mixed-Group Communications Between Deaf and Hearing Participants. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-16. DOI: 10.1145/3544548.3581151. Online publication date: 19-Apr-2023.
• (2023) "Easier or Harder, Depending on Who the Hearing Person Is": Codesigning Videoconferencing Tools for Small Groups with Mixed Hearing Status. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-15. DOI: 10.1145/3544548.3580809. Online publication date: 19-Apr-2023.
• (2023) Hidden Bawls, Whispers, and Yelps: Can Text Convey the Sound of Speech, Beyond Words? IEEE Transactions on Affective Computing 14(1), 6-16. DOI: 10.1109/TAFFC.2022.3174721. Online publication date: 1-Jan-2023.
