research-article

Open access

Speaking of accent: A content analysis of accent misconceptions in ASR research

Author: Cathleen A. Power

FAccT '24: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency

Pages 1245 - 1254

https://doi.org/10.1145/3630106.3658969

Published: 05 June 2024

Abstract

Automatic speech recognition (ASR) researchers are working to address differences in ASR transcription performance by accent or dialect. However, research often takes a limited view of accent in ways that reproduce discrimination and limit the scope of potential solutions. In this paper we present a content analysis of 22 papers on accent and ASR published in 2022 in top conferences and journals. We report on how accent is sometimes mistakenly viewed as something some people don’t have, as having a default, and as an attribute only of the speaker and not of the listener. We discuss the implications for research and provide recommendations to researchers who hope to reduce accent-related biases in ASR.

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition
2. Human-centered computing
  1. Human computer interaction (HCI)
    1. HCI theory, concepts and models

Information

Published In

FAccT '24: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency

June 2024

2580 pages

ISBN:9798400704505

DOI:10.1145/3630106

Copyright © 2024 Owner/Author.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

FAccT '24

FAccT '24: The 2024 ACM Conference on Fairness, Accountability, and Transparency

June 3 - 6, 2024

Rio de Janeiro, Brazil

Bibliometrics

Article Metrics

Total Citations: 0
Total Downloads: 358
Downloads (Last 12 months): 358
Downloads (Last 6 weeks): 93

Reflects downloads up to 13 Dec 2024