DOI: 10.1145/3630106.3658969
research-article
Open access

Speaking of accent: A content analysis of accent misconceptions in ASR research

Published: 05 June 2024

Abstract

Automatic speech recognition (ASR) researchers are working to address differences in ASR transcription performance by accent or dialect. However, research often takes a limited view of accent in ways that reproduce discrimination and narrow the scope of potential solutions. In this paper we present a content analysis of 22 papers on accent and ASR published in 2022 in top conferences and journals. We report on how accent is sometimes mistakenly viewed as something some people don’t have; as having a default; and as being an attribute only of the speaker, and not of the listener. We discuss the implications for research and provide recommendations to researchers who hope to reduce ASR biases by accent.

Information

Published In

FAccT '24: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency
June 2024
2580 pages
ISBN: 9798400704505
DOI: 10.1145/3630106
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AI fairness
  2. accent
  3. discrimination
  4. speech recognition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

FAccT '24
