DOI: 10.1145/3600211.3604667

Unmasking Nationality Bias: A Study of Human Perception of Nationalities in AI-Generated Articles

Published: 29 August 2023

Abstract

We investigate the potential for nationality biases in natural language processing (NLP) models using human evaluation methods. Biased NLP models can perpetuate stereotypes and lead to algorithmic discrimination, posing a significant challenge to the fairness and justice of AI systems. Our study employs a two-step mixed-methods approach that includes both quantitative and qualitative analysis to identify and understand the impact of nationality bias in a text generation model. Through our human-centered quantitative analysis, we measure the extent of nationality bias in articles generated by AI sources. We then conduct open-ended interviews with participants, performing qualitative coding and thematic analysis to understand the implications of these biases on human readers. Our findings reveal that biased NLP models tend to replicate and amplify existing societal biases, which can translate to harm if used in a sociotechnical setting. The qualitative analysis from our interviews offers insights into the experience readers have when encountering such articles, highlighting the potential to shift a reader’s perception of a country. These findings emphasize the critical role of public perception in shaping AI’s impact on society and the need to correct biases in AI systems.
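
The abstract does not spell out the quantitative measurement itself, so the sketch below is only a rough illustration, assuming a generic sentiment-scoring pass (the off-the-shelf VADER analyzer) over model-generated articles about different countries. The country names, example sentences, and the mean_compound_sentiment helper are hypothetical and are not taken from the paper.

```python
# Minimal sketch, NOT the authors' pipeline: compare average sentiment of
# AI-generated articles about different countries using the off-the-shelf
# VADER analyzer (pip install vaderSentiment).
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer


def mean_compound_sentiment(texts):
    """Average VADER compound score (-1 = most negative, +1 = most positive)."""
    analyzer = SentimentIntensityAnalyzer()
    scores = [analyzer.polarity_scores(t)["compound"] for t in texts]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical placeholder outputs; in a real study these would be articles
# produced by a text-generation model prompted about each nationality.
generated_articles = {
    "Country A": ["The people are widely praised for their warmth and creativity."],
    "Country B": ["The country is described as dangerous and plagued by corruption."],
}

for country, articles in generated_articles.items():
    print(f"{country}: mean compound sentiment = {mean_compound_sentiment(articles):+.3f}")
```

In a study like this one, such scores would be compared across many generated articles per nationality and then interpreted alongside the interview-based qualitative findings described above.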


Cited By

  • (2024) Large language models are geographically biased. Proceedings of the 41st International Conference on Machine Learning, 34654–34669. DOI: 10.5555/3692070.3693479. Online publication date: 21-Jul-2024.
  • (2024) AI as a Citizen: Eliciting Smart City Future Stories through Human-AI Collaborative Fiction Writing. Proceedings of the 27th International Academic Mindtrek Conference, 330–335. DOI: 10.1145/3681716.3681744. Online publication date: 8-Oct-2024.



Published In

AIES '23: Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society
August 2023
1026 pages
ISBN: 9798400702310
DOI: 10.1145/3600211


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 29 August 2023


Author Tags

  1. Ethics in AI
  2. HCI
  3. Nationality Bias
  4. Natural Language Processing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIES '23: AAAI/ACM Conference on AI, Ethics, and Society
August 8-10, 2023
Montréal, QC, Canada

Acceptance Rates

Overall Acceptance Rate 61 of 162 submissions, 38%


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 262
  • Downloads (last 6 weeks): 32
Reflects downloads up to 09 Jan 2025

