DOI: 10.1145/3366424.3383559

Quantifying Gender Bias in Different Corpora

Published: 20 April 2020

Abstract

Word embedding models have been shown to be effective in performing a wide variety of Natural Language Processing (NLP) tasks such as identifying audiences for web advertisements, parsing résumés to select promising job candidates, and translating documents from one language to another. However, it has been demonstrated that NLP systems learn gender bias from the corpora of documents on which they are trained. It is increasingly common for pre-trained models to be used as a starting point for building applications in a wide range of areas, including critical decision-making applications, and it is easy to use a pre-trained model as the basis for a new application without careful consideration of the original nature of the training set. In this paper, we quantify the degree to which gender bias differs with the corpora used for training. We look especially at the impact of starting with a pre-trained model and fine-tuning with additional data. Specifically, we calculate a measure of direct gender bias on several pre-trained models, including BERT’s Wikipedia and Book corpus models, as well as on several fine-tuned General Language Understanding Evaluation (GLUE) benchmarks. In addition, we evaluate the bias from several more extreme corpora, including the Jigsaw toxic identity dataset, which contains toxic speech biased against race, gender, religion, and disability, and the RtGender dataset, which contains speech specifically labelled by gender. Our results reveal that the direct gender bias of the Jigsaw toxic identity dataset is surprisingly close to that of the base pre-trained Google model, but the RtGender dataset has significantly higher direct gender bias than the base model. When the bias learned by an NLP system can vary significantly with the corpora used for training, it becomes important to consider and report these details, especially for use in critical decision-making applications.
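The "direct gender bias" measure the abstract refers to follows Bolukbasi et al. [1]. A minimal sketch, assuming the gender direction is estimated by averaging difference vectors of gendered word pairs (the original paper uses PCA over several pairs); the vectors below are illustrative stand-ins, not embeddings from the corpora studied:

```python
import numpy as np

def gender_direction(pairs):
    """Estimate a unit-norm gender direction from (she_vec, he_vec) pairs.

    Simplified stand-in: average the pairwise difference vectors.
    (Bolukbasi et al. take the top principal component instead.)
    """
    diffs = [a - b for a, b in pairs]
    g = np.mean(diffs, axis=0)
    return g / np.linalg.norm(g)

def direct_bias(neutral_vecs, g, c=1.0):
    """DirectBias_c = mean over gender-neutral words w of |cos(w, g)|^c.

    `g` is assumed unit-norm; `c` controls how strictly small
    projections onto the gender direction are penalized.
    """
    cosines = [abs(np.dot(w, g)) / np.linalg.norm(w) for w in neutral_vecs]
    return float(np.mean([cos ** c for cos in cosines]))
```

A corpus whose neutral-word vectors are nearly orthogonal to the gender direction scores close to 0; vectors aligned with it score close to 1, which is the sense in which the paper compares RtGender, Jigsaw, and the base pre-trained models.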

References

[1]
Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. 2016. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems. 4349–4357.
[2]
J. Dastin. 2018. Amazon scraps secret AI recruiting tool that showed bias against women. Reuters Business News. (2018). https://www.reuters.com/article/amazoncom-jobs-automation/rpt-insight-amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSL2N1WP1RO
[3]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
[4]
Ernest Davis, Leora Morgenstern, and Charles Ortiz. [n.d.]. The Winograd Schema Challenge. https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html
[5]
Hector J. Levesque, Ernest Davis, and Leora Morgenstern. 2012. The Winograd Schema Challenge (KR’12). AAAI Press, 552–561. http://dl.acm.org/citation.cfm?id=3031843.3031909
[6]
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems. 3111–3119.
[7]
Eric Nalisnick, Bhaskar Mitra, Nick Craswell, and Rich Caruana. 2016. Improving document ranking with dual word embeddings. In Proceedings of the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee, 83–84.
[8]
Nikita Nangia and Samuel R. Bowman. 2019. Human vs. Muppet: A Conservative Estimate of Human Performance on the GLUE Benchmark. arXiv preprint arXiv:1905.10425 (2019).
[9]
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250 (2016).
[10]
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 1631–1642.
[11]
Rob Voigt, David Jurgens, Vinodkumar Prabhakaran, Dan Jurafsky, and Yulia Tsvetkov. 2018. RtGender: A corpus for studying differential responses to gender. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018).
[12]
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461 (2018).
[13]
Alex Warstadt and Samuel R. Bowman. 2019. Grammatical Analysis of Pretrained Sentence Encoders with Acceptability Judgments. arXiv preprint arXiv:1901.03438 (2019).



        Published In

WWW '20: Companion Proceedings of the Web Conference 2020
April 2020, 854 pages
ISBN: 9781450370240
DOI: 10.1145/3366424


        Publisher

Association for Computing Machinery, New York, NY, United States


        Author Tags

        1. BERT
        2. datasets
        3. gender bias
        4. natural language processing

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

WWW '20: The Web Conference 2020
        April 20 - 24, 2020
        Taipei, Taiwan

        Acceptance Rates

        Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Cited By

• (2024) Automatically Distinguishing People’s Explicit and Implicit Attitude Bias by Bridging Psychological Measurements with Sentiment Analysis on Large Corpora. Applied Sciences 14:10 (4191). DOI: 10.3390/app14104191. Online publication date: 15-May-2024
• (2024) A Way of Making Smart Health Through Collaborating Machine Learning with the Bibliometrics. 2024 4th International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE) (209-214). DOI: 10.1109/ICACITE60783.2024.10617395. Online publication date: 14-May-2024
• (2024) MarIA and BETO are sexist: evaluating gender bias in large language models for Spanish. Language Resources and Evaluation 58:4 (1387-1417). DOI: 10.1007/s10579-023-09670-3. Online publication date: 1-Dec-2024
• (2023) Characterizing gender stereotypes in popular fiction: A machine learning approach. Online Journal of Communication and Media Technologies 13:4 (e202349). DOI: 10.30935/ojcmt/13644. Online publication date: 2023
• (2023) Mitigating Bias in GLAM Search Engines. Proceedings of the 34th ACM Conference on Hypertext and Social Media (1-5). DOI: 10.1145/3603163.3609043. Online publication date: 4-Sep-2023
• (2023) Large scale analysis of gender bias and sexism in song lyrics. EPJ Data Science 12:1. DOI: 10.1140/epjds/s13688-023-00384-8. Online publication date: 20-Apr-2023
• (2023) Identifying Gender Bias in Online Crime News Indonesia Using Word Embedding. 2023 International Conference on Advanced Mechatronics, Intelligent Manufacture and Industrial Automation (ICAMIMIA) (774-778). DOI: 10.1109/ICAMIMIA60881.2023.10427911. Online publication date: 14-Nov-2023
• (2023) Natural Language Processing Workload Optimization Using Container Based Deployment. Emerging Trends in Expert Applications and Security (93-103). DOI: 10.1007/978-981-99-1946-8_10. Online publication date: 30-Jun-2023
• (2022) Exploring gender biases in ML and AI academic research through systematic literature review. Frontiers in Artificial Intelligence 5. DOI: 10.3389/frai.2022.976838. Online publication date: 11-Oct-2022
• (2022) Quantification and Mitigation of Directional Pairwise Class Confusion Bias in a Chatbot Intent Classification Model. International Journal of Semantic Computing 16:04 (497-520). DOI: 10.1142/S1793351X22500040. Online publication date: 8-Aug-2022
