
Mitigating Gender Bias in Machine Learning Data Sets

  • Conference paper
  • In: Bias and Social Aspects in Search and Recommendation (BIAS 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1245)

Abstract

Algorithmic bias has the capacity to amplify and perpetuate societal bias, and it presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness into machine learning, and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for identifying gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways both for removing bias from training data and for critically assessing its impact in the context of search and recommender systems.
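
The abstract mentions testing learned associations as one existing approach to addressing bias in embedding models. Purely as a minimal illustrative sketch, and not the authors' framework, the snippet below shows a common form such a test takes: scoring words by cosine similarity against a he-she direction in a pretrained embedding model. The model name (glove-wiki-gigaword-100, from gensim's downloader) and the probe words are assumptions chosen for demonstration.

    # Illustrative sketch only: one common way to test learned gender
    # associations in a word embedding model (not the paper's framework).
    # The pretrained model and probe words are assumptions for demonstration.
    import numpy as np
    import gensim.downloader as api

    # Load a small pretrained embedding model from the gensim-data repository;
    # any word2vec/GloVe KeyedVectors model works identically here.
    kv = api.load("glove-wiki-gigaword-100")

    # Unit vector pointing from "she" toward "he" in embedding space.
    direction = kv["he"] - kv["she"]
    direction /= np.linalg.norm(direction)

    def gender_score(word: str) -> float:
        """Cosine similarity of a word with the he-she direction.
        Positive values lean toward 'he', negative toward 'she'."""
        v = kv[word]
        return float(np.dot(v / np.linalg.norm(v), direction))

    # Occupation terms are a standard probe set in the bias literature.
    for word in ["engineer", "programmer", "nurse", "homemaker"]:
        print(f"{word:12s} {gender_score(word):+.3f}")

Probe words that score strongly toward one pole flag learned associations that a data-side audit of the training corpus would then aim to explain or remove.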


Notes

  1. https://open-platform.theguardian.com (the Guardian Open Platform; see the corpus sketch after these notes).

  2. http://www.wjh.harvard.edu/inquirer.

  3. https://www.tensorflow.org.
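
Note 1 points to the Guardian Open Platform, the source of the news corpus. Purely as a minimal sketch, assuming a registered API key, article text can be retrieved from the platform's public /search endpoint; the query term and parameters below are illustrative assumptions rather than details taken from the paper.

    # Hedged sketch: assembling a news corpus via the Guardian Open
    # Platform (note 1). Assumes a registered API key; the query term
    # and parameters are illustrative assumptions.
    import requests

    API_KEY = "YOUR-API-KEY"  # obtained from open-platform.theguardian.com

    resp = requests.get(
        "https://content.guardianapis.com/search",
        params={
            "api-key": API_KEY,
            "q": "employment",          # example query term (assumption)
            "show-fields": "bodyText",  # include full article text
            "page-size": 50,
        },
        timeout=30,
    )
    resp.raise_for_status()

    # Collect article bodies as raw training text.
    articles = [
        r["fields"]["bodyText"]
        for r in resp.json()["response"]["results"]
        if "bodyText" in r.get("fields", {})
    ]
    print(f"Fetched {len(articles)} articles")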


Acknowledgements

This research project was supported by the Irish Research Council (IRC) and Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289_P2.

Author information

Corresponding author

Correspondence to Susan Leavy.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Leavy, S., Meaney, G., Wade, K., Greene, D. (2020). Mitigating Gender Bias in Machine Learning Data Sets. In: Boratto, L., Faralli, S., Marras, M., Stilo, G. (eds) Bias and Social Aspects in Search and Recommendation. BIAS 2020. Communications in Computer and Information Science, vol 1245. Springer, Cham. https://doi.org/10.1007/978-3-030-52485-2_2


  • DOI: https://doi.org/10.1007/978-3-030-52485-2_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-52484-5

  • Online ISBN: 978-3-030-52485-2

  • eBook Packages: Computer Science, Computer Science (R0)
