Abstract
Algorithmic bias has the capacity to amplify and perpetuate societal bias, with profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, owing to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness into machine learning, and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex ways in which gender ideology is embedded in language. This paper proposes a framework for identifying gender bias in the training data used for machine learning. The work draws on gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways both for removing bias from training data and for critically assessing its impact in the context of search and recommender systems.
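To make the abstract's notion of "testing learned associations" concrete, the following is a minimal sketch of one widely used diagnostic for gender bias in word embedding models: projecting words onto the axis between a gendered word pair. This is an illustration, not the paper's framework; the model file "embeddings.kv" and the probe words are assumptions for the example.

```python
# Minimal sketch: probe a trained embedding model for gendered associations
# by measuring cosine similarity to the he-she direction.
# Assumes a gensim KeyedVectors model saved as "embeddings.kv" (placeholder).
import numpy as np
from gensim.models import KeyedVectors

def gender_projection(kv: KeyedVectors, word: str) -> float:
    """Cosine similarity between `word` and the he-she axis.

    Positive values lean toward 'he'; negative values lean toward 'she'.
    """
    axis = kv["he"] - kv["she"]
    vec = kv[word]
    return float(np.dot(vec, axis) /
                 (np.linalg.norm(vec) * np.linalg.norm(axis)))

if __name__ == "__main__":
    kv = KeyedVectors.load("embeddings.kv")  # placeholder path
    for word in ["engineer", "nurse", "programmer", "homemaker"]:
        if word in kv:  # skip out-of-vocabulary words
            print(f"{word:12s} {gender_projection(kv, word):+.3f}")
```

Consistently large projections across, for example, occupation terms are the kind of embedded association in training data that the proposed framework aims to surface systematically.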
Acknowledgements
This research project was supported by the Irish Research Council (IRC) and Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289_P2.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Leavy, S., Meaney, G., Wade, K., Greene, D. (2020). Mitigating Gender Bias in Machine Learning Data Sets. In: Boratto, L., Faralli, S., Marras, M., Stilo, G. (eds) Bias and Social Aspects in Search and Recommendation. BIAS 2020. Communications in Computer and Information Science, vol 1245. Springer, Cham. https://doi.org/10.1007/978-3-030-52485-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-52484-5
Online ISBN: 978-3-030-52485-2