
Mitigating Gender Bias in Machine Learning Data Sets

  • Conference paper
  • In: Bias and Social Aspects in Search and Recommendation (BIAS 2020)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1245)

Abstract

Algorithmic bias has the capacity to amplify and perpetuate societal bias, and it presents profound ethical implications for society. Gender bias in algorithms has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness into machine learning, and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for identifying gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways both for removing bias from training data and for critically assessing its impact in the context of search and recommender systems.
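
The abstract mentions testing learned associations as one existing approach to addressing bias in embedding models. Purely as a minimal illustrative sketch, and not the authors' framework, the snippet below shows a common form such a test takes: scoring words by cosine similarity against a he-she direction in a pretrained embedding model. The model name (glove-wiki-gigaword-100, from gensim's downloader) and the probe words are assumptions chosen for demonstration.

    # Illustrative sketch only: one common way to test learned gender
    # associations in a word embedding model (not the paper's framework).
    # The pretrained model and probe words are assumptions for demonstration.
    import numpy as np
    import gensim.downloader as api

    # Load a small pretrained embedding model from the gensim-data repository;
    # any word2vec/GloVe KeyedVectors model works identically here.
    kv = api.load("glove-wiki-gigaword-100")

    # Unit vector pointing from "she" toward "he" in embedding space.
    direction = kv["he"] - kv["she"]
    direction /= np.linalg.norm(direction)

    def gender_score(word: str) -> float:
        """Cosine similarity of a word with the he-she direction.
        Positive values lean toward 'he', negative toward 'she'."""
        v = kv[word]
        return float(np.dot(v / np.linalg.norm(v), direction))

    # Occupation terms are a standard probe set in the bias literature.
    for word in ["engineer", "programmer", "nurse", "homemaker"]:
        print(f"{word:12s} {gender_score(word):+.3f}")

Probe words that score strongly toward one pole flag learned associations that a data-side audit of the training corpus would then aim to explain or remove.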


Notes

  1. https://open-platform.theguardian.com (the Guardian Open Platform; see the corpus sketch after these notes).

  2. http://www.wjh.harvard.edu/inquirer.

  3. https://www.tensorflow.org.
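
Note 1 points to the Guardian Open Platform, the source of the news corpus. Purely as a minimal sketch, assuming a registered API key, article text can be retrieved from the platform's public /search endpoint; the query term and parameters below are illustrative assumptions rather than details taken from the paper.

    # Hedged sketch: assembling a news corpus via the Guardian Open
    # Platform (note 1). Assumes a registered API key; the query term
    # and parameters are illustrative assumptions.
    import requests

    API_KEY = "YOUR-API-KEY"  # obtained from open-platform.theguardian.com

    resp = requests.get(
        "https://content.guardianapis.com/search",
        params={
            "api-key": API_KEY,
            "q": "employment",          # example query term (assumption)
            "show-fields": "bodyText",  # include full article text
            "page-size": 50,
        },
        timeout=30,
    )
    resp.raise_for_status()

    # Collect article bodies as raw training text.
    articles = [
        r["fields"]["bodyText"]
        for r in resp.json()["response"]["results"]
        if "bodyText" in r.get("fields", {})
    ]
    print(f"Fetched {len(articles)} articles")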


Acknowledgements

This research project was supported by the Irish Research Council (IRC) and Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289_P2.

Author information

Corresponding author

Correspondence to Susan Leavy.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Leavy, S., Meaney, G., Wade, K., Greene, D. (2020). Mitigating Gender Bias in Machine Learning Data Sets. In: Boratto, L., Faralli, S., Marras, M., Stilo, G. (eds) Bias and Social Aspects in Search and Recommendation. BIAS 2020. Communications in Computer and Information Science, vol 1245. Springer, Cham. https://doi.org/10.1007/978-3-030-52485-2_2


  • DOI: https://doi.org/10.1007/978-3-030-52485-2_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-52484-5

  • Online ISBN: 978-3-030-52485-2

  • eBook Packages: Computer Science, Computer Science (R0)
