French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English

Aurélie Névéol, Yoann Dupont, Julien Bezançon, Karën Fort

Abstract

Warning: This paper contains explicit statements of offensive stereotypes which may be upsetting. Much work on biases in natural language processing has addressed biases linked to the social and cultural experience of English-speaking individuals in the United States. We seek to widen the scope of bias studies by creating material to measure social bias in language models (LMs) against specific demographic groups in France. We build on the US-centered CrowS-Pairs dataset to create a multilingual stereotypes dataset that allows for comparability across languages while also characterizing biases that are specific to each country and language. We introduce 1,679 sentence pairs in French that cover stereotypes in ten types of bias, such as gender and age. Of these, 1,467 sentence pairs are translated from CrowS-Pairs and 212 are newly crowdsourced. Each pair contrasts a sentence expressing a stereotype about an underadvantaged group with the same sentence about an advantaged group. We find that four widely used language models (three French, one multilingual) favor sentences that express stereotypes in most bias categories. We report on the translation process from English into French, which led to a characterization of stereotypes in CrowS-Pairs, including the identification of US-centric cultural traits. We offer guidelines to further extend the dataset to other languages and cultural environments.

Anthology ID: 2022.acl-long.583
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 8521–8531
URL: https://aclanthology.org/2022.acl-long.583/
DOI: 10.18653/v1/2022.acl-long.583
Cite (ACL): Aurélie Névéol, Yoann Dupont, Julien Bezançon, and Karën Fort. 2022. French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8521–8531, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): French CrowS-Pairs: Extending a challenge dataset for measuring social bias in masked language models to a language other than English (Névéol et al., ACL 2022)
PDF: https://aclanthology.org/2022.acl-long.583.pdf
Video: https://aclanthology.org/2022.acl-long.583.mp4
Data: CrowS-Pairs
