Computer Science > Computation and Language

arXiv:2209.10335 (cs)

[Submitted on 21 Sep 2022 (v1), last revised 22 Sep 2022 (this version, v2)]

Title: Bias at a Second Glance: A Deep Dive into Bias for German Educational Peer-Review Data Modeling

Authors: Thiemo Wambsganss, Vinitra Swamy, Roman Rietsche, Tanja Käser

Abstract: Natural Language Processing (NLP) is increasingly used to provide adaptivity in educational applications. However, recent research has highlighted a variety of biases in pre-trained language models. While existing studies investigate bias in different domains, few offer a fine-grained analysis of educational and multilingual corpora. In this work, we analyze bias across text and through multiple architectures on a corpus of 9,165 German peer reviews collected from university students over five years. Notably, our corpus includes labels such as helpfulness, quality, and critical aspect ratings from the peer-review recipient, as well as demographic attributes. We conduct a Word Embedding Association Test (WEAT) analysis on (1) our collected corpus in connection with the clustered labels, (2) the most common pre-trained German language models (T5, BERT, and GPT-2) and GloVe embeddings, and (3) the language models after fine-tuning on our collected dataset. Contrary to our initial expectations, we found that our collected corpus does not reveal many biases in the co-occurrence analysis or in the GloVe embeddings. However, the pre-trained German language models exhibit substantial conceptual, racial, and gender bias, and their bias changes significantly along the conceptual and racial axes during fine-tuning on the peer-review data. With our research, we aim to contribute to the fourth UN sustainability goal (quality education) with a novel dataset, an understanding of biases in natural language education data, and the potential harms of not counteracting biases in language models for educational tasks.
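
For readers unfamiliar with the Word Embedding Association Test named in the abstract, the following is a minimal sketch of the commonly used WEAT effect size: the difference between the mean associations of two target word sets with two attribute sets, divided by the standard deviation of the associations over all target words. It is not the authors' code. The function names, the use of the sample standard deviation, and the two-dimensional toy vectors are assumptions made purely for illustration; a real analysis would use embeddings taken from the models under study.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """Mean similarity of w to attribute set A minus mean similarity to B."""
    sim_a = [cosine(w, a) for a in A]
    sim_b = [cosine(w, b) for b in B]
    return mean(sim_a) - mean(sim_b)


def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target sets X, Y and attribute sets A, B.

    Assumption: the denominator is the sample standard deviation
    of the association scores over all target words in X and Y.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Hypothetical toy embeddings (illustrative only, not from the paper).
X = [[1.0, 0.1], [0.9, 0.2]]  # target set 1, close to attribute A
Y = [[0.1, 1.0], [0.2, 0.9]]  # target set 2, close to attribute B
A = [[1.0, 0.0]]              # attribute set A
B = [[0.0, 1.0]]              # attribute set B

effect = weat_effect_size(X, Y, A, B)
print(f"WEAT effect size: {effect:.3f}")
```

A positive value indicates that the first target set is more strongly associated with attribute set A than the second target set is; swapping the two target sets flips the sign.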

Comments:	Accepted as a full paper at COLING 2022: The 29th International Conference on Computational Linguistics, 12-17 October 2022, Gyeongju, Republic of Korea
Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY)
Cite as:	arXiv:2209.10335 [cs.CL]
	(or arXiv:2209.10335v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2209.10335

Submission history

From: Vinitra Swamy
[v1] Wed, 21 Sep 2022 13:08:16 UTC (4,334 KB)
[v2] Thu, 22 Sep 2022 13:08:04 UTC (4,330 KB)
