Computer Science > Computation and Language

arXiv:2305.14307 (cs)

[Submitted on 23 May 2023]

Title: Debiasing should be Good and Bad: Measuring the Consistency of Debiasing Techniques in Language Models

Authors: Robert Morabito, Jad Kabbara, Ali Emami

Abstract: Debiasing methods that seek to mitigate the tendency of Language Models (LMs) to occasionally output toxic or inappropriate text have recently gained traction. In this paper, we propose a standardized protocol that distinguishes methods that not only yield desirable results but are also consistent with their mechanisms and specifications. For example, given a debiasing method developed to reduce toxicity in LMs, we ask: if the definition of toxicity used by the method is reversed, are the debiasing results also reversed? We use such considerations to devise three criteria for our new protocol: Specification Polarity, Specification Importance, and Domain Transferability. As a case study, we apply our protocol to a popular debiasing method, Self-Debiasing, and compare it to one we propose, called Instructive Debiasing. We demonstrate that consistency is as important to the viability of a debiasing method as a desirable result. We show that our protocol provides essential insights into the generalizability and interpretability of debiasing methods that may otherwise be overlooked.

Comments:	9 pages (excluding references), accepted at ACL Findings 2023
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2305.14307 [cs.CL]
	(or arXiv:2305.14307v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.14307

Submission history

From: Robert Morabito
[v1] Tue, 23 May 2023 17:45:54 UTC (6,910 KB)
