arXiv:2402.07767 (cs)

[Submitted on 12 Feb 2024 (v1), last revised 9 Jun 2024 (this version, v2)]

Title:Text Detoxification as Style Transfer in English and Hindi

Authors:Sourabrata Mukherjee, Akanksha Bansal, Atul Kr. Ojha, John P. McCrae, Ondřej Dušek

Abstract: This paper focuses on text detoxification, i.e., automatically converting toxic text into non-toxic text. This task contributes to safer and more respectful online communication and can be considered a Text Style Transfer (TST) task, where the text style changes while its content is preserved. We present three approaches: knowledge transfer from a similar task; a multi-task learning approach that combines sequence-to-sequence modeling with various toxicity classification tasks; and a delete-and-reconstruct approach. To support our research, we utilize a dataset provided by Dementieva et al. (2021), which contains multiple versions of detoxified texts corresponding to toxic texts. In our experiments, we selected the best variants through expert human annotators, creating a dataset where each toxic sentence is paired with a single, appropriate detoxified version. Additionally, we introduced a small Hindi parallel dataset, aligned with a part of the English dataset and suitable for evaluation purposes. Our results demonstrate that our approach effectively balances text detoxification with preserving the original content and maintaining fluency.

Comments:	Accepted and presented at the 20th International Conference on Natural Language Processing (ICON-2023) during December 14-17, 2023
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2402.07767 [cs.CL]
	(or arXiv:2402.07767v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.07767

Submission history

From: Sourabrata Mukherjee
[v1] Mon, 12 Feb 2024 16:30:41 UTC (194 KB)
[v2] Sun, 9 Jun 2024 18:48:06 UTC (174 KB)
