Computer Science > Computation and Language

arXiv:2404.14397 (cs)

[Submitted on 22 Apr 2024 (v1), last revised 16 Dec 2024 (this version, v2)]

Title: RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

Abstract: Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety remains a serious concern. With the advent of multilingual S/LLMs, the question becomes one of scale: can we expand multilingual safety evaluations of these models with the same velocity at which they are deployed? To this end, we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. RTP-LX follows participatory design practices, and a portion of the corpus is specifically designed to detect culturally-specific toxic language. We evaluate 10 S/LLMs on their ability to detect toxic content in a culturally-sensitive, multilingual scenario. We find that, although they typically score acceptably in terms of accuracy, they have low agreement with human judges when holistically scoring the toxicity of a prompt, and they have difficulty discerning harm in context-dependent scenarios, particularly with subtle-yet-harmful content (e.g. microaggressions, bias). We release this dataset to help further reduce harmful uses of these models and improve their safe deployment.

Comments:	AAAI 2025--camera ready + extended abstract
Subjects:	Computation and Language (cs.CL); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2404.14397 [cs.CL]
	(or arXiv:2404.14397v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.14397

Submission history

From: Adrian de Wynter
[v1] Mon, 22 Apr 2024 17:56:26 UTC (1,361 KB)
[v2] Mon, 16 Dec 2024 17:34:22 UTC (3,439 KB)
