arXiv:2404.17475 (cs)

[Submitted on 26 Apr 2024 (v1), last revised 13 Aug 2024 (this version, v2)]

Title: CEval: A Benchmark for Evaluating Counterfactual Text Generation

Authors: Van Bach Nguyen, Jörg Schlötterer, Christin Seifert

Abstract: Counterfactual text generation aims to minimally change a text such that it is classified differently. Judging advancements in method development for counterfactual text generation is hindered by the non-uniform use of datasets and metrics in related work. We propose CEval, a benchmark for comparing counterfactual text generation methods. CEval unifies counterfactual and text quality metrics and includes common counterfactual datasets with human annotations, standard baselines (MICE, GDBA, CREST), and the open-source language model LLAMA-2. Our experiments found no perfect method for generating counterfactual text. Methods that excel at counterfactual metrics often produce lower-quality text, while LLMs with simple prompts generate high-quality text but struggle with counterfactual criteria. By making CEval available as an open-source Python library, we encourage the community to contribute more methods and maintain consistent evaluation in future work.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2404.17475 [cs.CL]
	(or arXiv:2404.17475v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2404.17475
Journal reference:	INLG 2024

Submission history

From: Van Bach Nguyen
[v1] Fri, 26 Apr 2024 15:23:47 UTC (115 KB)
[v2] Tue, 13 Aug 2024 07:39:59 UTC (757 KB)
