LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study

Van Bach Nguyen, Paul Youssef, Christin Seifert, Jörg Schlötterer

Abstract

As NLP models become more complex, understanding their decisions becomes more crucial. Counterfactuals (CFs), where minimal changes to inputs flip a model’s prediction, offer a way to explain these models. While Large Language Models (LLMs) have shown remarkable performance in NLP tasks, their efficacy in generating high-quality CFs remains uncertain. This work fills this gap by investigating how well LLMs generate CFs for three tasks. We conduct a comprehensive comparison of several common LLMs, and evaluate their CFs, assessing both intrinsic metrics, and the impact of these CFs on data augmentation. Moreover, we analyze differences between human and LLM-generated CFs, providing insights for future research directions. Our results show that LLMs generate fluent CFs, but struggle to keep the induced changes minimal. Generating CFs for Sentiment Analysis (SA) is less challenging than NLI and Hate Speech (HS) where LLMs show weaknesses in generating CFs that flip the original label. This also reflects on the data augmentation performance, where we observe a large gap between augmenting with human and LLM CFs. Furthermore, we evaluate LLMs’ ability to assess CFs in a mislabelled data setting, and show that they have a strong bias towards agreeing with the provided labels. GPT4 is more robust against this bias, but it shows strong preference to its own generations. Our analysis suggests that safety training is causing GPT4 to prefer its generations, since these generations do not contain harmful content. Our findings reveal several limitations and point to potential future work directions.

Anthology ID:: 2024.findings-emnlp.870
Volume:: Findings of the Association for Computational Linguistics: EMNLP 2024
Month:: November
Year:: 2024
Address:: Miami, Florida, USA
Editors:: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:: Findings
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 14809–14824
Language:
URL:: https://aclanthology.org/2024.findings-emnlp.870
DOI:: 10.18653/v1/2024.findings-emnlp.870
Bibkey:
Cite (ACL):: Van Bach Nguyen, Paul Youssef, Christin Seifert, and Jörg Schlötterer. 2024. LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 14809–14824, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):: LLMs for Generating and Evaluating Counterfactuals: A Comprehensive Study (Nguyen et al., Findings 2024)
Copy Citation:
PDF:: https://aclanthology.org/2024.findings-emnlp.870.pdf

PDF Cite Search