Computer Science > Computation and Language

arXiv:2311.12534 (cs)

[Submitted on 21 Nov 2023]

Title: Evaluation Metrics of Language Generation Models for Synthetic Traffic Generation Tasks

Authors: Simone Filice, Jason Ingyu Choi, Giuseppe Castellucci, Eugene Agichtein, Oleg Rokhlenko

Abstract: Many Natural Language Generation (NLG) tasks aim to generate a single output text given an input prompt. Other settings, such as Synthetic Traffic Generation (STG), require the generation of multiple texts. STG is crucial for training and evaluating QA systems and conversational agents, where the goal is to generate multiple questions or utterances that resemble the linguistic variability of real users. In this paper, we show that common NLG metrics, such as BLEU, are not suitable for evaluating STG. We propose and evaluate several metrics designed to compare the generated traffic to the distribution of real user texts. We validate our metrics with an automatic procedure that verifies whether they capture different types of quality issues in generated data; we also collect human annotations to verify their correlation with human judgements. Experiments on three tasks, i.e., Shopping Utterance Generation, Product Question Generation and Query Auto Completion, demonstrate that our metrics are effective for evaluating STG tasks and improve the agreement with human judgement by up to 20% with respect to common NLG metrics. We believe these findings can pave the way towards better solutions for estimating the representativeness of synthetic text data.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2311.12534 [cs.CL]
	(or arXiv:2311.12534v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.12534

Submission history

From: Simone Filice
[v1] Tue, 21 Nov 2023 11:26:26 UTC (738 KB)
