Analysing Data-To-Text Generation Benchmarks

Abstract

A generation system can only be as good as the data it is trained on. In this short paper, we propose a methodology for analysing data-to-text corpora used for training Natural Language Generation (NLG) systems. We apply this methodology to three existing benchmarks. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could help better support the development, evaluation and comparison of linguistically sophisticated data-to-text generators.

Anthology ID: W17-3537
Volume: Proceedings of the 10th International Conference on Natural Language Generation
Month: September
Year: 2017
Address: Santiago de Compostela, Spain
Editors: Jose M. Alonso, Alberto Bugarín, Ehud Reiter
Venue: INLG
SIG: SIGGEN
Publisher: Association for Computational Linguistics
Pages: 238–242
URL: https://aclanthology.org/W17-3537
DOI: 10.18653/v1/W17-3537
Cite (ACL): Laura Perez-Beltrachini and Claire Gardent. 2017. Analysing Data-To-Text Generation Benchmarks. In Proceedings of the 10th International Conference on Natural Language Generation, pages 238–242, Santiago de Compostela, Spain. Association for Computational Linguistics.
Cite (Informal): Analysing Data-To-Text Generation Benchmarks (Perez-Beltrachini & Gardent, INLG 2017)
PDF: https://aclanthology.org/W17-3537.pdf
