@inproceedings{patel-etal-2023-learning,
title = "Learning Interpretable Style Embeddings via Prompting {LLM}s",
author = "Patel, Ajay and
Rao, Delip and
Kothary, Ansh and
McKeown, Kathleen and
Callison-Burch, Chris",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2023",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-emnlp.1020/",
doi = "10.18653/v1/2023.findings-emnlp.1020",
pages = "15270--15290",
abstract = "Style representation learning builds content-independent representations of author style in text. To date, no large dataset of texts with stylometric annotations on a wide range of style dimensions has been compiled, perhaps because the linguistic expertise to perform such annotation would be prohibitively expensive. Therefore, current style representation approaches make use of unsupervised neural methods to disentangle style from content to create style vectors. These approaches, however, result in uninterpretable representations, complicating their usage in downstream applications like authorship attribution where auditing and explainability is critical. In this work, we use prompting to perform stylometry on a large number of texts to generate a synthetic stylometry dataset. We use this synthetic data to then train human-interpretable style representations we call LISA embeddings. We release our synthetic dataset (StyleGenome) and our interpretable style embedding model (LISA) as resources."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="patel-etal-2023-learning">
    <titleInfo>
      <title>Learning Interpretable Style Embeddings via Prompting LLMs</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Ajay</namePart>
      <namePart type="family">Patel</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Delip</namePart>
      <namePart type="family">Rao</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Ansh</namePart>
      <namePart type="family">Kothary</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Kathleen</namePart>
      <namePart type="family">McKeown</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Chris</namePart>
      <namePart type="family">Callison-Burch</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2023-12</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Findings of the Association for Computational Linguistics: EMNLP 2023</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Houda</namePart>
        <namePart type="family">Bouamor</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Juan</namePart>
        <namePart type="family">Pino</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Kalika</namePart>
        <namePart type="family">Bali</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">Singapore</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
    </relatedItem>
    <abstract>Style representation learning builds content-independent representations of author style in text. To date, no large dataset of texts with stylometric annotations on a wide range of style dimensions has been compiled, perhaps because the linguistic expertise to perform such annotation would be prohibitively expensive. Therefore, current style representation approaches make use of unsupervised neural methods to disentangle style from content to create style vectors. These approaches, however, result in uninterpretable representations, complicating their usage in downstream applications like authorship attribution where auditing and explainability are critical. In this work, we use prompting to perform stylometry on a large number of texts to generate a synthetic stylometry dataset. We then use this synthetic data to train human-interpretable style representations we call LISA embeddings. We release our synthetic dataset (StyleGenome) and our interpretable style embedding model (LISA) as resources.</abstract>
<identifier type="citekey">patel-etal-2023-learning</identifier>
<identifier type="doi">10.18653/v1/2023.findings-emnlp.1020</identifier>
<location>
<url>https://aclanthology.org/2023.findings-emnlp.1020/</url>
</location>
<part>
<date>2023-12</date>
<extent unit="page">
<start>15270</start>
<end>15290</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Learning Interpretable Style Embeddings via Prompting LLMs
%A Patel, Ajay
%A Rao, Delip
%A Kothary, Ansh
%A McKeown, Kathleen
%A Callison-Burch, Chris
%Y Bouamor, Houda
%Y Pino, Juan
%Y Bali, Kalika
%S Findings of the Association for Computational Linguistics: EMNLP 2023
%D 2023
%8 December
%I Association for Computational Linguistics
%C Singapore
%F patel-etal-2023-learning
%X Style representation learning builds content-independent representations of author style in text. To date, no large dataset of texts with stylometric annotations on a wide range of style dimensions has been compiled, perhaps because the linguistic expertise to perform such annotation would be prohibitively expensive. Therefore, current style representation approaches make use of unsupervised neural methods to disentangle style from content to create style vectors. These approaches, however, result in uninterpretable representations, complicating their usage in downstream applications like authorship attribution where auditing and explainability are critical. In this work, we use prompting to perform stylometry on a large number of texts to generate a synthetic stylometry dataset. We then use this synthetic data to train human-interpretable style representations we call LISA embeddings. We release our synthetic dataset (StyleGenome) and our interpretable style embedding model (LISA) as resources.
%R 10.18653/v1/2023.findings-emnlp.1020
%U https://aclanthology.org/2023.findings-emnlp.1020/
%U https://doi.org/10.18653/v1/2023.findings-emnlp.1020
%P 15270-15290
Markdown (Informal)
[Learning Interpretable Style Embeddings via Prompting LLMs](https://aclanthology.org/2023.findings-emnlp.1020/) (Patel et al., Findings 2023)
ACL
Ajay Patel, Delip Rao, Ansh Kothary, Kathleen McKeown, and Chris Callison-Burch. 2023. Learning Interpretable Style Embeddings via Prompting LLMs. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 15270–15290, Singapore. Association for Computational Linguistics.
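
To make the abstract's pipeline concrete, here is a minimal Python sketch of its annotation step: prompt an LLM with human-readable stylometric questions about a passage and collect the yes/no answers into an interpretable style vector. This is an illustration under stated assumptions, not the authors' released code; the attribute list, the ask_llm callable, and the binary scoring are all hypothetical stand-ins, and the actual StyleGenome/LISA construction is described in the paper.

# Illustrative sketch (not the authors' code) of LLM-prompted stylometric
# annotation in the spirit of the StyleGenome step the abstract describes.
from typing import Callable, Dict

# A handful of example style dimensions; the paper covers far more.
STYLE_ATTRIBUTES = [
    "use formal language",
    "use sarcasm",
    "write long sentences",
    "use first-person pronouns",
]

def annotate_style(text: str, ask_llm: Callable[[str], str]) -> Dict[str, float]:
    """Map a passage to an interpretable attribute -> score vector."""
    vector: Dict[str, float] = {}
    for attribute in STYLE_ATTRIBUTES:
        prompt = (
            f"Passage: {text}\n"
            f"Question: Does the author of this passage {attribute}? "
            "Answer Yes or No."
        )
        answer = ask_llm(prompt).strip().lower()
        # Score each interpretable dimension from the LLM's answer.
        vector[attribute] = 1.0 if answer.startswith("yes") else 0.0
    return vector

if __name__ == "__main__":
    # Toy stand-in "LLM" so the sketch runs offline; in practice this
    # would be a real chat-completion call.
    def toy_llm(prompt: str) -> str:
        return "Yes" if " I " in prompt else "No"

    print(annotate_style("I therefore posit the following claim.", toy_llm))

Training the LISA embedding model on top of such synthetic annotations is beyond this snippet; see the paper and the released StyleGenome and LISA resources for the full method.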