Computer Science > Computation and Language

arXiv:2301.11916 (cs)

[Submitted on 27 Jan 2023 (v1), last revised 12 Feb 2024 (this version, v4)]

Title: Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning

Authors: Xinyi Wang, Wanrong Zhu, Michael Saxon, Mark Steyvers, William Yang Wang

Abstract: In recent years, pre-trained large language models (LLMs) have demonstrated remarkable efficiency in achieving an inference-time few-shot learning capability known as in-context learning. However, existing literature has highlighted the sensitivity of this capability to the selection of few-shot demonstrations. Current understandings of the underlying mechanisms by which this capability arises from regular language model pretraining objectives remain disconnected from the real-world LLMs. This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models. On this premise, we propose an algorithm to select optimal demonstrations from a set of annotated data with a small LM, and then directly generalize the selected demonstrations to larger LMs. We demonstrate significant improvement over baselines, averaged over eight GPT models on eight real-world text classification datasets. We also demonstrate the real-world usefulness of our algorithm on GSM8K, a math word problem dataset. Our empirical findings support our hypothesis that LLMs implicitly infer a latent variable containing task information.

Comments:	Code at: this https URL. Accepted to NeurIPS 2023.
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2301.11916 [cs.CL]
	(or arXiv:2301.11916v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2301.11916

Submission history

From: Xinyi Wang
[v1] Fri, 27 Jan 2023 18:59:01 UTC (974 KB)
[v2] Thu, 4 May 2023 15:09:50 UTC (1,044 KB)
[v3] Tue, 17 Oct 2023 11:24:33 UTC (1,080 KB)
[v4] Mon, 12 Feb 2024 23:09:40 UTC (1,080 KB)
