Computer Science > Machine Learning

arXiv:2402.18819 (cs)

[Submitted on 29 Feb 2024 (v1), last revised 2 Aug 2024 (this version, v2)]

Title:Dual Operating Modes of In-Context Learning

Abstract:In-context learning (ICL) exhibits dual operating modes: task learning, i.e., acquiring a new skill from in-context samples, and task retrieval, i.e., locating and activating a relevant pretrained skill. Recent theoretical work investigates various mathematical models to analyze ICL, but existing models explain only one operating mode at a time. We introduce a probabilistic model, with which one can explain the dual operating modes of ICL simultaneously. Focusing on in-context learning of linear functions, we extend existing models for pretraining data by introducing multiple task groups and task-dependent input distributions. We then analyze the behavior of the optimally pretrained model under the squared loss, i.e., the MMSE estimator of the label given in-context examples. Regarding pretraining task distribution as prior and in-context examples as the observation, we derive the closed-form expression of the task posterior distribution. With the closed-form expression, we obtain a quantitative understanding of the two operating modes of ICL. Furthermore, we shed light on an unexplained phenomenon observed in practice: under certain settings, the ICL risk initially increases and then decreases with more in-context examples. Our model offers a plausible explanation for this "early ascent" phenomenon: a limited number of in-context samples may lead to the retrieval of an incorrect skill, thereby increasing the risk, which will eventually diminish as task learning takes effect with more in-context samples. We also theoretically analyze ICL with biased labels, e.g., zero-shot ICL, where in-context examples are assigned random labels. Lastly, we validate our findings and predictions via experiments involving Transformers and large language models.
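
As a rough illustration of the Bayesian view described in the abstract (pretraining task distribution as prior, in-context examples as observations), the sketch below computes a posterior over task groups and the resulting MMSE prediction for in-context linear regression. It is not the paper's exact model: it assumes a Gaussian-mixture prior over the weight vector with a shared isotropic covariance, Gaussian label noise, and no task-dependent input distributions. All names (`mmse_predict`, `mus`, `pis`, `tau`, `sigma`) are hypothetical.

```python
import numpy as np

def mmse_predict(X, y, x_query, mus, pis, tau=1.0, sigma=0.5):
    """MMSE label prediction under an assumed Gaussian-mixture task prior.

    Assumed model (a simplification, not taken from the paper):
      w ~ sum_m pis[m] * N(mus[m], tau^2 I),   y_i = x_i^T w + N(0, sigma^2).
    X: (k, d) in-context inputs, y: (k,) labels, x_query: (d,) query input.
    Returns the prediction and the posterior weight of each mixture component.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mus = np.asarray(mus, dtype=float)
    pis = np.asarray(pis, dtype=float)
    k, d = X.shape

    # Per-component posterior mean of w (ridge-like update toward the data).
    A = np.eye(d) / tau**2 + X.T @ X / sigma**2
    b = X.T @ y / sigma**2
    post_means = np.linalg.solve(A, (mus / tau**2 + b).T).T  # shape (M, d)

    # Component evidence: y | m ~ N(X mus[m], sigma^2 I + tau^2 X X^T).
    # The log-determinant term is identical for all components and cancels.
    S = sigma**2 * np.eye(k) + tau**2 * X @ X.T
    resid = y[None, :] - mus @ X.T                           # shape (M, k)
    log_ev = -0.5 * np.sum(resid * np.linalg.solve(S, resid.T).T, axis=1)

    # Posterior component weights via a numerically stable softmax.
    log_w = np.log(pis) + log_ev
    log_w -= log_w.max()
    w = np.exp(log_w)
    w /= w.sum()

    # MMSE prediction: posterior-weighted average of component predictions.
    return float(w @ (post_means @ x_query)), w
```

Under these assumptions, a single in-context example that matches one prior component drives nearly all posterior weight onto that component (retrieval-like behavior), while many examples pull every component's posterior mean toward the function that generated the data (learning-like behavior).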

Comments:	54 pages, 23 figures
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2402.18819 [cs.LG]
	(or arXiv:2402.18819v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.18819

Submission history

From: Ziqian Lin [view email]
[v1] Thu, 29 Feb 2024 03:06:10 UTC (4,527 KB)
[v2] Fri, 2 Aug 2024 08:22:57 UTC (6,104 KB)
