Computer Science > Machine Learning

arXiv:2406.05816 (cs)

[Submitted on 9 Jun 2024 (v1), last revised 10 Oct 2024 (this version, v3)]

Title: Attention as a Hypernetwork

Authors: Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu

Abstract: Under some circumstances, transformers can generalize to novel problem instances whose constituent parts might have been encountered during training but whose compositions have not. What mechanisms underlie this ability for compositional generalization? By reformulating multi-head attention as a hypernetwork, we reveal that a composable, low-dimensional latent code specifies key-query specific operations. We find empirically that this latent code is predictive of the subtasks the network performs on unseen task compositions, revealing that latent codes acquired during training are reused to solve unseen problem instances. To further examine the hypothesis that the intrinsic hypernetwork of multi-head attention supports compositional generalization, we test whether making the hypernetwork-generated linear value network nonlinear strengthens compositionality. We find that this modification improves compositional generalization on abstract reasoning tasks. In particular, we introduce a symbolic version of the Raven Progressive Matrices human intelligence test, which gives us precise control over the problem compositions encountered during training and evaluation. We demonstrate on this task how scaling model size and data enables compositional generalization in transformers and gives rise to a functionally structured latent space.
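
One way to read the hypernetwork reformulation mentioned in the abstract is sketched below. This is an illustrative sketch, not the authors' code: the function names, tensor shapes, the softmax normalization, and the absence of masking and biases are all assumptions made for the example. It relies only on reordering the sums in standard multi-head attention: for each query-key pair, the vector of attention scores across heads acts as a low-dimensional latent code, and that code linearly mixes the per-head value and output projections into a key-query specific linear map.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_scores(X, Wq, Wk):
    # X: (T, D) token embeddings; Wq, Wk: (H, D, d_head) per-head projections.
    Q = np.einsum("td,hde->hte", X, Wq)
    K = np.einsum("td,hde->hte", X, Wk)
    logits = np.einsum("hqe,hke->hqk", Q, K) / np.sqrt(Wq.shape[-1])
    return softmax(logits, axis=-1)  # (H, T, T), normalized over keys

def mha_standard(X, Wq, Wk, Wv, Wo):
    # Usual formulation: attend per head, then project and sum over heads.
    A = attention_scores(X, Wq, Wk)              # (H, T, T)
    V = np.einsum("td,hde->hte", X, Wv)          # (H, T, d_head)
    heads = np.einsum("hqk,hke->hqe", A, V)      # (H, T, d_head)
    return np.einsum("hqe,hed->qd", heads, Wo)   # (T, D)

def mha_hypernetwork(X, Wq, Wk, Wv, Wo):
    # Hypernetwork view (assumed reading): the H attention scores of each
    # (query, key) pair form a latent code that generates a linear map.
    A = attention_scores(X, Wq, Wk)
    latent = np.transpose(A, (1, 2, 0))                   # (T, T, H)
    per_head = np.einsum("hde,hef->hdf", Wv, Wo)          # (H, D, D)
    W_qk = np.einsum("qkh,hdf->qkdf", latent, per_head)   # (T, T, D, D)
    # Apply each key-query specific map to its key token, then sum over keys.
    return np.einsum("kd,qkdf->qf", X, W_qk), latent

# Small random example; all sizes are arbitrary choices for illustration.
rng = np.random.default_rng(0)
T, D, H, d_head = 5, 8, 2, 4
X = rng.normal(size=(T, D))
Wq = rng.normal(size=(H, D, d_head))
Wk = rng.normal(size=(H, D, d_head))
Wv = rng.normal(size=(H, D, d_head))
Wo = rng.normal(size=(H, d_head, D))

out_standard = mha_standard(X, Wq, Wk, Wv, Wo)
out_hyper, latent = mha_hypernetwork(X, Wq, Wk, Wv, Wo)
same = np.allclose(out_standard, out_hyper)
```

Under these assumptions the two functions compute the same output, and `latent[q, k]` is the H-dimensional code for query `q` and key `k`. The generated map here is linear; the abstract's modification makes this value network nonlinear, which the sketch does not attempt to reproduce.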

Comments:	Code available at this https URL
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2406.05816 [cs.LG]
	(or arXiv:2406.05816v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.05816

Submission history

From: Simon Schug
[v1] Sun, 9 Jun 2024 15:08:00 UTC (6,003 KB)
[v2] Fri, 21 Jun 2024 13:09:43 UTC (6,004 KB)
[v3] Thu, 10 Oct 2024 13:15:10 UTC (6,023 KB)
