Computer Science > Computation and Language

arXiv:2310.03686 (cs)

[Submitted on 5 Oct 2023 (v1), last revised 3 Apr 2024 (this version, v2)]

Title: DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Authors: Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem Zuidema, Jaap Jumelet

Abstract: In recent years, many interpretability methods have been proposed to help interpret the internal states of Transformer models, at different levels of precision and complexity. Here, to analyze encoder-decoder Transformers, we propose a simple, new method: DecoderLens. Inspired by the LogitLens (for decoder-only Transformers), this method allows the decoder to cross-attend to representations of intermediate encoder layers instead of the final encoder output, as is normally done in encoder-decoder models. The method thus maps previously uninterpretable vector representations to human-interpretable sequences of words or symbols. We report results from the DecoderLens applied to models trained on question answering, logical reasoning, speech recognition and machine translation. The DecoderLens reveals several specific subtasks that are solved at low or intermediate layers, shedding new light on the information flow inside the encoder component of this important class of models.
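The core idea in the abstract, decoding from each intermediate encoder layer rather than only the last, can be illustrated with a toy sketch. This is not the authors' implementation: the names `run_encoder`, `decoder_lens`, `double`, `add_one`, and `toy_decode` are hypothetical, the "layers" are plain functions on lists of floats, and the "decoder" is a stand-in that simply sums each vector. A real model would use Transformer layers and a decoder that cross-attends to the states it is given.

```python
from typing import Callable, List

Vector = List[float]
States = List[Vector]            # one vector per input position
Layer = Callable[[States], States]


def run_encoder(layers: List[Layer], embeddings: States) -> List[States]:
    """Return the encoder states after the embeddings and after every layer."""
    all_states = [embeddings]
    for layer in layers:
        all_states.append(layer(all_states[-1]))
    return all_states


def decoder_lens(layers: List[Layer],
                 decode: Callable[[States], List[int]],
                 embeddings: States) -> List[List[int]]:
    """Decode from every intermediate encoder state, not just the final one."""
    return [decode(states) for states in run_encoder(layers, embeddings)]


# Hypothetical toy layers standing in for Transformer encoder layers.
def double(states: States) -> States:
    return [[2 * x for x in vec] for vec in states]


def add_one(states: States) -> States:
    return [[x + 1 for x in vec] for vec in states]


# Hypothetical toy decoder: emits one integer "token" per encoder position.
def toy_decode(states: States) -> List[int]:
    return [int(sum(vec)) for vec in states]


embeddings = [[1.0, 2.0], [3.0, 4.0]]
outputs = decoder_lens([double, add_one], toy_decode, embeddings)
for layer_index, tokens in enumerate(outputs):
    print(f"encoder layer {layer_index}: {tokens}")
```

The last entry of `outputs` corresponds to ordinary decoding from the final encoder output; the earlier entries are the layerwise views that the method inspects.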

Comments:	Accepted to Findings of NAACL 2024
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2310.03686 [cs.CL]
	(or arXiv:2310.03686v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.03686

Submission history

From: Anna Langedijk
[v1] Thu, 5 Oct 2023 17:04:59 UTC (8,024 KB)
[v2] Wed, 3 Apr 2024 12:09:26 UTC (8,032 KB)
