arXiv:2205.09869 (cs)

[Submitted on 19 May 2022]

Title: Transformer with Memory Replay

Abstract: Transformers achieve state-of-the-art performance on natural language processing tasks by pre-training on large-scale text corpora. They are extremely compute-intensive and have very high sample complexity. Memory replay is a mechanism that remembers and reuses past examples by saving them to a memory buffer and later replaying them from it. It has been used successfully in reinforcement learning and GANs because it improves sample efficiency. In this paper, we propose Transformer with Memory Replay (TMR), which integrates memory replay into the transformer to make it more sample-efficient. Experiments on the GLUE and SQuAD benchmark datasets show that Transformer with Memory Replay achieves an improvement of at least 1 percentage point over the baseline transformer model when both are pre-trained on the same number of examples. Furthermore, by adopting a careful design that reduces the wall-clock time overhead of memory replay, we also empirically achieve better runtime efficiency.
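To make the save-and-replay mechanism concrete, here is a minimal, generic sketch of a memory replay buffer. It is not TMR's actual design: the abstract does not specify the buffer size, eviction policy, sampling strategy, or how replayed examples are mixed into pre-training batches. The sketch assumes FIFO eviction, uniform sampling, and a fixed replay ratio. The names `ReplayBuffer`, `mix_batch`, and `replay_ratio` are hypothetical.

```python
import random
from collections import deque
from typing import Deque, Generic, List, TypeVar

T = TypeVar("T")


class ReplayBuffer(Generic[T]):
    """Hypothetical fixed-capacity memory of past training examples.

    Assumption: FIFO eviction (oldest examples are dropped first) and
    uniform random sampling on replay; TMR's real policy may differ.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # deque(maxlen=...) silently evicts the oldest items once full
        self._items: Deque[T] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def save(self, examples: List[T]) -> None:
        """Remember examples that were just used for training."""
        self._items.extend(examples)

    def replay(self, k: int) -> List[T]:
        """Return up to k stored examples, sampled uniformly without replacement."""
        k = min(k, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)


def mix_batch(fresh: List[T], buffer: ReplayBuffer[T], replay_ratio: float) -> List[T]:
    """Hypothetical helper: append replayed examples to a fresh batch.

    replay_ratio is the assumed number of replayed examples per fresh example.
    The fresh examples are saved only after sampling, so a batch never
    replays its own examples.
    """
    n_replay = int(len(fresh) * replay_ratio)
    batch = fresh + buffer.replay(n_replay)
    buffer.save(fresh)
    return batch
```

For example, with `buf = ReplayBuffer(capacity=4)`, the first call `mix_batch(["a", "b"], buf, 0.5)` returns just `["a", "b"]` because the buffer is still empty. A second call, `mix_batch(["c", "d"], buf, 0.5)`, returns `["c", "d"]` plus one of `"a"` or `"b"`. Reusing stored examples this way extracts more gradient updates per freshly drawn example, which is the sample-efficiency argument the abstract makes. The cost is extra compute per step, the wall-clock overhead that the paper says its design reduces.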

Comments:	Accepted to AAAI 2022
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2205.09869 [cs.LG]
	(or arXiv:2205.09869v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2205.09869

Submission history

From: Rui Liu
[v1] Thu, 19 May 2022 21:27:36 UTC (332 KB)