Decoder-only Transformer

We train a decoder-only transformer on a subset of the TinyStories dataset (https://arxiv.org/abs/2305.07759). The model is the composition of a token embedding, a self-attention head, and a two-layer feedforward neural network (FNN) with GELU activation. We train it with the cross-entropy loss using Adam. The code is a modification of curtfox/decoder-memory-capacity, which accompanies https://arxiv.org/abs/2405.13718.

We use 100 stories for the training set and 100 stories for the test set, truncating each story to 100 words. The resulting vocabulary has size 1858. We vary the number of neurons in the first FNN layer from 10 to 90 and set the embedding dimension equal to it. Each model is trained for 1000 epochs of full-batch Adam.

We plot the final training error and final test error as functions of the number of parameters. The final training error decreases monotonically as the number of parameters increases, whereas the final test error first decreases and then increases, with a minimum at 158,698 parameters. This is consistent with the traditional bias-variance tradeoff rather than the modern double descent curve. It suggests that, for a fixed data set size, too many parameters and epochs do lead to overfitting, so they must be increased jointly with the data set size in order to decrease the test loss.
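To make the "number of parameters" axis concrete, here is a minimal sketch of how a parameter count could be derived from the layer width. The README does not spell out the layer shapes, so the breakdown is an assumption: a token embedding, four attention projections (query, key, value, output) with biases, and a two-layer FNN with biases whose output has vocabulary size. This breakdown was chosen because it gives 158,698 at width 40 with vocabulary size 1858, matching the reported minimum; the actual code may organize its parameters differently. The function name `count_parameters` and the sweep step of 10 are hypothetical.

```python
def count_parameters(d: int, vocab_size: int = 1858) -> int:
    """Parameter count for an ASSUMED layer layout (not confirmed by the repo).

    d is both the embedding dimension and the width of the first FNN layer,
    since the README sets the two equal.
    """
    # Token embedding: one d-dimensional vector per vocabulary entry.
    embedding = vocab_size * d
    # Assumed attention layout: Q, K, V and output projections, each d x d plus bias.
    attention = 4 * (d * d + d)
    # First FNN layer: d inputs -> d hidden neurons, plus bias.
    ffn_hidden = d * d + d
    # Second FNN layer: d hidden neurons -> vocab_size logits, plus bias.
    ffn_output = d * vocab_size + vocab_size
    return embedding + attention + ffn_hidden + ffn_output


if __name__ == "__main__":
    # Assumed sweep: widths 10, 20, ..., 90 (the README gives only the range).
    for d in range(10, 100, 10):
        print(f"width {d:2d}: {count_parameters(d):,} parameters")
```

Under this assumed layout, the two vocabulary-sized matrices (embedding and output layer) dominate the count, so the total grows almost linearly with the width over the range 10 to 90.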

Installing Required Packages

To install the required Python packages, use the following command:

pip install -r requirements.txt

Running the Code

To run the code, use the following command:

python run_experiments.py

liammadden/transformer-loss

Folders and files

Latest commit

History

Repository files navigation

Decoder-only Transformer

Installing Required Packages

Running the Code

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages