
GPT-2 Based Text Generation

This repository contains a PyTorch implementation of a GPT-2-inspired language model. The model is trained on character-level sequences and generates text from an input prompt. The repository includes a simple bigram baseline, a full GPT-2-style model, and a training script for learning from a dataset.

Repository Structure

  • models.py: Defines the model architecture, including:
    • BigramModel: A simple bigram baseline for language modeling.
    • GPT2: A GPT-2-style transformer model.
    • CausalSelfAttention, FeedForward, and Block: the building blocks of the transformer layers (see the sketch after this list).
  • gpt2.py: Training script for the GPT-2 model.
  • addition.py: Demonstrates model performance on numerical addition tasks.
  • requirements.txt: Lists the dependencies required to run the code.
  • input.txt: The Tiny Shakespeare dataset used for training.
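The actual layer definitions live in models.py; as a rough, illustrative sketch of how a pre-norm GPT-2 block of this kind is typically wired (the bodies below are an assumption, not the repository's code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head self-attention with a causal (lower-triangular) mask."""
    def __init__(self, n_embds, num_heads, context_length):
        super().__init__()
        assert n_embds % num_heads == 0
        self.num_heads = num_heads
        self.qkv = nn.Linear(n_embds, 3 * n_embds)
        self.proj = nn.Linear(n_embds, n_embds)
        self.register_buffer(
            "mask", torch.tril(torch.ones(context_length, context_length))
        )

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=-1)
        # (B, T, C) -> (B, num_heads, T, head_dim)
        q = q.view(B, T, self.num_heads, -1).transpose(1, 2)
        k = k.view(B, T, self.num_heads, -1).transpose(1, 2)
        v = v.view(B, T, self.num_heads, -1).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / (k.size(-1) ** 0.5)
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))  # no attending to the future
        att = F.softmax(att, dim=-1)
        out = (att @ v).transpose(1, 2).reshape(B, T, C)
        return self.proj(out)

class FeedForward(nn.Module):
    """Position-wise MLP with the usual 4x expansion."""
    def __init__(self, n_embds):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embds, 4 * n_embds),
            nn.GELU(),
            nn.Linear(4 * n_embds, n_embds),
        )

    def forward(self, x):
        return self.net(x)

class Block(nn.Module):
    """Pre-norm residual transformer block, as in GPT-2."""
    def __init__(self, n_embds, num_heads, context_length):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embds)
        self.attn = CausalSelfAttention(n_embds, num_heads, context_length)
        self.ln2 = nn.LayerNorm(n_embds)
        self.ff = FeedForward(n_embds)

    def forward(self, x):
        x = x + self.attn(self.ln1(x))  # attention sublayer with residual
        x = x + self.ff(self.ln2(x))    # feed-forward sublayer with residual
        return x

Each Block applies attention and the feed-forward network inside residual connections, with LayerNorm before each sublayer, matching GPT-2's pre-norm layout.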

Installation

Ensure you have Python 3.8+ and install dependencies using:

pip install -r requirements.txt

Usage

Training the GPT-2 Model

To train the GPT-2 model using the Tiny Shakespeare dataset:

python gpt2.py

The script automatically loads input.txt, builds a character-level vocabulary, and trains the model.
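The preprocessing step is the standard character-level recipe; a minimal sketch of what loading and encoding input.txt typically looks like (variable names and the 90/10 split are illustrative assumptions, not necessarily the script's):

import torch

text = open("input.txt", encoding="utf-8").read()
chars = sorted(set(text))
vocab_size = len(chars)
stoi = {ch: i for i, ch in enumerate(chars)}    # character -> integer id
itos = {i: ch for ch, i in stoi.items()}        # integer id -> character
encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

data = torch.tensor(encode(text), dtype=torch.long)
n = int(0.9 * len(data))                        # assumed 90/10 train/val split
train_data, val_data = data[:n], data[n:]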

Generating Text

Once trained, the model can generate text from an initial context. Here vocab_size, context_length, n_embds, n_layers, and num_heads must match the values used during training, and decode is the id-to-character mapping built during preprocessing:

import torch
from models import GPT2

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hyperparameters must match the values the checkpoint was trained with.
model = GPT2(vocab_size, context_length, n_embds, n_layers, num_heads, device=device)
model.load_state_dict(torch.load("m.pt", map_location=device))
model.to(device)
model.eval()

# Start from a single zero token and sample 500 new characters.
context = torch.zeros((1, 1), dtype=torch.long, device=device)
output = model.generate(context, max_new_tokens=500)
print(decode(output[0].tolist()))

Training on Addition Tasks

To train the model on numerical addition:

python addition.py

Results are logged to Out.txt, showing the model's predictions on random addition problems.
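The exact serialization is defined in addition.py; as a minimal sketch, one common way to cast addition as a character-level sequence task looks like this (the format shown is an assumption, not necessarily the script's):

import random

def make_example(a: int, b: int) -> str:
    # Serialize an addition problem as a character string the model
    # can learn to complete, e.g. "13+25=38".
    return f"{a}+{b}={a + b}"

print([make_example(random.randint(0, 99), random.randint(0, 99)) for _ in range(3)])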

Dataset: Tiny Shakespeare

The repository includes input.txt, a character-level dataset based on Tiny Shakespeare. If you want to download your own copy:

wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt

Replace input.txt with your own dataset if needed.

Notes

  • The model supports GPU acceleration through PyTorch.
  • You can modify hyperparameters such as context_length, num_heads, and n_layers in the scripts (see the illustrative block after this list).
  • Results are saved and logged periodically for evaluation.
  • The project is primarily educational and is not meant for production use.
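For illustration, such hyperparameters usually sit near the top of a training script like gpt2.py as module-level constants; the values below are placeholders, not the repository's defaults:

# Placeholder hyperparameters; tune to taste and hardware.
context_length = 256  # maximum sequence length seen by attention
n_embds = 384         # embedding width
num_heads = 6         # attention heads per Block (must divide n_embds)
n_layers = 6          # number of transformer Blocks
learning_rate = 3e-4
batch_size = 64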
