Simple LLM is a project implementing a GPT-like Large Language Model (LLM) from scratch.
This implementation follows Andrej Karpathy's Neural Networks: Zero to Hero video series as a primary reference.
This project is tested with Python 3.11.11.
To try out the milestone test scripts below, first create a Python virtual environment, and install the dependencies:
```bash
python -m venv venv
source venv/bin/activate
pip install -v --upgrade pip
pip install -v -r requirements.txt
```
To test our own automatic differentiation library:
```bash
python -m simplellm.autograd.test
```
To test our own neural network implementations:
```bash
python -m simplellm.nn.test
```
To test our language model implementations:
```bash
python -m simplellm.lm.test
```
Goal: Implement an automatic differentiation library with back-propagation support. The library includes fundamental operations (+, *, ReLU) as building blocks for neural network construction.
Validation: Implementation tested through linear regression (`y = a*x + b`) on a noiseless dataset using MSE loss. Results are validated against the ground truth and a PyTorch implementation.
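To illustrate the idea, here is a hedged sketch of a scalar reverse-mode autograd engine in this style; the `Value` class and its methods are hypothetical names, not the actual simplellm.autograd API.

```python
# Minimal sketch of a scalar reverse-mode autograd node.
# `Value` and its methods are hypothetical; the real simplellm.autograd API may differ.
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None      # closure that pushes grad to the children
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad          # d(out)/d(self) = 1
            other.grad += out.grad         # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def relu(self):
        out = Value(max(0.0, self.data), (self,))
        def _backward():
            self.grad += (out.data > 0) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse order.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# Example: for y = a*x + b with a=2, x=3, backward() gives a.grad == 3.0 (== x).
# a, x, b = Value(2.0), Value(3.0), Value(1.0)
# y = a * x + b; y.backward()
```

Gradients accumulated this way can then be checked against PyTorch's autograd on the same linear-regression fit.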
Status: Complete (see `simplellm/autograd/`)
Goal: Implement a neural network (MLP) and training algorithms in steps, gradually replacing our own implementation with PyTorch's:
- Our automatic differentiation library from last stage + our SGD/Adam optimizers
- PyTorch tensors + our SGD/Adam optimizers written in PyTorch
- PyTorch NN modules + PyTorch's Adam optimizer
Validation: Implementations tested through quadratic regression (`y = w1*x1**2 + w2*x2**2 + b`) on a noiseless dataset using MSE loss and ReLU activation. Performance is compared across all three implementations, with an additional comparison between the SGD and Adam optimizers.
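For reference, a minimal sketch of the third variant (PyTorch NN modules + PyTorch's Adam) on this task could look as follows; the layer sizes, learning rate, and step count are illustrative assumptions, not the settings used in simplellm/nn.

```python
# Sketch: fit y = w1*x1**2 + w2*x2**2 + b with a small ReLU MLP, MSE loss, and Adam.
# Target coefficients, layer sizes, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(1024, 2) * 2 - 1                       # inputs in [-1, 1]
y = 3.0 * x[:, :1] ** 2 + 0.5 * x[:, 1:] ** 2 + 1.0   # noiseless targets

model = nn.Sequential(
    nn.Linear(2, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(f"final MSE: {loss.item():.6f}")
```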
Status: Complete (see `simplellm/nn/`)
Goal: Implement and compare various character-level language models using PyTorch.
Model implementations:
- Statistical n-gram models
  - Bi-gram and tri-gram models with a counting-based approach (implemented)
  - Bi-gram model optimized through cross-entropy loss and gradient descent (implemented)
- Neural architectures
  - N-gram model with character-level embeddings and a dense MLP (implemented)
  - Recurrent Neural Network (original RNN cell, GRU cell) (implemented)
  - Transformer architecture (implemented; see the attention sketch after this list)
    - Causal self-attention
    - Multi-head attention
    - Layer normalization
    - Position encoding
- Handcrafted network model for a specific distribution, to explore the minimum NN that can express it (implemented for `sticky`)
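As a rough illustration of the core Transformer component listed above, a single-head causal self-attention module can be sketched as below; the class name and dimensions are assumptions, not the project's actual implementation.

```python
# Sketch of single-head causal self-attention (illustrative, not simplellm/lm's module).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, embed_dim, block_size):
        super().__init__()
        self.key = nn.Linear(embed_dim, embed_dim, bias=False)
        self.query = nn.Linear(embed_dim, embed_dim, bias=False)
        self.value = nn.Linear(embed_dim, embed_dim, bias=False)
        # Lower-triangular mask: position t may only attend to positions <= t.
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):                                 # x: (batch, time, embed_dim)
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)
        att = q @ k.transpose(-2, -1) / math.sqrt(C)      # (B, T, T) scaled scores
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        return att @ v                                    # weighted sum of value vectors

# Usage: out = CausalSelfAttention(embed_dim=32, block_size=16)(torch.randn(4, 16, 32))
```

A multi-head variant splits the embedding into several such heads and concatenates their outputs; layer normalization and position encodings wrap around this block in the full architecture.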
The training data is synthesized with the following properties:
- Rule-based generation for interpretability
- Deterministic validation of distribution membership (Y/N) for any given sample
- Configurable context length requirements with a parameter N
- N determines the minimum n-gram model size required for 100% accuracy
We have implemented three synthetic datasets/distributions, in order of difficulty:
- `sticky.py`: Generate a mix of lower-case (a), upper-case (A), and digit (0) characters, such that each class (a/A/0) will appear at least N times consecutively before switching to another class, e.g., 4C1D "acXYZ13a" (N=2); a rough generator/validator sketch follows this list
- `counting.py`: Generate variable-length counting sequences (consecutive integers separated by commas) for numbers up to N digits, e.g., "32,33,34" (N=2)
- `arithmetic.py`: Generate addition formulas for numbers up to N digits, e.g., "22+13=35" (N=2)
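To make the rule-based generation and deterministic validation concrete, here is a hedged sketch for the `sticky` distribution; the function names, run-length sampling, and treatment of the final run are assumptions and may differ from the actual sticky.py.

```python
# Sketch of a "sticky" generator/validator (illustrative; not the actual sticky.py).
import random
import string

CLASSES = {
    "lower": string.ascii_lowercase,
    "upper": string.ascii_uppercase,
    "digit": string.digits,
}

def generate_sticky(n, num_runs=5):
    """Each character class repeats at least n times before switching (assumed rule)."""
    out, prev = [], None
    for _ in range(num_runs):
        cls = random.choice([c for c in CLASSES if c != prev])
        run_len = random.randint(n, 2 * n)      # assumed run-length range
        out.extend(random.choice(CLASSES[cls]) for _ in range(run_len))
        prev = cls
    return "".join(out)

def class_of(ch):
    # Assumes the alphabet is limited to lower-case, upper-case, and digit characters.
    return "lower" if ch.islower() else "upper" if ch.isupper() else "digit"

def is_in_distribution(sample, n):
    """Deterministic Y/N check: every maximal run of one class has length >= n.
    The actual rule may treat a truncated final run differently."""
    if not sample:
        return False
    run_len, prev = 0, None
    for ch in sample:
        cls = class_of(ch)
        if cls == prev:
            run_len += 1
        else:
            if prev is not None and run_len < n:
                return False
            prev, run_len = cls, 1
    return run_len >= n
```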
Evaluation:
- Train all implemented models on the synthetic dataset
- Generate new samples from the trained models, as well as from LLMs prompted with the training samples
- Validate the generated samples against the rules and calculate the in-distribution percentage (sketched below)
- Analyze learned parameters to understand how different architectures capture the underlying distribution
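The in-distribution percentage from the last two bullets can be estimated along these lines; `sample_from_model` and `is_in_distribution` are hypothetical stand-ins for the project's generation and validation code.

```python
# Sketch: estimate the in-distribution percentage of generated samples.
# `sample_from_model` and `is_in_distribution` are hypothetical stand-ins, not the
# project's actual generation/validation functions.
def in_distribution_percentage(sample_from_model, is_in_distribution, n, num_samples=1000):
    hits = sum(is_in_distribution(sample_from_model(), n) for _ in range(num_samples))
    return 100.0 * hits / num_samples

# Example usage with the sticky sketch above (random generator as a stand-in "model"):
# pct = in_distribution_percentage(lambda: generate_sticky(2), is_in_distribution, n=2)
```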
Status: In Progress (see `simplellm/lm/`)