Dolphin is a minimalist computing substrate for symbolic tensor operations and transformer-based neural networks. It implements deep learning architectures and training logic without relying on NumPy, PyTorch, or any other external library. Every operation (tensor arithmetic, backpropagation, attention, and optimization) is executed through hand-written, minimal Python logic. The result is a transparent and flexible platform for understanding transformers, automatic differentiation, and the inner workings of modern ML, without hiding behind black-box libraries.
Dolphin is:
A symbolic autodiff engine:
- Includes a fully custom Tensor class with support for 1D, 2D, and 3D tensors
- Tracks computation graphs and gradients
- Implements backpropagation with minimal overhead
A pure Python transformer stack:
- Multi-head self-attention
- Layer normalization
- GELU activations
- Feedforward layers
- Residual and normalization paths
A zero-dependency machine learning system:
- No NumPy
- No Torch
- No Cython or compiled backend
- Compatible with vanilla Python
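To make the autodiff idea concrete, here is a minimal, self-contained sketch of the same principle at the scalar level (not Dolphin's actual `Tensor` API): each value records its parents and the local derivative toward each, and `backward()` walks the graph in reverse, accumulating gradients.

```python
# Minimal scalar autodiff sketch -- illustrates the principle that Dolphin's
# Tensor class generalizes to 1D/2D/3D data; not the library's actual API.
class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data                  # raw float
        self.grad = 0.0                   # accumulated dL/dself
        self._parents = parents           # nodes this value depends on
        self._local_grads = local_grads   # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self, grad=1.0):
        # chain rule: accumulate the incoming gradient, then push it to parents
        self.grad += grad
        for parent, local in zip(self._parents, self._local_grads):
            parent.backward(grad * local)

x = Value(2.0)
y = Value(3.0)
z = x * y + x          # z = x*y + x
z.backward()
print(x.grad, y.grad)  # 4.0 (= y + 1) and 2.0 (= x)
```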
Dolphin is built to help researchers, students, and curious developers explore how large language models and attention mechanisms actually function, without abstractions.
The main Dolphin codebase is structured into several key modules:
The core of the symbolic engine.
- Defines the `Tensor` class.
- Supports 1D, 2D, and 3D nested lists as raw data.
- Includes autograd functionality: backward propagation, gradient accumulation, and dependency tracking.
- Implements all core operations:
- Element-wise math: add, subtract, multiply, divide
- Reductions: sum, mean, max, var
- Matrix multiplication (`matmul`) for 2D and batched 3D inputs
- Broadcasting support for scalar and tensor math
- Reshaping, flattening, and transposition
Implements the full transformer encoder stack.
- `MultiHeadSelfAttention`:
  - Implements scaled dot-product attention
  - Splits and recombines heads manually
  - Applies softmax and matmul using custom tensor logic
- `FeedForward`:
  - Two-layer MLP with GELU activation
  - Fully connected layers using internal `matmul` and bias addition
- `TransformerEncoderBlock`:
  - Combines self-attention and feedforward blocks
  - Includes LayerNorm and residual connections
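For reference, single-head scaled dot-product attention written directly over nested lists comes down to softmax(QKᵀ/√d)·V. The sketch below shows that math in plain Python; it is not Dolphin's `MultiHeadSelfAttention` class, which additionally splits and recombines heads.

```python
import math

# Single-head scaled dot-product attention on nested lists; a generic
# sketch of the math, not Dolphin's MultiHeadSelfAttention class.
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul2d(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_row(row):
    m = max(row)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v):
    d = len(q[0])                         # head dimension
    scores = matmul2d(q, transpose(k))    # (seq, seq) similarity scores
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax_row(row) for row in scaled]
    return matmul2d(weights, v)           # weighted sum of value vectors

q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(attention(q, k, v))
```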
Activation and loss functions.
- `softmax`: Written without NumPy. Normalizes logits across the last axis.
- `gelu`: Implements the Gaussian Error Linear Unit using a tanh approximation.
- `cross_entropy_loss`: Calculates the negative log-likelihood between predictions and target labels.
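Written on flat Python lists, these three functions follow the standard formulas. The sketch below is generic; Dolphin's own versions operate on its `Tensor` type and their exact signatures may differ.

```python
import math

# Generic list-based sketches of the standard formulas; Dolphin's own
# functions work on its Tensor type and may have different signatures.
def softmax(logits):
    m = max(logits)                        # stabilize: exp(x - max)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def gelu(x):
    # tanh approximation of the Gaussian Error Linear Unit
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def cross_entropy_loss(logits, target_index):
    probs = softmax(logits)
    return -math.log(probs[target_index] + 1e-12)  # negative log-likelihood

print(softmax([1.0, 2.0, 3.0]))
print(gelu(1.0))                           # ~0.8412
print(cross_entropy_loss([1.0, 2.0, 3.0], 2))
```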
Layer utilities and normalization.
- `Linear`: Simple dense layer (y = xW + b), reshaped manually to support batch inputs.
- `LayerNorm`: Implements layer normalization without NumPy, applied over the last dimension.
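Over the last dimension, layer normalization is just a per-row mean/variance normalization followed by a learned scale and shift. Below is a generic nested-list sketch, not Dolphin's `LayerNorm` class itself.

```python
import math

# Generic sketch of layer normalization over the last dimension of a 2D
# nested list, with learned gain/bias; not Dolphin's actual class.
def layer_norm(x, gamma, beta, eps=1e-5):
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([(v - mean) * inv_std * g + b
                    for v, g, b in zip(row, gamma, beta)])
    return out

x = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
gamma = [1.0, 1.0, 1.0]
beta = [0.0, 0.0, 0.0]
print(layer_norm(x, gamma, beta))   # each row normalized to ~zero mean, unit variance
```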
Optimization algorithms for parameter updates.
- `SGD`: Standard stochastic gradient descent.
- `Adam`: Adaptive Moment Estimation, written from scratch with moving averages and bias correction.
- `Momentum`: Implements SGD with momentum using manual velocity tracking.
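Per parameter, the Adam update combines exponential moving averages of the gradient and its square with bias correction. The sketch below shows the standard formulas on plain floats; Dolphin's `Adam` class implements the same moving averages and bias correction from scratch.

```python
import math

# Standard Adam update on a flat list of parameters and gradients;
# a formula-level sketch, not Dolphin's optimizer class.
def adam_step(params, grads, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    for i, g in enumerate(grads):
        m[i] = b1 * m[i] + (1 - b1) * g        # first moment (moving average)
        v[i] = b2 * v[i] + (1 - b2) * g * g    # second moment
        m_hat = m[i] / (1 - b1 ** t)           # bias correction
        v_hat = v[i] / (1 - b2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

params = [0.5, -0.3]
m = [0.0, 0.0]
v = [0.0, 0.0]
grads = [0.1, -0.2]
print(adam_step(params, grads, m, v, t=1))
```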
A complete example pipeline using Dolphin:
- Loads and tokenizes text from the Brown corpus using NLTK.
- Builds a vocabulary and transforms text into integer sequences.
- Initializes embedding weights manually.
- Feeds embedded tokens into a multi-layer transformer stack.
- Projects final token embeddings to a vocabulary-size output.
- Trains using backpropagation and the Adam optimizer.
- Generates new sequences using a greedy sampling loop.
This file demonstrates that Dolphin can support:
- Text preprocessing
- Transformer-based sequence modeling
- End-to-end training without any library support
- Inference and autoregressive text generation
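At its core, the greedy sampling loop just asks the model for next-token logits and appends the argmax. The sketch below is schematic: `model_logits` is a hypothetical placeholder for the trained transformer's forward pass, not a function from Dolphin.

```python
# Schematic greedy decoding loop; `model_logits` is a hypothetical stand-in
# for the trained transformer's forward pass, not part of Dolphin's API.
def argmax(values):
    best_i, best_v = 0, values[0]
    for i, v in enumerate(values):
        if v > best_v:
            best_i, best_v = i, v
    return best_i

def greedy_generate(model_logits, prompt_ids, max_new_tokens):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = model_logits(ids)      # vocab-sized scores for the next token
        ids.append(argmax(logits))      # pick the most likely token
    return ids

# Toy "model" that always prefers token 2, just to make the sketch runnable.
print(greedy_generate(lambda ids: [0.1, 0.2, 0.7], [5, 7], 3))  # [5, 7, 2, 2, 2]
```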
Dolphin was created to answer the question:
“What happens if we build a transformer from scratch, without any numerical libraries at all?”
The result is not a high-performance framework. It is a clear, complete, and expressive computational substrate that exposes the full mechanics of deep learning.
It is ideal for:
- Teaching and learning how transformers and autograd work
- Experimenting with symbolic tensor operations
- Exploring what minimal, library-free ML looks like
Python 3.7+
No external dependencies.
Dolphin is released under the MIT License.
- Start with Tensors. This is the foundation for everything else. Think of it like a Lego set: tensors are your bricks, the basic building blocks of everything else. They handle data, operations, gradients, and scalar differentiation, and they form the foundation for learning. This is where you implement autodiff (automatic differentiation) and backpropagation, which let your model learn from its mistakes. Then come computation graphs, which show how tensors are connected and how gradients flow backward through the network. This is especially important for Transformers, but that's the last step.
- My build order: tensor engine (w/ autograd, etc.) --> activations --> loss --> optimizers --> layers --> attention and transformers --> training and sampling (aka, putting it all together). Will be making a better doc to explain each part and how each module builds on the others.
- Transformer using PyTorch | GeeksforGeeks
  A walkthrough of how to implement a Transformer using PyTorch components. Great for understanding how encoder/decoder blocks fit together.
- Attention Is All You Need (paper)
  The original paper that introduced the Transformer architecture. Dense, but worth reading (line by line, with code nearby).
- Building a Transformer from Scratch: A Step-by-Step Guide
  A walkthrough of Transformer internals, broken down piece by piece, with no hidden libraries. A high-level explanation.
- Python Lessons: Introduction to Transformers
  Breakdown of the transformer model with annotated code and explanations.
- Transformers Explained
  Explains how transformers work (visually). This is part 1; there are a couple of different articles here that are really helpful.
- karpathy/micrograd
  A tiny scalar-valued autograd engine and neural net library. Teaches how autodiff works w/ a PyTorch-style API.
- tinygrad/tinygrad
  Like PyTorch but tiny (and readable). Implements both forward and backward passes, and can run on GPUs.
- rsokl/MyGrad
  Drop-in automatic differentiation for NumPy.
- karpathy/minGPT
  A minimal PyTorch reimplementation of OpenAI's GPT architecture.
- karpathy/nanoGPT
  A simple repo to train/finetune GPT-style models; trains GPTs on single GPUs.
- easy-tensorflow/easy-tensorflow
  Explains TensorFlow components and how they work.
- ShivamShrirao/dnn_from_scratch
  Implements CNNs, GANs, and more using only NumPy/CuPy, for low-level deep learning system design.
- Writing ML Optimizers from Scratch | Quassarian Viper
  A walkthrough of optimizers like SGD, Adam, and RMSProp, all coded from scratch in Python.
- q-viper/ML-from-Basics
  Minimal implementations of core ML algorithms like linear regression, SVMs, and more.
- Deep Learning from Scratch | Seth Weidman
  A book that builds neural networks from first principles in Python, with theory and code.