The earlier chapters built each component in isolation. This project assembles them into a single codebase that prepares data, trains a model, and generates text. Each stage builds directly on the previous one, so work through them in order.
We will train GPT-2 Small: 12 layers, 12 attention heads, 768-dimensional embeddings, roughly 124 million parameters. The training data is FineWeb-Edu, a high-quality filtered web corpus. Training at this scale requires a CUDA GPU. If you are on a CPU or Apple Silicon, the architecture chapter includes a smaller configuration (fewer layers and a narrower embedding) that runs the same code unchanged.
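The architecture numbers above pin down the parameter count. A minimal sketch of how they combine, assuming the standard GPT-2 vocabulary size (50,257 BPE tokens) and context length (1,024), with hypothetical names like `GPTConfig` chosen for illustration:

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    n_layer: int = 12        # transformer blocks
    n_head: int = 12         # attention heads per block
    n_embd: int = 768        # embedding dimension
    vocab_size: int = 50257  # GPT-2 BPE vocabulary (assumed, not stated above)
    block_size: int = 1024   # context length (assumed, not stated above)

def approx_params(c: GPTConfig) -> int:
    # token + position embeddings
    emb = (c.vocab_size + c.block_size) * c.n_embd
    # per block: ~4*d^2 for attention projections + ~8*d^2 for the MLP
    per_layer = 12 * c.n_embd ** 2
    return emb + c.n_layer * per_layer

print(f"{approx_params(GPTConfig()):,}")  # on the order of 124 million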
By the End of the Project
- `prepare_data.py` tokenizes FineWeb-Edu shards into `train.bin` and `val.bin`
- `data.py` serves aligned x/y batches from those files
- `model.py` wraps the Transformer stack into a trainable GPT
- `train.py` runs the training loop and saves checkpoints
- `generate.py` loads a checkpoint and samples continuations from a prompt