Project: Building GPT

Build GPT-2 Small end to end: prepare data, train the model, and generate text.

The earlier chapters built each component in isolation. This project assembles them into a single codebase that prepares data, trains a model, and generates text. Each stage builds directly on the previous one, so work through them in order.

We will train GPT-2 Small: 12 layers, 12 attention heads, 768-dimensional embeddings, roughly 124 million parameters. The training data is FineWeb-Edu, a high-quality filtered web corpus. Training at this scale requires a CUDA GPU. If you are on a CPU or Apple Silicon, the architecture chapter includes a smaller configuration, with fewer layers and a narrower embedding, that runs the same code unchanged.
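To see where the "roughly 124 million" figure comes from, here is a minimal sketch that tallies the learnable parameters from the configuration above. The `GPTConfig` name and its fields are illustrative, not taken from the project's `model.py`; the count assumes GPT-2's 50,257-token vocabulary, a 1,024-token context, and weight tying between the token embedding and the output head, as in the original GPT-2.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # GPT-2 Small defaults; names are illustrative, not from model.py
    vocab_size: int = 50257  # GPT-2 BPE vocabulary
    block_size: int = 1024   # context length
    n_layer: int = 12
    n_head: int = 12
    n_embd: int = 768

def param_count(cfg: GPTConfig) -> int:
    """Tally learnable parameters, assuming the output head is tied
    to the token embedding (so it adds no parameters of its own)."""
    d = cfg.n_embd
    tok_emb = cfg.vocab_size * d                 # token embedding (shared with head)
    pos_emb = cfg.block_size * d                 # learned position embedding
    attn = (d * 3 * d + 3 * d) + (d * d + d)     # qkv projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # two linear layers, 4x expansion
    lns = 2 * (2 * d)                            # two layer norms, weight + bias each
    per_layer = attn + mlp + lns
    final_ln = 2 * d
    return tok_emb + pos_emb + cfg.n_layer * per_layer + final_ln

print(param_count(GPTConfig()))  # 124439808, i.e. ~124M
```

Note that the embeddings alone account for about 39M of those parameters; the 12 Transformer blocks contribute roughly 7.1M each.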

By the end of the project, you will have:
  • prepare_data.py tokenizes FineWeb-Edu shards into train.bin and val.bin
  • data.py serves aligned x / y batches from those files
  • model.py wraps the Transformer stack into a trainable GPT
  • train.py runs the training loop and saves checkpoints
  • generate.py loads a checkpoint and samples continuations from a prompt
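The x / y alignment that the data stage provides is the core of language-model training: the target for each position is simply the next token. A minimal sketch, assuming the tokenized corpus is a flat array of token ids the way a `train.bin` file would hold them (the actual `data.py` interface may differ):

```python
import numpy as np

def get_batch(tokens: np.ndarray, batch_size: int, block_size: int,
              rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Sample aligned (x, y) batches: y is x shifted one token to the right,
    so position t in x predicts position t in y."""
    # Random window starts, leaving room for the one-token shift
    ix = rng.integers(0, len(tokens) - block_size - 1, size=batch_size)
    x = np.stack([tokens[i : i + block_size] for i in ix])          # inputs
    y = np.stack([tokens[i + 1 : i + 1 + block_size] for i in ix])  # next-token targets
    return x, y

# Usage sketch on a toy "corpus" of sequential ids
tokens = np.arange(1000, dtype=np.uint16)
x, y = get_batch(tokens, batch_size=4, block_size=8, rng=np.random.default_rng(0))
```

With a sequential toy corpus like the one above, every entry of `y` is exactly the corresponding entry of `x` plus one, which is a quick sanity check worth keeping in mind when you write the real loader.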