From raw text to a working GPT. You'll implement every component of a transformer-based language model from the ground up.
How text becomes numbers for neural networks.
The big picture: how Large Language Models actually work.
How computers represent language (Unicode & UTF-8).
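To preview the idea, here is a minimal sketch in plain Python (illustrative, not the course's exact code): every character maps to a Unicode code point, and UTF-8 turns those code points into a variable-length stream of bytes.

```python
text = "héllo 👋"

# Unicode: every character maps to an integer code point.
code_points = [ord(ch) for ch in text]
print(code_points)                 # [104, 233, 108, 108, 111, 32, 128075]

# UTF-8: a variable-length byte encoding of those code points.
utf8_bytes = text.encode("utf-8")
print(list(utf8_bytes))            # 'é' becomes 2 bytes, the emoji becomes 4
print(len(text), len(utf8_bytes))  # 7 characters vs. 11 bytes
```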
How BPE compresses text into efficient tokens.
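As a taste of what you'll build, here is a toy sketch of the core BPE loop (a simplified illustration, not the course's exact implementation): count adjacent pairs of token ids, merge the most frequent pair into a new token, and repeat.

```python
from collections import Counter

def most_frequent_pair(ids):
    """Count every adjacent pair of token ids and return the most common one."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# Start from raw UTF-8 bytes and perform a few merges.
ids = list("aaabdaaabac".encode("utf-8"))
for step in range(3):
    pair = most_frequent_pair(ids)
    ids = merge(ids, pair, 256 + step)  # new token ids start after the 256 byte values
    print(f"merged {pair} -> {256 + step}: {ids}")
```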
How to represent tokens as vectors to capture semantic meaning.
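In practice an embedding table is just a learnable lookup from token id to vector. A minimal sketch, assuming PyTorch (the framework and the sizes below are illustrative, not the course's exact configuration):

```python
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 512           # illustrative sizes
embedding = nn.Embedding(vocab_size, d_model)

token_ids = torch.tensor([[15, 3081, 42]])  # a batch of one sequence of 3 token ids
vectors = embedding(token_ids)              # each id is looked up as a learnable vector
print(vectors.shape)                        # torch.Size([1, 3, 512])
```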
How to encode sequence order into token embeddings.
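One common choice is the fixed sinusoidal encoding from the original Transformer paper; whether the course uses this or learned position embeddings, the idea is the same: add a position-dependent vector to each token embedding. A sketch assuming PyTorch:

```python
import torch

def sinusoidal_positions(seq_len, d_model):
    """Fixed sinusoidal positional encodings from 'Attention Is All You Need'."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)           # even dimensions
    angles = pos / torch.pow(10000.0, i / d_model)                 # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

pe = sinusoidal_positions(seq_len=8, d_model=512)
# x = token_embeddings + pe  # added to the embeddings so the model can tell positions apart
print(pe.shape)              # torch.Size([8, 512])
```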
Build the architecture that powers modern LLMs, then use it to build a translator.
The core engine of the Transformer: Query, Key, and Value.
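The whole mechanism fits in a few lines. A minimal sketch of scaled dot-product attention, assuming PyTorch (shapes are illustrative):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # similarity of each query to each key
    weights = F.softmax(scores, dim=-1)                # each row sums to 1
    return weights @ v                                 # weighted average of the values

q = k = v = torch.randn(1, 5, 64)                      # (batch, seq_len, head_dim)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                                       # torch.Size([1, 5, 64])
```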
Running multiple attention heads in parallel and preventing cheating with causal masking.
Stabilizing training with layer normalization and adding the feed-forward 'brain' to the block.
Residual connections: the gradient superhighway that enables deep network training.
Cross-attention: how the Decoder talks to the Encoder.
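The "no cheating" part comes from a lower-triangular mask applied to the attention scores. A sketch of the causal mask on its own, assuming PyTorch:

```python
import torch

seq_len = 5
# Lower-triangular mask: position i may only attend to positions <= i.
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

scores = torch.randn(seq_len, seq_len)             # raw attention scores for one head
scores = scores.masked_fill(~mask, float("-inf"))  # future positions get -inf ...
weights = torch.softmax(scores, dim=-1)            # ... so softmax gives them weight 0
print(weights)                                     # upper triangle is exactly 0
```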
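A minimal sketch of these two pieces, assuming PyTorch (pre-norm ordering and the 4x hidden width are illustrative conventions, not necessarily the course's exact choices):

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048                # d_ff = 4 * d_model is the conventional choice

feed_forward = nn.Sequential(            # applied independently at every position
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)
norm = nn.LayerNorm(d_model)             # normalizes each token's features to stabilize training

x = torch.randn(1, 5, d_model)
out = feed_forward(norm(x))              # pre-norm ordering; post-norm is the other common variant
print(out.shape)                         # torch.Size([1, 5, 512])
```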
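The "superhighway" is nothing more than adding a sublayer's output back onto its input. A sketch assuming PyTorch (the wrapper class is a hypothetical illustration):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Wraps a sublayer so its output is added back to its input: x + sublayer(x)."""
    def __init__(self, sublayer):
        super().__init__()
        self.sublayer = sublayer

    def forward(self, x):
        # The identity path lets gradients flow straight through, however deep the stack.
        return x + self.sublayer(x)

block = ResidualBlock(nn.Linear(512, 512))
x = torch.randn(1, 5, 512)
print(block(x).shape)   # torch.Size([1, 5, 512])
```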
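The key idea: queries come from the decoder, while keys and values come from the encoder's output. A sketch using PyTorch's built-in multi-head attention as a stand-in (the course builds this by hand; shapes are illustrative):

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

encoder_out = torch.randn(1, 12, d_model)  # the encoded source sentence (12 tokens)
decoder_x   = torch.randn(1, 7, d_model)   # the decoder's current representations (7 tokens)

# Queries from the decoder; keys and values from the encoder.
out, attn_weights = cross_attn(query=decoder_x, key=encoder_out, value=encoder_out)
print(out.shape)            # torch.Size([1, 7, 512])  -- one output per decoder position
print(attn_weights.shape)   # torch.Size([1, 7, 12])   -- decoder positions attend over source tokens
```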
Build a complete Sequence-to-Sequence Transformer.
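For a shape-level preview of what the finished model does, here is PyTorch's built-in module wired up end to end; the course's from-scratch version exposes the same interface of source sequence, shifted target sequence, and causal target mask (all sizes below are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6,
                       num_decoder_layers=6, batch_first=True)

src = torch.randn(1, 12, 512)   # embedded source sequence
tgt = torch.randn(1, 7, 512)    # embedded (shifted) target sequence
tgt_mask = model.generate_square_subsequent_mask(7)  # causal mask for the decoder

out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)                # torch.Size([1, 7, 512])
```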
Go decoder-only and build a text-generating GPT.
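Generation is an autoregressive loop: predict the next token, append it to the context, and repeat. A sketch assuming PyTorch and a model that maps token ids to next-token logits (the tiny stand-in model below exists only so the loop runs end to end):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def generate(model, ids, max_new_tokens, temperature=1.0):
    """Autoregressive sampling: predict the next token, append it, repeat."""
    for _ in range(max_new_tokens):
        logits = model(ids)                      # (batch, seq_len, vocab_size)
        logits = logits[:, -1, :] / temperature  # only the last position predicts the next token
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)   # the new token becomes part of the context
    return ids

# A stand-in "model" (embedding + linear head) so the loop is runnable as-is.
vocab_size = 100
dummy = nn.Sequential(nn.Embedding(vocab_size, 32), nn.Linear(32, vocab_size))
print(generate(dummy, torch.tensor([[1, 2, 3]]), max_new_tokens=5))
```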