From raw text to a working GPT. You'll implement every component of a transformer-based language model from the ground up.
How text becomes numbers for neural networks.
The big picture: how Large Language Models actually work.
How computers represent language (Unicode & UTF-8).
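As a taste of what that means in practice, here is a minimal sketch (plain Python, example string chosen purely for illustration) of the same text viewed as Unicode code points and as UTF-8 bytes:

```python
# A minimal sketch: one string, two levels of representation.
text = "héllo 👋"

code_points = [ord(ch) for ch in text]    # Unicode code points, one per character
utf8_bytes = list(text.encode("utf-8"))   # UTF-8 bytes, 1-4 bytes per character

print(code_points)   # [104, 233, 108, 108, 111, 32, 128075]
print(utf8_bytes)    # 11 byte values: 'é' takes 2 bytes, the emoji takes 4
```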
How byte-pair encoding (BPE) compresses text into the atoms of language models.
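A minimal sketch of the core BPE training step, assuming we start from raw UTF-8 bytes (token ids 0-255) and mint a new id for the most frequent adjacent pair; real tokenizers repeat this merge loop tens of thousands of times:

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("the theme of the thesis".encode("utf-8"))  # start from raw UTF-8 bytes
pair = most_frequent_pair(ids)                          # e.g. the byte values for 't','h'
ids = merge(ids, pair, 256)                             # 256 = first id beyond the byte range
print(pair, len(ids))                                   # the sequence just got shorter
```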
How to represent tokens as vectors that capture semantic meaning.
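A minimal sketch using PyTorch's `nn.Embedding` (the vocabulary size and dimension below are only illustrative): each token id indexes a row of a learned matrix, and those rows are the vectors the model trains.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 512              # illustrative sizes
embedding = nn.Embedding(vocab_size, d_model)  # a learnable (50_000, 512) lookup table

token_ids = torch.tensor([[15, 2025, 873]])    # one sequence of three token ids
vectors = embedding(token_ids)                 # each id becomes a 512-dim vector
print(vectors.shape)                           # torch.Size([1, 3, 512])
```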
Positional encoding: teaching the model where each word lives in a sentence, and why that's harder than you'd think.
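One classic answer, and the one from the original Transformer paper, is fixed sinusoidal encodings added to the token embeddings; a minimal sketch assuming that variant rather than learned position embeddings:

```python
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sin/cos position encodings from 'Attention Is All You Need'."""
    pos = torch.arange(seq_len).unsqueeze(1)          # (seq_len, 1)
    i = torch.arange(0, d_model, 2)                   # even dimension indices
    angles = pos / (10_000 ** (i / d_model))          # (seq_len, d_model / 2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=512)
print(pe.shape)   # torch.Size([128, 512]) -- added elementwise to the token embeddings
```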
Build the architecture that powers modern LLMs, then put it to work as a translator.
The core engine of the Transformer: Query, Key, and Value.
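A minimal sketch of scaled dot-product attention in PyTorch, with random tensors standing in for the real query, key, and value projections:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # how well each query matches each key
    weights = F.softmax(scores, dim=-1)                # each row sums to 1
    return weights @ v                                 # weighted mixture of the values

q = k = v = torch.randn(1, 6, 64)   # (batch, seq_len, d_k): self-attention over 6 tokens
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                    # torch.Size([1, 6, 64])
```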
Running attention heads in parallel, and using causal masking to stop the model from cheating by peeking at the future.
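A minimal sketch of both ideas, with illustrative sizes: splitting the model dimension into heads that attend in parallel, and a lower-triangular mask that zeroes out attention to future positions.

```python
import torch

batch, seq_len, d_model, n_heads = 2, 5, 512, 8
x = torch.randn(batch, seq_len, d_model)

# Multi-head: split d_model into n_heads smaller subspaces, attended to in parallel.
heads = x.view(batch, seq_len, n_heads, d_model // n_heads).transpose(1, 2)
print(heads.shape)   # torch.Size([2, 8, 5, 64])

# Causal mask: position i may only look at positions <= i.
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
scores = torch.randn(seq_len, seq_len)                 # stand-in attention scores
scores = scores.masked_fill(~mask, float("-inf"))      # block the future
print(torch.softmax(scores, dim=-1))                   # masked positions get weight 0
```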
Layer normalization to stabilize training, and the feed-forward network that adds the 'brain' to the block.
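A minimal sketch of those two pieces wired up in the post-norm arrangement of the original paper (the sizes are the usual illustrative defaults):

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048   # d_ff is conventionally 4 * d_model

ffn = nn.Sequential(        # applied to every token position independently
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)
norm = nn.LayerNorm(d_model)   # normalizes each token's features to stabilize training

x = torch.randn(2, 10, d_model)   # (batch, seq_len, d_model)
out = norm(x + ffn(x))            # residual + layer norm around the feed-forward sublayer
print(out.shape)                  # torch.Size([2, 10, 512])
```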
Residual connections: the gradient superhighway that enables deep network training.
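A minimal sketch of the pattern: every sublayer is wrapped as x + sublayer(x), so the identity path gives gradients a direct route back through the whole stack (the Linear layer below is just a stand-in for attention or the feed-forward network):

```python
import torch
import torch.nn as nn

class ResidualSublayer(nn.Module):
    """Wrap any sublayer as x + sublayer(x)."""
    def __init__(self, sublayer):
        super().__init__()
        self.sublayer = sublayer

    def forward(self, x):
        return x + self.sublayer(x)   # the addition is the gradient superhighway

block = ResidualSublayer(nn.Linear(512, 512))   # stand-in sublayer
x = torch.randn(2, 10, 512)
print(block(x).shape)   # torch.Size([2, 10, 512]) -- same shape in and out, so blocks stack deeply
```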
Cross-attention: how the Decoder talks to the Encoder.
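A minimal sketch using PyTorch's built-in `nn.MultiheadAttention` as a stand-in for a hand-rolled attention layer: the decoder supplies the queries, and the encoder output supplies the keys and values.

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

encoder_out = torch.randn(1, 12, d_model)   # encoded source sentence, 12 tokens
decoder_x = torch.randn(1, 7, d_model)      # decoder states for the 7 target tokens so far

# Queries come from the decoder; keys and values come from the encoder.
out, attn = cross_attn(query=decoder_x, key=encoder_out, value=encoder_out)
print(out.shape, attn.shape)   # torch.Size([1, 7, 512]) torch.Size([1, 7, 12])
```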
Build a complete Sequence-to-Sequence Transformer.
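For a sense of the overall shape, here is the same architecture expressed with PyTorch's built-in `nn.Transformer`; a sketch only, since the point of the chapter is to build these pieces yourself (embeddings and the final vocabulary projection are omitted):

```python
import torch
import torch.nn as nn

d_model = 512
model = nn.Transformer(d_model=d_model, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6,
                       batch_first=True)

src = torch.randn(1, 12, d_model)   # embedded source sequence
tgt = torch.randn(1, 7, d_model)    # embedded target sequence (shifted right)
tgt_mask = model.generate_square_subsequent_mask(tgt.size(1))   # causal mask for the decoder

out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)   # torch.Size([1, 7, 512]) -- project with a Linear layer to get vocab logits
```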
Go decoder-only and build a text-generating GPT.
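A minimal sketch of the decoder-only shape, using PyTorch's `TransformerEncoderLayer` plus a causal mask as a stand-in for hand-written GPT blocks (all sizes are illustrative):

```python
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    """Token + position embeddings, causally masked Transformer blocks, next-token head."""
    def __init__(self, vocab_size=256, d_model=128, n_heads=4, n_layers=2, max_len=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        block = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        seq_len = ids.size(1)
        pos = torch.arange(seq_len, device=ids.device)
        x = self.tok_emb(ids) + self.pos_emb(pos)
        # Causal mask: -inf above the diagonal, so no position can attend to the future.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf"), device=ids.device),
                          diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)   # logits over the vocabulary at every position

model = TinyGPT()
ids = torch.randint(0, 256, (1, 10))   # a batch with one 10-token sequence
print(model(ids).shape)                # torch.Size([1, 10, 256])
```

Generating text is then a loop: feed the prompt through the model, sample a token from the logits at the last position, append it, and repeat.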