From raw text to a working GPT. You'll implement every component of a transformer-based language model from the ground up.
How text becomes numbers for neural networks.
The big picture: how Large Language Models actually work.
How computers represent language (Unicode & UTF-8).
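To preview the idea, here is a minimal sketch in plain Python (illustrative, not the course's exact code): every character maps to a Unicode code point, and UTF-8 turns those code points into a variable-length stream of bytes.

```python
text = "héllo 👋"

# Unicode: every character maps to an integer code point.
code_points = [ord(ch) for ch in text]
print(code_points)                 # [104, 233, 108, 108, 111, 32, 128075]

# UTF-8: a variable-length byte encoding of those code points.
utf8_bytes = text.encode("utf-8")
print(list(utf8_bytes))            # 'é' becomes 2 bytes, the emoji becomes 4
print(len(text), len(utf8_bytes))  # 7 characters vs. 11 bytes
```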
How BPE compresses text into efficient tokens.
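As a taste of what you'll build, here is a toy sketch of the core BPE loop (a simplified illustration, not the course's exact implementation): count adjacent pairs of token ids, merge the most frequent pair into a new token, and repeat.

```python
from collections import Counter

def most_frequent_pair(ids):
    """Count every adjacent pair of token ids and return the most common one."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# Start from raw UTF-8 bytes and perform a few merges.
ids = list("aaabdaaabac".encode("utf-8"))
for step in range(3):
    pair = most_frequent_pair(ids)
    ids = merge(ids, pair, 256 + step)  # new token ids start after the 256 byte values
    print(f"merged {pair} -> {256 + step}: {ids}")
```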
How to represent tokens as vectors to capture semantic meaning.
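In practice an embedding table is just a learnable lookup from token id to vector. A minimal sketch, assuming PyTorch (the framework and the sizes below are illustrative, not the course's exact configuration):

```python
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 512           # illustrative sizes
embedding = nn.Embedding(vocab_size, d_model)

token_ids = torch.tensor([[15, 3081, 42]])  # a batch of one sequence of 3 token ids
vectors = embedding(token_ids)              # each id is looked up as a learnable vector
print(vectors.shape)                        # torch.Size([1, 3, 512])
```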
How to encode sequence order into token embeddings.
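One common choice is the fixed sinusoidal encoding from the original Transformer paper; whether the course uses this or learned position embeddings, the idea is the same: add a position-dependent vector to each token embedding. A sketch assuming PyTorch:

```python
import torch

def sinusoidal_positions(seq_len, d_model):
    """Fixed sinusoidal positional encodings from 'Attention Is All You Need'."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)           # even dimensions
    angles = pos / torch.pow(10000.0, i / d_model)                 # (seq_len, d_model/2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

pe = sinusoidal_positions(seq_len=8, d_model=512)
# x = token_embeddings + pe  # added to the embeddings so the model can tell positions apart
print(pe.shape)              # torch.Size([8, 512])
```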
Build the architecture that powers modern LLMs, then use it to build a translator.
The core engine of the Transformer: Query, Key, and Value.
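The whole mechanism fits in a few lines. A minimal sketch of scaled dot-product attention, assuming PyTorch (shapes are illustrative):

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # similarity of each query to each key
    weights = F.softmax(scores, dim=-1)                # each row sums to 1
    return weights @ v                                 # weighted average of the values

q = k = v = torch.randn(1, 5, 64)                      # (batch, seq_len, head_dim)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                                       # torch.Size([1, 5, 64])
```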
Running multiple attention heads in parallel and preventing cheating with causal masking.
Stabilizing training with layer normalization and adding the feed-forward 'brain' to the block.
Residual connections: the gradient superhighway that enables deep network training.
Cross-attention: how the Decoder talks to the Encoder.
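The "no cheating" part comes from a lower-triangular mask applied to the attention scores. A sketch of the causal mask on its own, assuming PyTorch:

```python
import torch

seq_len = 5
# Lower-triangular mask: position i may only attend to positions <= i.
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

scores = torch.randn(seq_len, seq_len)             # raw attention scores for one head
scores = scores.masked_fill(~mask, float("-inf"))  # future positions get -inf ...
weights = torch.softmax(scores, dim=-1)            # ... so softmax gives them weight 0
print(weights)                                     # upper triangle is exactly 0
```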
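A minimal sketch of these two pieces, assuming PyTorch (pre-norm ordering and the 4x hidden width are illustrative conventions, not necessarily the course's exact choices):

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048                # d_ff = 4 * d_model is the conventional choice

feed_forward = nn.Sequential(            # applied independently at every position
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)
norm = nn.LayerNorm(d_model)             # normalizes each token's features to stabilize training

x = torch.randn(1, 5, d_model)
out = feed_forward(norm(x))              # pre-norm ordering; post-norm is the other common variant
print(out.shape)                         # torch.Size([1, 5, 512])
```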
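The "superhighway" is nothing more than adding a sublayer's output back onto its input. A sketch assuming PyTorch (the wrapper class is a hypothetical illustration):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Wraps a sublayer so its output is added back to its input: x + sublayer(x)."""
    def __init__(self, sublayer):
        super().__init__()
        self.sublayer = sublayer

    def forward(self, x):
        # The identity path lets gradients flow straight through, however deep the stack.
        return x + self.sublayer(x)

block = ResidualBlock(nn.Linear(512, 512))
x = torch.randn(1, 5, 512)
print(block(x).shape)   # torch.Size([1, 5, 512])
```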
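The key idea: queries come from the decoder, while keys and values come from the encoder's output. A sketch using PyTorch's built-in multi-head attention as a stand-in (the course builds this by hand; shapes are illustrative):

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

encoder_out = torch.randn(1, 12, d_model)  # the encoded source sentence (12 tokens)
decoder_x   = torch.randn(1, 7, d_model)   # the decoder's current representations (7 tokens)

# Queries from the decoder; keys and values from the encoder.
out, attn_weights = cross_attn(query=decoder_x, key=encoder_out, value=encoder_out)
print(out.shape)            # torch.Size([1, 7, 512])  -- one output per decoder position
print(attn_weights.shape)   # torch.Size([1, 7, 12])   -- decoder positions attend over source tokens
```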
Build a complete Sequence-to-Sequence Transformer.
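For a shape-level preview of what the finished model does, here is PyTorch's built-in module wired up end to end; the course's from-scratch version exposes the same interface of source sequence, shifted target sequence, and causal target mask (all sizes below are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6,
                       num_decoder_layers=6, batch_first=True)

src = torch.randn(1, 12, 512)   # embedded source sequence
tgt = torch.randn(1, 7, 512)    # embedded (shifted) target sequence
tgt_mask = model.generate_square_subsequent_mask(7)  # causal mask for the decoder

out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)                # torch.Size([1, 7, 512])
```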
Go decoder-only and build a text-generating GPT.
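Generation is an autoregressive loop: predict the next token, append it to the context, and repeat. A sketch assuming PyTorch and a model that maps token ids to next-token logits (the tiny stand-in model below exists only so the loop runs end to end):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def generate(model, ids, max_new_tokens, temperature=1.0):
    """Autoregressive sampling: predict the next token, append it, repeat."""
    for _ in range(max_new_tokens):
        logits = model(ids)                      # (batch, seq_len, vocab_size)
        logits = logits[:, -1, :] / temperature  # only the last position predicts the next token
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)   # the new token becomes part of the context
    return ids

# A stand-in "model" (embedding + linear head) so the loop is runnable as-is.
vocab_size = 100
dummy = nn.Sequential(nn.Embedding(vocab_size, 32), nn.Linear(32, vocab_size))
print(generate(dummy, torch.tensor([[1, 2, 3]]), max_new_tokens=5))
```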