From raw text to a working GPT. You'll implement every component of a transformer-based language model from the ground up.
How text becomes numbers for neural networks.
The big picture: how Large Language Models actually work.
How computers represent language (Unicode & UTF-8).
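As a taste of what that means in practice, here is a minimal sketch (plain Python, example string chosen purely for illustration) of the same text viewed as Unicode code points and as UTF-8 bytes:

```python
# A minimal sketch: one string, two levels of representation.
text = "héllo 👋"

code_points = [ord(ch) for ch in text]    # Unicode code points, one per character
utf8_bytes = list(text.encode("utf-8"))   # UTF-8 bytes, 1-4 bytes per character

print(code_points)   # [104, 233, 108, 108, 111, 32, 128075]
print(utf8_bytes)    # 11 byte values: 'é' takes 2 bytes, the emoji takes 4
```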
How byte-pair encoding (BPE) compresses text into the atoms of language models.
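A minimal sketch of the core BPE training step, assuming we start from raw UTF-8 bytes (token ids 0-255) and mint a new id for the most frequent adjacent pair; real tokenizers repeat this merge loop tens of thousands of times:

```python
from collections import Counter

def most_frequent_pair(ids):
    """Return the most common adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` with the single token `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("the theme of the thesis".encode("utf-8"))  # start from raw UTF-8 bytes
pair = most_frequent_pair(ids)                          # e.g. the byte values for 't','h'
ids = merge(ids, pair, 256)                             # 256 = first id beyond the byte range
print(pair, len(ids))                                   # the sequence just got shorter
```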
How to represent tokens as vectors that capture semantic meaning.
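A minimal sketch using PyTorch's `nn.Embedding` (the vocabulary size and dimension below are only illustrative): each token id indexes a row of a learned matrix, and those rows are the vectors the model trains.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 50_000, 512              # illustrative sizes
embedding = nn.Embedding(vocab_size, d_model)  # a learnable (50_000, 512) lookup table

token_ids = torch.tensor([[15, 2025, 873]])    # one sequence of three token ids
vectors = embedding(token_ids)                 # each id becomes a 512-dim vector
print(vectors.shape)                           # torch.Size([1, 3, 512])
```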
Positional encoding: teaching the model where each word lives in a sentence, and why that's harder than you'd think.
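One classic answer, and the one from the original Transformer paper, is fixed sinusoidal encodings added to the token embeddings; a minimal sketch assuming that variant rather than learned position embeddings:

```python
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    """Fixed sin/cos position encodings from 'Attention Is All You Need'."""
    pos = torch.arange(seq_len).unsqueeze(1)          # (seq_len, 1)
    i = torch.arange(0, d_model, 2)                   # even dimension indices
    angles = pos / (10_000 ** (i / d_model))          # (seq_len, d_model / 2)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(angles)
    pe[:, 1::2] = torch.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=512)
print(pe.shape)   # torch.Size([128, 512]) -- added elementwise to the token embeddings
```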
Build the architecture that powers modern LLMs, then put it to work as a translator.
The core engine of the Transformer: Query, Key, and Value.
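A minimal sketch of scaled dot-product attention in PyTorch, with random tensors standing in for the real query, key, and value projections:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V"""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # how well each query matches each key
    weights = F.softmax(scores, dim=-1)                # each row sums to 1
    return weights @ v                                 # weighted mixture of the values

q = k = v = torch.randn(1, 6, 64)   # (batch, seq_len, d_k): self-attention over 6 tokens
out = scaled_dot_product_attention(q, k, v)
print(out.shape)                    # torch.Size([1, 6, 64])
```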
Running attention heads in parallel, and using causal masking to stop the model from cheating by peeking at the future.
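A minimal sketch of both ideas, with illustrative sizes: splitting the model dimension into heads that attend in parallel, and a lower-triangular mask that zeroes out attention to future positions.

```python
import torch

batch, seq_len, d_model, n_heads = 2, 5, 512, 8
x = torch.randn(batch, seq_len, d_model)

# Multi-head: split d_model into n_heads smaller subspaces, attended to in parallel.
heads = x.view(batch, seq_len, n_heads, d_model // n_heads).transpose(1, 2)
print(heads.shape)   # torch.Size([2, 8, 5, 64])

# Causal mask: position i may only look at positions <= i.
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
scores = torch.randn(seq_len, seq_len)                 # stand-in attention scores
scores = scores.masked_fill(~mask, float("-inf"))      # block the future
print(torch.softmax(scores, dim=-1))                   # masked positions get weight 0
```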
Layer normalization to stabilize training, and the feed-forward network that adds the 'brain' to the block.
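A minimal sketch of those two pieces wired up in the post-norm arrangement of the original paper (the sizes are the usual illustrative defaults):

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048   # d_ff is conventionally 4 * d_model

ffn = nn.Sequential(        # applied to every token position independently
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)
norm = nn.LayerNorm(d_model)   # normalizes each token's features to stabilize training

x = torch.randn(2, 10, d_model)   # (batch, seq_len, d_model)
out = norm(x + ffn(x))            # residual + layer norm around the feed-forward sublayer
print(out.shape)                  # torch.Size([2, 10, 512])
```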
Residual connections: the gradient superhighway that enables deep network training.
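A minimal sketch of the pattern: every sublayer is wrapped as x + sublayer(x), so the identity path gives gradients a direct route back through the whole stack (the Linear layer below is just a stand-in for attention or the feed-forward network):

```python
import torch
import torch.nn as nn

class ResidualSublayer(nn.Module):
    """Wrap any sublayer as x + sublayer(x)."""
    def __init__(self, sublayer):
        super().__init__()
        self.sublayer = sublayer

    def forward(self, x):
        return x + self.sublayer(x)   # the addition is the gradient superhighway

block = ResidualSublayer(nn.Linear(512, 512))   # stand-in sublayer
x = torch.randn(2, 10, 512)
print(block(x).shape)   # torch.Size([2, 10, 512]) -- same shape in and out, so blocks stack deeply
```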
Cross-attention: how the Decoder talks to the Encoder.
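A minimal sketch using PyTorch's built-in `nn.MultiheadAttention` as a stand-in for a hand-rolled attention layer: the decoder supplies the queries, and the encoder output supplies the keys and values.

```python
import torch
import torch.nn as nn

d_model, n_heads = 512, 8
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

encoder_out = torch.randn(1, 12, d_model)   # encoded source sentence, 12 tokens
decoder_x = torch.randn(1, 7, d_model)      # decoder states for the 7 target tokens so far

# Queries come from the decoder; keys and values come from the encoder.
out, attn = cross_attn(query=decoder_x, key=encoder_out, value=encoder_out)
print(out.shape, attn.shape)   # torch.Size([1, 7, 512]) torch.Size([1, 7, 12])
```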
Build a complete Sequence-to-Sequence Transformer.
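For a sense of the overall shape, here is the same architecture expressed with PyTorch's built-in `nn.Transformer`; a sketch only, since the point of the chapter is to build these pieces yourself (embeddings and the final vocabulary projection are omitted):

```python
import torch
import torch.nn as nn

d_model = 512
model = nn.Transformer(d_model=d_model, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6,
                       batch_first=True)

src = torch.randn(1, 12, d_model)   # embedded source sequence
tgt = torch.randn(1, 7, d_model)    # embedded target sequence (shifted right)
tgt_mask = model.generate_square_subsequent_mask(tgt.size(1))   # causal mask for the decoder

out = model(src, tgt, tgt_mask=tgt_mask)
print(out.shape)   # torch.Size([1, 7, 512]) -- project with a Linear layer to get vocab logits
```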
Go decoder-only and build a text-generating GPT.
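A minimal sketch of the decoder-only shape, using PyTorch's `TransformerEncoderLayer` plus a causal mask as a stand-in for hand-written GPT blocks (all sizes are illustrative):

```python
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    """Token + position embeddings, causally masked Transformer blocks, next-token head."""
    def __init__(self, vocab_size=256, d_model=128, n_heads=4, n_layers=2, max_len=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        block = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        seq_len = ids.size(1)
        pos = torch.arange(seq_len, device=ids.device)
        x = self.tok_emb(ids) + self.pos_emb(pos)
        # Causal mask: -inf above the diagonal, so no position can attend to the future.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf"), device=ids.device),
                          diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)   # logits over the vocabulary at every position

model = TinyGPT()
ids = torch.randint(0, 256, (1, 10))   # a batch with one 10-token sequence
print(model(ids).shape)                # torch.Size([1, 10, 256])
```

Generating text is then a loop: feed the prompt through the model, sample a token from the logits at the last position, append it, and repeat.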