You send a prompt to ChatGPT, and seconds later a response streams back word by word. What's actually happening inside? In this chapter, we'll see the full picture before diving into technical details. Understanding how the pieces fit together will make each component much easier to grasp as we build them.
The Core Idea
At its core, a Large Language Model does one thing: predict the next word. When you give it "The cat sat on the", the model looks at those words and outputs a probability for every possible next word in its vocabulary.
The model picks one word (usually sampling from the high-probability ones), appends it to the input, and repeats.
This loop continues until the response is complete. This simple mechanism, exercised over and over during training on massive amounts of text, produces behavior that looks remarkably like understanding.
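Here is a minimal sketch of that loop in Python. The `next_word_probs` function and its tiny four-word vocabulary are invented stand-ins for illustration; a real model computes these probabilities from its learned parameters.

```python
import random

def next_word_probs(context):
    # Stand-in for a real model: returns a probability for every word
    # in a tiny, made-up vocabulary. A real model computes these from
    # billions of learned parameters.
    vocab = ["mat", "roof", "floor", "moon"]
    return dict(zip(vocab, [0.7, 0.15, 0.1, 0.05]))

def generate(prompt, max_new_words=5):
    words = prompt.split()
    for _ in range(max_new_words):
        probs = next_word_probs(words)
        # Sample from the distribution rather than always taking the
        # single most likely word.
        next_word = random.choices(list(probs), weights=list(probs.values()))[0]
        words.append(next_word)   # append the pick to the input...
    return " ".join(words)        # ...and repeat

print(generate("The cat sat on the", max_new_words=1))
```

Everything interesting happens inside `next_word_probs`; the loop around it stays this simple even in real systems, except that generation stops at a special end-of-response token rather than a fixed word count.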
The Full Pipeline
When you send a prompt to an LLM and receive a response, your text goes through a series of transformations.
Don't worry about understanding exactly how or why these steps happen right now; we will explore each one in detail later. For now, just focus on the high-level stages your text passes through (a rough code sketch follows the list).
1. Byte encoding: characters become numbers computers can process.
2. Tokenization: bytes are grouped into chunks (tokens) so there is a shorter sequence to process.
3. Embedding: plain token IDs don't capture meaning, so each token is converted into a vector that represents what it means.
4. Prediction: the Transformer reads the whole context and predicts the next token.
5. Decoding: tokens are converted back to text, and the loop repeats until the response is complete.
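To make these stages concrete, here is a rough sketch in Python. Only the first stage is real code; the comments stand in for components we will build in later chapters, and none of the names are final.

```python
text = "The cat sat on the"

# 1. Byte encoding: UTF-8 gives one or more bytes per character.
byte_ids = list(text.encode("utf-8"))
print(byte_ids[:5])   # [84, 104, 101, 32, 99]  ->  "T", "h", "e", " ", "c"

# 2. Tokenization: group bytes into chunks (tokens) for a shorter sequence.
# 3. Embedding: map each token ID to a vector that represents its meaning.
# 4. Prediction: the Transformer reads the context, outputs probabilities
#    for the next token.
# 5. Decoding: turn the chosen token back into text, then loop.
```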
Learning from Data
LLMs are built to predict the next word, but how do they get good at it? A freshly created model is just millions of random numbers. It knows nothing about language, grammar, or the world. If you asked it to complete "The cat sat on the", it might confidently answer "purple" or "seventeen".
This is where training comes in. We show the model enormous amounts of text and let it learn from its mistakes, over and over, until patterns emerge.
1. Take a training example from real text.
2. The model guesses the next word. At first, the guess is random (e.g., "banana").
3. The actual next word is "throne". We measure the error.
4. The model adjusts its internal numbers to make "throne" more likely next time.
By repeating this process billions of times on massive datasets, the model's parameters are gradually tuned. Words with similar meanings end up with similar vectors, and the model picks up grammar, facts, and increasingly complex patterns.
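For a first taste of what one such update looks like, here is a minimal sketch using PyTorch (the library the Local challenges rely on). The toy model, the token IDs, and the dimensions are all made up for illustration; the real model and training loop come later in the course.

```python
import torch
import torch.nn as nn

vocab_size, embed_dim, context_len = 1000, 32, 4

# Toy stand-in for a Transformer: embed 4 context tokens, flatten them,
# and map to a score (logit) for every word in the vocabulary.
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Flatten(),
    nn.Linear(embed_dim * context_len, vocab_size),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

context = torch.tensor([[12, 57, 893, 4]])  # made-up IDs for the context words
target = torch.tensor([702])                # made-up ID for the correct next word

logits = model(context)         # 1. the model guesses scores for every word
loss = loss_fn(logits, target)  # 2. measure how wrong the guess was
loss.backward()                 # 3. work out how each parameter should change
optimizer.step()                # 4. nudge parameters so the right word is more likely
optimizer.zero_grad()
```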
What You Will Build
In this course, we will build every piece of this pipeline from scratch. By the end, you will understand not just what an LLM does, but how and why each component exists.
Throughout the course, you'll implement what you learn through coding challenges at the end of each chapter. Challenges marked Browser run right here. Challenges marked Local need dependencies like PyTorch, so you'll implement them in your editor and test locally with our CLI tool.
Let's start with the very first step: how computers see text.