You type a question into ChatGPT. A few seconds later, a thoughtful response appears, word by word. It feels like magic, like the computer is thinking. But what's actually happening inside?
In this chapter, we will see the full picture of how LLMs work before diving into technical details. Understanding how all the pieces fit together will make each chapter that follows much easier to grasp.
The Core Idea
At its heart, a Large Language Model does one thing: predict the next word.
When you give it "The cat sat on the", the model looks at those words and outputs a probability for every possible next word in its vocabulary.
The model picks one word (usually sampling from the high-probability ones), appends it to the input, and repeats.
This loop continues until the response is complete. This simple mechanism, sharpened by training on massive amounts of text, produces behavior that looks remarkably like understanding.
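Here is a minimal sketch of that loop in Python. The `next_word_probs` function is a stand-in for the trained model (it just returns a fixed distribution so the example runs); everything else is the predict, sample, append, repeat cycle described above.

```python
import random

vocab = ["mat", "roof", "floor", "moon", "<end>"]

def next_word_probs(context):
    # Placeholder: a trained model would compute these from the context.
    # Here we hard-code a distribution just so the loop is runnable.
    return [0.6, 0.2, 0.15, 0.04, 0.01]

def generate(prompt_words, max_new_words=20):
    words = list(prompt_words)
    for _ in range(max_new_words):
        probs = next_word_probs(words)                          # 1. score every candidate word
        choice = random.choices(vocab, weights=probs, k=1)[0]   # 2. sample a likely one
        if choice == "<end>":                                   # 3. stop when the response is done
            break
        words.append(choice)                                    # 4. append it and repeat
    return " ".join(words)

print(generate(["The", "cat", "sat", "on", "the"]))
```

Swap in a model that actually computes those probabilities from the context, and you have the entire generation process.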
The End-to-End Journey
When you send a message to an LLM and receive a response, your text goes through a series of transformations.
Don't worry about understanding exactly how or why these steps happen right now. We will explore each one in detail later. For now, just focus on the high-level stages your text passes through.
Characters become numbers (bytes) that computers can process.
We group the bytes into chunks called tokens, giving the model a shorter sequence to process.
Plain numbers don't capture meaning. This step turns each chunk into a vector, a position in space that represents what it means.
The Transformer reads context and predicts the next token.
The predicted token converts back to text, and the loop repeats until the response is complete; a toy walkthrough of these stages follows below.
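To make the stages concrete, here is a toy walkthrough of the whole pipeline. Every function in it (`to_bytes`, `to_tokens`, `embed`, `transformer`, `to_text`) is a deliberately simplified stand-in invented for this sketch, not a real component; the point is only to show how the pieces chain together.

```python
def to_bytes(text):
    # Stage 1: characters become numbers (here, UTF-8 byte values).
    return list(text.encode("utf-8"))

def to_tokens(byte_values):
    # Stage 2: group bytes into chunks (tokens). A real tokenizer learns its
    # chunks from data; this toy one just pairs neighboring bytes.
    return [tuple(byte_values[i:i + 2]) for i in range(0, len(byte_values), 2)]

def embed(tokens):
    # Stage 3: each token becomes a vector, a position in space standing in
    # for its meaning (here, just two made-up numbers per token).
    return [[sum(t) / 100.0, len(t)] for t in tokens]

def transformer(vectors):
    # Stage 4: read the whole context and predict the next token.
    # A real Transformer does this with learned parameters; we fake it.
    return (109, 97)  # the bytes for "ma"

def to_text(token):
    # Stage 5: turn the predicted token back into text.
    return bytes(token).decode("utf-8")

prompt = "The cat sat on the "
context = to_tokens(to_bytes(prompt))
next_token = transformer(embed(context))   # in the real pipeline this repeats in a loop
context.append(next_token)
print(to_text(next_token))                 # -> "ma"
```

In the real pipeline, the last three stages repeat in a loop, with each predicted token fed back in as new context until the response is complete.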
Learning from Data
LLMs are built to predict the next word, but how do they get good at it? A freshly created model is just millions of random numbers. It knows nothing about language, grammar, or the world. If you asked it to complete "The cat sat on the", it might confidently answer "purple" or "seventeen".
The magic happens through training. We show the model enormous amounts of text and let it learn from its mistakes, over and over, until patterns emerge.
The model is shown a training example drawn from the text data.
The model guesses the next word. At first, it's random (e.g., "banana").
The actual next word is "throne". We measure the error.
The model adjusts its internal numbers to make "throne" more likely next time.
Repeated billions of times across massive datasets, this process gradually tunes the model's parameters. Words with similar meanings come to align in the vector space, and the model learns to recognize complex patterns, grammar, and facts.
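As a rough sketch of what "adjusting its internal numbers" means, here is a toy model whose only parameters are one score per word, nudged to make "throne" more likely. The update rule used here (the softmax cross-entropy gradient) is a standard choice made for this illustration; real models condition their scores on the context, but the guess, measure, adjust rhythm is the same.

```python
import math

vocab = ["banana", "throne", "castle"]
scores = [0.0, 0.0, 0.0]            # the model's internal numbers (its parameters)
target = vocab.index("throne")      # the actual next word in the training text
learning_rate = 1.0

def probabilities(scores):
    # Turn raw scores into a probability for every word (softmax).
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

for step in range(5):
    probs = probabilities(scores)                         # 1. the model guesses
    loss = -math.log(probs[target])                       # 2. measure the error
    for i in range(len(scores)):                          # 3. nudge each parameter
        grad = probs[i] - (1.0 if i == target else 0.0)   #    (softmax cross-entropy gradient)
        scores[i] -= learning_rate * grad
    print(f"step {step}: P(throne) = {probs[target]:.2f}, loss = {loss:.2f}")
```

Run it and P("throne") climbs from 0.33 toward 1.0 within a few steps: the same "make it more likely next time" adjustment from the steps above, just at toy scale.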
What You Will Build
In this course, we will build every piece of this pipeline from scratch. By the end, you will understand not just what an LLM does, but how and why each component exists.
Along the way, you'll implement what you learn through coding challenges at the end of each chapter. Challenges marked Browser run right here. Challenges marked Local need dependencies like PyTorch, so you'll implement them in your editor and test locally with our CLI tool.
Let's start with the very first step: how computers see text.