Introduction

The big picture: how Large Language Models actually work.

You type a question into ChatGPT. A few seconds later, a thoughtful response appears, word by word. It feels like magic, like the computer is thinking. But what's actually happening inside?

In this chapter, we will see the full picture of how LLMs work before diving into technical details. Understanding how all the pieces fit together will make each chapter that follows much easier to grasp.

The Core Idea

At its heart, a Large Language Model does one thing: predict the next word.

When you give it "The cat sat on the", the model looks at those words and outputs a probability for every possible next word in its vocabulary.

Iteration 1
Input: "The cat sat on the"
Top predictions: mat (15%), floor (12%), roof (3%), ...

The model picks one word (usually sampling from the high-probability ones), appends it to the input, and repeats.

Iteration 2
Input: "The cat sat on the mat"
Top predictions: and (18%), . (14%), while (8%), ...

This loop continues until the response is complete. This simple mechanism of next-word prediction, once trained on massive amounts of text, produces behavior that looks remarkably like understanding.
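To make the loop concrete, here is a minimal Python sketch. The toy_model function below is a made-up stand-in that returns hand-picked probabilities; a real LLM would compute a probability for every token in its vocabulary at that step.

import random

# Toy stand-in for a trained model: given the words so far, return a
# probability for each candidate next word. A real LLM does this over
# its entire vocabulary (tens of thousands of tokens).
def toy_model(words):
    if words[-1] == "the":
        return {"mat": 0.15, "floor": 0.12, "roof": 0.03, "sofa": 0.70}
    return {"and": 0.18, ".": 0.14, "while": 0.08, "purring": 0.60}

def generate(prompt_words, max_new_words=3):
    words = list(prompt_words)
    for _ in range(max_new_words):
        probs = toy_model(words)                         # 1. predict probabilities
        choices, weights = zip(*probs.items())
        next_word = random.choices(choices, weights)[0]  # 2. sample one word
        words.append(next_word)                          # 3. append and repeat
    return " ".join(words)

# Each run prints the prompt plus three sampled words.
print(generate(["The", "cat", "sat", "on", "the"]))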

The End-to-End Journey

When you send a message to an LLM and receive a response, your text goes through a series of transformations.

Don't worry about understanding exactly how or why these steps happen right now. We will explore each one in detail later. For now, just focus on the high-level stages your text passes through.

1
Text → Bytes
"Hello" → [72, 101, 108, 108, 111]

Characters become numbers computers can process.

2
Bytes → Tokens
[72, 101, 108, 108, 111] → [15496]

We group bytes into larger chunks called tokens, so the model has a shorter sequence to process.

3
Tokens → Vectors
[15496] → [0.12, -0.48, 0.91, ...]

Plain token IDs don't capture meaning. This step gives each token a position in a high-dimensional vector space that represents what it means.

4
Vectors → Transformer → Next Token

The Transformer reads context and predicts the next token.

5
Token → Text

The predicted token is converted back to text, and the loop repeats until the response is complete.
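To tie the five stages together, here is a runnable Python sketch of the whole journey for the word "Hello". Everything in it is a toy stand-in invented for illustration: the vocabulary, the embedding table, and the toy_transformer function. Only the id 15496 is real (it is the GPT-2 token for "Hello", matching the example above); a real model learns all of these pieces from data.

import random

text = "Hello"

# Step 1: Text -> Bytes. UTF-8 turns each character into one or more numbers.
byte_values = list(text.encode("utf-8"))
print(byte_values)                        # [72, 101, 108, 108, 111]

# Step 2: Bytes -> Tokens. A real tokenizer learns which byte sequences to merge
# into chunks; here we fake it with a tiny hand-made vocabulary.
toy_vocab = {b"Hello": 15496, b" world": 42}
tokens = [toy_vocab[bytes(byte_values)]]
print(tokens)                             # [15496]

# Step 3: Tokens -> Vectors. Each token id indexes a row in an embedding table.
# Real models use hundreds or thousands of dimensions, not 4.
embedding_table = {tid: [random.uniform(-1, 1) for _ in range(4)]
                   for tid in toy_vocab.values()}
vectors = [embedding_table[t] for t in tokens]

# Step 4: Vectors -> Transformer -> Next token. A stand-in for the Transformer:
# it reads the vectors and returns a probability for every token in the vocabulary.
def toy_transformer(vectors):
    return {15496: 0.1, 42: 0.9}

probs = toy_transformer(vectors)
next_token = max(probs, key=probs.get)    # pick the most likely token, for simplicity

# Step 5: Token -> Text. Look the id back up and append its text to the output.
id_to_bytes = {tid: b for b, tid in toy_vocab.items()}
print(text + id_to_bytes[next_token].decode("utf-8"))   # "Hello world"

In a real system, steps 4 and 5 then loop: the new token is appended to the sequence and fed back in, exactly as in the generation loop shown earlier.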

Learning from Data

LLMs are built to predict the next word, but how do they get good at it? A freshly created model is just millions of random numbers. It knows nothing about language, grammar, or the world. If you asked it to complete "The cat sat on the", it might confidently answer "purple" or "seventeen".

The magic happens through training. We show the model enormous amounts of text and let it learn from its mistakes, over and over, until patterns emerge.

1
Input
"The king sat on the"

A training example from text data.

2
Prediction

The model guesses the next word. Early in training, its guess is essentially random (e.g., "banana").

3
Comparison

The actual next word is "throne". We measure the error.

4
Update

The model adjusts its internal numbers to make "throne" more likely next time.

By repeating this process billions of times on massive datasets, the model's parameters are gradually tuned. Words with similar meanings end up close together in the vector space, and the model learns to recognize complex patterns, grammar, and facts.
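In code, one turn of this train-and-update loop looks something like the PyTorch sketch below. The model here is a deliberately tiny stand-in (an embedding layer plus one linear layer where a real LLM would have a full Transformer), and the "dataset" is just the single example above, but the four steps are the same ones a real training run repeats billions of times.

import torch
import torch.nn as nn

# Toy setup: a six-word vocabulary and a tiny model that maps five input
# token ids to a score (logit) for every word that could come next.
vocab = ["the", "king", "sat", "on", "throne", "banana"]
model = nn.Sequential(
    nn.Embedding(len(vocab), 8),   # token ids -> 8-dimensional vectors
    nn.Flatten(),                  # concatenate the five vectors
    nn.Linear(5 * 8, len(vocab)),  # stand-in for the Transformer
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# 1. Input: "The king sat on the", plus the word that actually comes next.
inputs = torch.tensor([[0, 1, 2, 3, 0]])          # "the king sat on the" as token ids
target = torch.tensor([vocab.index("throne")])

for step in range(50):
    logits = model(inputs)             # 2. Prediction: a score for every vocabulary word
    loss = loss_fn(logits, target)     # 3. Comparison: how far off was the prediction?
    optimizer.zero_grad()
    loss.backward()                    # work out how each parameter should change
    optimizer.step()                   # 4. Update: nudge parameters toward "throne"

probs = torch.softmax(model(inputs), dim=-1)
print(vocab[probs.argmax().item()])    # after fitting this one example: "throne"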

What You Will Build

In this course, we will build every piece of this pipeline from scratch. By the end, you will understand not just what an LLM does, but how and why each component exists.

Along the way, you'll implement what you learn through coding challenges at the end of each chapter. Challenges marked Browser run right here. Challenges marked Local need dependencies like PyTorch, so you'll implement them in your editor and test locally with our CLI tool.

Let's start with the very first step: how computers see text.