
Introduction

The big picture: how Large Language Models actually work.

You send a prompt to ChatGPT and seconds later, a response streams back word by word. What's actually happening inside? This chapter walks through the process end to end, from the moment text enters the model to the moment a prediction comes out. The rest of the course builds each piece of this pipeline from the ground up.

The Core Idea

At its core, a Large Language Model does one thing: predict the next word. When you give it "The cat sat on the", the model looks at those words and outputs a probability for every possible next word in its vocabulary.

Iteration 1
Input: "The cat sat on the"
Candidates: mat (15%), floor (12%), roof (3%), ...

The model samples one word from the high-probability candidates, appends it to the input, and repeats.

Iteration 2
Input: "The cat sat on the mat"
Candidates: and (18%), . (14%), while (8%), ...

Each new word feeds back in as input, and the process repeats until the response is complete. Every response you see from a language model is built one word at a time through this loop.
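This loop can be sketched in a few lines of Python. The vocabulary and probabilities below are invented for illustration; a real model computes a fresh distribution over tens of thousands of tokens at every step, based on the entire context rather than just the last word.

```python
import random

# Toy next-word distributions, keyed by the last word so far.
# These numbers are made up; a real LLM computes them from the full context.
NEXT_WORD_PROBS = {
    "the": [("mat", 0.15), ("floor", 0.12), ("roof", 0.03)],
    "mat": [("and", 0.18), (".", 0.14), ("while", 0.08)],
}

def generate(prompt, max_new_words=2, seed=0):
    """Append words one at a time: predict, sample, repeat."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = NEXT_WORD_PROBS.get(words[-1], [])
        if not candidates:
            break  # no distribution for this context in our toy table
        choices, weights = zip(*candidates)
        words.append(rng.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("The cat sat on the"))
```

Note that the output grows by exactly one word per loop iteration, which is why responses stream back piece by piece.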

The Full Pipeline

When you send a prompt to an LLM and receive a response, your text goes through a series of transformations.

Don't worry about understanding exactly how or why these steps happen right now. We will explore each one in detail later. For now, just focus on the high-level stages your text passes through.

1
Text → Bytes
"Hello" → [72, 101, 108, 108, 111]

Characters become numbers computers can process.
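In Python, this step is a single built-in call: UTF-8 encoding maps each character to one or more integers between 0 and 255.

```python
# UTF-8 turns each character into one or more bytes (integers 0-255).
text = "Hello"
byte_values = list(text.encode("utf-8"))
print(byte_values)  # [72, 101, 108, 108, 111]

# The mapping is reversible: the bytes decode back to the original text.
assert bytes(byte_values).decode("utf-8") == text
```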

2
Bytes → Tokens
[72, 101, 108, 108, 111] → [15496]

Bytes are grouped into meaningful chunks, giving the model a shorter sequence to work with.
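Here is a toy flavor of that grouping. The merge rules below are invented for illustration (real tokenizers learn tens of thousands of merges from data; GPT-2's tokenizer is what maps "Hello" to the single ID 15496), but the mechanic is the same: repeatedly fuse adjacent pairs into new token IDs.

```python
# Hypothetical merge table: each known pair of tokens fuses into a new ID.
MERGES = {
    (72, 101): 256,    # 'H' + 'e'    -> hypothetical token 256
    (108, 108): 257,   # 'l' + 'l'    -> hypothetical token 257
    (256, 257): 258,   # 'He' + 'll'  -> hypothetical token 258
    (258, 111): 259,   # 'Hell' + 'o' -> hypothetical token 259
}

def tokenize(byte_seq):
    """Repeatedly merge adjacent pairs that appear in the merge table."""
    tokens = list(byte_seq)
    changed = True
    while changed:
        changed = False
        merged = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in MERGES:
                merged.append(MERGES[(tokens[i], tokens[i + 1])])
                i += 2
                changed = True
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

print(tokenize(b"Hello"))  # [259] -- five bytes collapse into one token
```

The payoff is sequence length: the model now processes one token instead of five bytes.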

3
Tokens → Vectors
[15496] → [0.12, -0.48, 0.91, ...]

Each token ID becomes a vector, a list of numbers that captures the token's meaning.

4
Vectors → Transformer → Next Token

The Transformer processes the full sequence and predicts what comes next.

5
Token → Text

The predicted token converts back to text, and the loop continues until the response is complete.
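Steps 4 and 5 can be sketched together. The Transformer's output for a position is one raw score (a "logit") per vocabulary entry; a softmax turns those scores into probabilities, and the chosen token maps back to text. The vocabulary and logits here are made up, and we pick greedily (highest probability) rather than sampling, to keep the sketch deterministic.

```python
import math

# Hypothetical vocabulary and raw scores. A real Transformer produces
# one logit per vocabulary entry from the input vectors.
vocab = ["mat", "floor", "roof", "the"]
logits = [2.1, 1.9, 0.5, -0.3]

def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(logits)

# Step 5: pick a token and convert it back to text (greedy pick here).
next_token = vocab[probs.index(max(probs))]
print(next_token)  # "mat" -- it has the highest score
```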

Learning from Data

LLMs are built to predict the next word, but how do they get good at it? A freshly created model is just millions of random numbers. It knows nothing about language, grammar, or the world. If you asked it to complete "The cat sat on the", it might confidently answer "purple" or "seventeen".

These random numbers become useful through training. The model processes billions of text examples, and after each prediction, its parameters are adjusted to reduce the error between what it guessed and what actually came next.

1
Input
"The king sat on the"

A training example from text data.

2
Prediction

The model guesses the next word. At first, it's random (e.g., "banana").

3
Comparison

The actual next word is "throne". We measure the error.

4
Update

The model adjusts its internal numbers to make "throne" more likely next time.

By repeating this process billions of times across massive datasets, the model's parameters gradually shift from random noise into structured representations. It learns to recognize grammar, word relationships, and factual associations, all from predicting the next word.
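The four steps above can be mimicked with a drastically simplified "model": a table of scores for (context, next word) pairs. Real training adjusts billions of parameters using gradients rather than a lookup table, but the predict / compare / update rhythm is the same.

```python
from collections import defaultdict

# Hypothetical stand-in for model parameters: a score per
# (context, next word) pair, all zero (i.e. "untrained") at the start.
scores = defaultdict(float)

def predict(context):
    """Guess the next word: highest score wins; unknown contexts guess wildly."""
    candidates = {w: s for (c, w), s in scores.items() if c == context}
    return max(candidates, key=candidates.get) if candidates else "banana"

def train_step(context, actual_next, lr=1.0):
    """Compare the guess with reality, then nudge the scores."""
    guess = predict(context)
    if guess != actual_next:
        scores[(context, actual_next)] += lr   # make the truth more likely
        scores[(context, guess)] -= lr         # make the mistake less likely

context = "The king sat on the"
print(predict(context))       # untrained guess: "banana"
train_step(context, "throne")
print(predict(context))       # after one update: "throne"
```

One update fixes one example; a real model repeats this over billions of examples, so the adjustments accumulate into grammar, word relationships, and factual associations.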

What You Will Build

In this course, we will build every piece of this pipeline from scratch. By the end, you will understand not just what an LLM does, but how and why each component exists.

Throughout the course, you'll implement what you learn through coding challenges at the end of each chapter. Most challenges run directly in the built‑in editor and give feedback immediately. For local project workflows, check the setup page for the current CLI status and rollout plan.

Let's start with the very first step: how computers see text.