From Gradients to ChatGPT

Cover illustration: a circus-themed map of the course. Numbered booths under a big top represent each of the twenty modules — pretraining, tokenization, embeddings, gradient descent, self-attention, multi-head attention, the transformer, sampling, SFT, DPO, RAG, tools, the agent, the eval inspector, and an "inference booth" robot at the bottom.

A 20-week self-study course building a tiny LLM stack from scalar autodiff up through a working chat assistant. Modeled after From NAND to Tetris: the codebase grows layer-by-layer, and every block of the stack is something you write yourself.

The hard constraint: everything runs on an M-series MacBook — no cloud GPUs, no paid compute.

Why this course

Millions of people interact with LLMs every day, but few understand how these systems actually work. How does an LLM understand language? What is a "model" and how does it learn? How does a chat assistant "know" how to answer a question? How does matrix multiplication produce intelligent behavior? In this course we answer these questions from first principles.

Our goal is to go below the API and understand every part of the LLM stack. You start with scalar autodiff and grow it over twenty modules: tensors, a neural net, a tokenizer, embeddings, attention, the transformer, pretraining, sampling, SFT, DPO, evaluation, RAG, tool use, an agent loop. By the final module you have a chat assistant you built end to end, with tools, retrieval, and a swappable backend: the tiny model you trained yourself, or a stronger local open model when you want it to be genuinely useful. All on your laptop. All built by you.
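As a rough illustration of what a "swappable backend" can mean, here is a minimal sketch. It is not the course's code: `ChatBackend`, `TinyModelBackend`, `LocalOpenModelBackend`, and `Assistant` are hypothetical names, and both backends return placeholder strings instead of running a model.

```python
from typing import Protocol


class ChatBackend(Protocol):
    """Hypothetical interface: anything that maps a prompt to a reply."""

    def generate(self, prompt: str) -> str: ...


class TinyModelBackend:
    """Placeholder for the small model you train yourself (no real model here)."""

    def generate(self, prompt: str) -> str:
        return f"[tiny] {prompt}"


class LocalOpenModelBackend:
    """Placeholder for a stronger local open model (no real model here)."""

    def generate(self, prompt: str) -> str:
        return f"[local] {prompt}"


class Assistant:
    """Builds the prompt and delegates generation to whichever backend it holds."""

    def __init__(self, backend: ChatBackend) -> None:
        self.backend = backend

    def reply(self, user_message: str) -> str:
        prompt = f"User: {user_message}\nAssistant:"
        return self.backend.generate(prompt)


assistant = Assistant(TinyModelBackend())
tiny_reply = assistant.reply("hello")

# Swapping the backend changes who answers, not how the assistant is called.
assistant.backend = LocalOpenModelBackend()
local_reply = assistant.reply("hello")
```

The point of the design is that tools, retrieval, and the agent loop only need to talk to the interface, so the model behind it can change without touching the rest of the assistant.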

The scaffolding is real but small: a tokenizer that takes ten minutes to train, a transformer with a few million parameters, a corpus that fits in RAM. Tiny is deliberate. Once you've built every layer once at toy scale, the production-scale versions stop being magic.
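To give a sense of toy scale, the first layer of the stack, scalar autodiff, fits in a few dozen lines. The sketch below is one common way to write reverse-mode differentiation over scalars. It is an assumption about the general shape of such code, not the course's actual implementation, and the `Value` class and its attribute names are hypothetical.

```python
import math


class Value:
    """A scalar that records its inputs and the local derivative w.r.t. each."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = float(data)
        self.grad = 0.0                  # d(output)/d(self), filled by backward()
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Order the graph so every node comes after all of its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent in node._parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                parent.grad += local * node.grad


# A single neuron: y = tanh(x * w + b)
x, w, b = Value(2.0), Value(-3.0), Value(1.0)
y = (x * w + b).tanh()
y.backward()
```

After `y.backward()`, `x.grad`, `w.grad`, and `b.grad` hold the derivatives of `y` with respect to each input. Gradients accumulate with `+=` so that a value used more than once in the graph receives the sum of its contributions.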

What's in it

Twenty numbered modules, two lettered modules (03B and 09B), and a fast prerequisite review, organized in five phases plus the review. Each module is roughly one week of effort at the level of a rigorous elite-college course.

#	Module	Phase
00	Prerequisite review	0 — Review
01	Scalar autodiff	I — Foundations
02	Tensors and matmul	I — Foundations
03	A first neural network	I — Foundations
03B	Training	I — Foundations
04	Tokenization	II — Language
05	Embeddings and positions	II — Language
06	Next-token prediction	II — Language
07	Self-attention	III — The transformer
08	Multi-head attention	III — The transformer
09	The transformer block	III — The transformer
09B	Pretraining	III — The transformer
10	Milestone: TinyLLM	III — The transformer
11	Sampling and decoding	IV — Behavior shaping
12	Scaling experiments	IV — Behavior shaping
13	Instruction tuning (SFT)	IV — Behavior shaping
14	Preference tuning (DPO)	IV — Behavior shaping
15	Hallucination and evaluation	IV — Behavior shaping
16	Local pretrained models and inference	V — Assistant systems
17	Retrieval-augmented generation	V — Assistant systems
18	Tool use	V — Assistant systems
19	Agent loops	V — Assistant systems
20	Capstone: a tiny ChatGPT	V — Assistant systems

The syllabus lays out each phase in detail and gives the full motivation for the ordering.

Who it's for

You'll get the most out of this if you're comfortable with Python, undergraduate calculus (chain rule, gradients), and basic linear algebra. You don't need prior deep learning experience: Module 0 covers the prerequisites and Modules 1–3 build the math substrate from scratch. You should be willing to read a paper now and then and to debug your own code without a framework hiding the failure mode.

If you've watched Karpathy's videos and wished for a structured curriculum with exercises, deliverables, and tests, this is that.

How the course works

Each module in the course has a lesson page and a set of deliverables combining a coding project and a problem set. The lesson pages are hosted on this site and also available as markdown in the course repo.

Each week, you read the module's lesson page, then implement a new sub-package covering that week's topic inside the g2c/ Python package. Finally, you complete a set of exercises in a Jupyter notebook using the code you wrote that week. The lesson pages are the readable front door; the repo is where the code lives.

Get started

Read the syllabus for the full 20-week arc.
Clone the repository and follow the README quickstart to set up on your machine.
Start with Module 0: Prerequisite review, or jump straight into Module 1: Scalar autodiff.