Appendix F. Further Reading
This book covers a lot of ground, but the field moves fast. By the time you read this, new models, techniques, and tools will have emerged. This appendix organizes the best resources for continuing your education: foundational papers you should read, video courses that teach the material visually, books that go deeper on specific topics, blogs and newsletters that track the frontier, and open-source tools for hands-on experimentation.
Resources are grouped by category and roughly ordered from most accessible to most technical within each category. Every URL and fact has been verified as of March 20, 2026.
F.1 Foundational Papers
These are the papers that define the field. If you read nothing else, read these. Each one introduced a concept that every subsequent model builds on. They are listed in chronological order, with the chapter(s) where the concept is explained in this book.
| Paper | Authors | Date | Key Contribution | arXiv / Venue | Chapter |
|---|---|---|---|---|---|
| Attention Is All You Need | Vaswani et al. | June 2017 | Introduced the Transformer architecture: self-attention, multi-head attention, positional encoding, encoder-decoder structure. As of 2025, cited over 173,000 times. | arXiv:1706.03762, NeurIPS 2017 | 7, 10 |
| Improving Language Understanding by Generative Pre-Training (GPT-1) | Radford et al. | June 2018 | Showed that unsupervised pre-training on a large corpus followed by supervised fine-tuning produces strong results across diverse NLP tasks. 117M parameters. | OpenAI technical report | 1 |
| BERT: Pre-training of Deep Bidirectional Transformers | Devlin et al. | October 2018 | Introduced bidirectional pre-training with masked language modeling. Dominated NLP benchmarks for two years. | arXiv:1810.04805, NAACL 2019 | 1 |
| Language Models are Unsupervised Multitask Learners (GPT-2) | Radford et al. | February 2019 | Demonstrated that scaling next-token prediction to 1.5B parameters produces coherent long-form text without task-specific training. | OpenAI technical report | 1, 27 |
| Language Models are Few-Shot Learners (GPT-3) | Brown et al. | May 2020 | Scaled to 175B parameters and 300B training tokens. Showed that large models can perform tasks from a few examples in the prompt (in-context learning) without fine-tuning. | arXiv:2005.14165, NeurIPS 2020 | 1, 13 |
| LoRA: Low-Rank Adaptation of Large Language Models | Hu et al. | June 2021 | Introduced parameter-efficient fine-tuning by injecting trainable low-rank matrices into frozen model weights. Reduced trainable parameters by up to 10,000x (reported for GPT-3 175B) while matching full fine-tuning quality. | arXiv:2106.09685, ICLR 2022 | 25, 28 |
| Training Compute-Optimal Large Language Models (Chinchilla) | Hoffmann et al. | March 2022 | Established that models and training data should scale together: a 70B model trained on 1.4T tokens outperforms a 280B model trained on 300B tokens. Overturned the “bigger model is always better” assumption. | arXiv:2203.15556, NeurIPS 2022 | 13 |
| Training language models to follow instructions with human feedback (InstructGPT) | Ouyang et al. | March 2022 | Applied RLHF (Reinforcement Learning from Human Feedback) at scale to align language models with human preferences in instruction following. Human raters preferred outputs from a 1.3B InstructGPT model over those of the 175B GPT-3. | arXiv:2203.02155, NeurIPS 2022 | 15 |
| FlashAttention: Fast and Memory-Efficient Exact Attention | Dao et al. | May 2022 | Made attention computation IO-aware, reducing memory from O(n^2) to O(n) and enabling much longer sequences. The foundation for all subsequent FlashAttention versions. | arXiv:2205.14135, NeurIPS 2022 | 20 |
| Constitutional AI: Harmlessness from AI Feedback | Bai et al. | December 2022 | Proposed using AI-generated feedback (instead of human labelers) to train models to be helpful and harmless, guided by a set of principles (a “constitution”). | arXiv:2212.08073 | 15, 26 |
| QLoRA: Efficient Finetuning of Quantized Language Models | Dettmers et al. | May 2023 | Combined 4-bit NormalFloat quantization with LoRA, enabling fine-tuning of 65B-parameter models on a single 48GB GPU. | arXiv:2305.14314, NeurIPS 2023 | 25, 28 |
| Direct Preference Optimization (DPO) | Rafailov et al. | May 2023 | Simplified RLHF by eliminating the reward model entirely, turning alignment into a classification problem on preference pairs. | arXiv:2305.18290, NeurIPS 2023 | 15 |
| Scaling Data-Constrained Language Models | Muennighoff et al. | May 2023 | Studied what happens when you run out of unique training data. Found that repeating data up to 4 epochs causes minimal degradation, but beyond that, returns diminish sharply. | arXiv:2305.16264, NeurIPS 2023 | 13 |
| Efficient Memory Management for Large Language Model Serving with PagedAttention (vLLM) | Kwon et al. | September 2023 | Applied virtual memory concepts to KV cache management, achieving 2-4x throughput improvement for LLM serving. | arXiv:2309.06180, SOSP 2023 | 18, 24 |
| DeepSeek-V3 Technical Report | DeepSeek-AI | December 2024 | Detailed the 671B-parameter MoE architecture (37B active) with Multi-head Latent Attention (MLA) and auxiliary-loss-free load balancing. Trained on 14.8T tokens at a reported GPU cost of $5.576M for the final training run, excluding prior research and ablation experiments. | arXiv:2412.19437 | 8, 11, 12, 18 |
| DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | DeepSeek-AI | January 2025 | Showed that reasoning behavior can emerge from pure reinforcement learning (GRPO) without supervised fine-tuning on chain-of-thought data. Open-sourced under MIT license. | arXiv:2501.12948 | 15, 16 |
| FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling | Zadouri et al. | March 2026 | Achieved 1,613 TFLOPs/s BF16 on NVIDIA B200 (71% utilization), 1.3x faster than cuDNN 9.13 and 2.7x faster than Triton. Implemented entirely in Python-embedded CuTe-DSL with 20-30x faster compile times. | arXiv:2603.05451 | 20 |
How to read these papers: Start with the abstract and introduction. Skip to the experiments section to see what the paper actually achieved. Then go back and read the method section with the relevant chapter of this book open for context. You do not need to understand every equation on first reading.
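The Chinchilla row in the table above is easier to appreciate as arithmetic. The short sketch below uses only the parameter and token counts quoted in the table and computes training tokens per parameter for each configuration. The function name and dictionary labels are illustrative, not taken from any paper.

```python
# Tokens-per-parameter ratios for configurations quoted in the table above.
def tokens_per_param(params: float, tokens: float) -> float:
    """Return the number of training tokens seen per model parameter."""
    return tokens / params


# (parameters, training tokens), as quoted in the GPT-3 and Chinchilla rows.
configs = {
    "GPT-3 (175B params, 300B tokens)": (175e9, 300e9),
    "280B model (300B tokens)": (280e9, 300e9),
    "Chinchilla (70B params, 1.4T tokens)": (70e9, 1.4e12),
}

ratios = {name: tokens_per_param(p, t) for name, (p, t) in configs.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} tokens per parameter")
```

The 70B configuration works out to 20 tokens per parameter, while the two larger models sit between 1 and 2. That gap is the practical content of the "scale data with model size" result.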
F.2 Video Courses and Lectures
Video is often the best way to build intuition for how these systems work. These are the highest-quality free resources available, listed from most beginner-friendly to most advanced.
Andrej Karpathy’s YouTube Channel
Karpathy is a former Director of AI at Tesla, a founding member of OpenAI, and founder of Eureka Labs. His teaching style is uniquely effective: he builds everything from scratch, in code, explaining every line.
“Deep Dive into LLMs like ChatGPT” (February 2025, ~3.5 hours). A comprehensive walkthrough of the entire LLM stack: pre-training data, tokenization, training, supervised fine-tuning, RLHF, and practical usage tips. The single best starting point for understanding how modern LLMs work. Covers the same ground as Chapters 1, 4, 14, 15, and 17 of this book.
- URL: youtube.com/@AndrejKarpathy
“How I Use LLMs” (February 27, 2025, ~2 hours). A practical guide to getting the most out of LLMs in daily work: which models to use, how to prompt them, when to use thinking models versus standard models. Complements Chapter 16 and Chapter 17.
- URL: youtube.com/@AndrejKarpathy
“Software Is Changing (Again)” (June 17, 2025, ~40 minutes). A keynote from Y Combinator’s AI Startup School in San Francisco. Karpathy lays out three eras of software: Software 1.0 (hand-written code), Software 2.0 (neural networks trained on data), and Software 3.0 (LLMs prompted in natural language). He argues that the fundamental nature of programming has changed twice in rapid succession, and that the current moment is the most exciting time to enter the software industry. Complements Chapters 23 and 29.
- URL: youtube.com/@AndrejKarpathy
“Neural Networks: Zero to Hero” playlist (2022-2023, ~25 hours total). Builds from basic backpropagation (micrograd) through character-level language models (makemore) to a full GPT implementation (nanoGPT). The “Let’s build GPT” video alone has nearly 7 million views. This playlist covers the same progression as Chapters 2, 3, 7, 9, 10, and 27 of this book.
- URL: karpathy.ai/zero-to-hero.html
- GitHub: github.com/karpathy/nn-zero-to-hero
“Let’s reproduce GPT-2 (124M)” (June 2024, ~4 hours). Builds and trains a GPT-2 reproduction from scratch, covering training optimization, hyperparameter selection from the GPT-2 and GPT-3 papers, and evaluation. Directly relevant to Chapter 14 and Chapter 27.
- URL: youtube.com/@AndrejKarpathy
microgpt.py (February 11, 2026). Not a video, but a 243-line pure Python GPT implementation with zero external dependencies. Training and inference in a single file. Useful as a reference after watching the videos above.
- GitHub: gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95
3Blue1Brown (Grant Sanderson)
Known for exceptional mathematical visualizations. His neural network and Transformer series makes the math behind attention, embeddings, and backpropagation visually intuitive in a way that text alone cannot.
“Transformers, the tech behind LLMs” series. Covers attention mechanisms, embeddings, and the full Transformer architecture with animated visualizations. Complements Chapters 5, 7, and 10.
- URL: 3blue1brown.com/lessons/gpt
“Neural Networks” series (4 parts). Covers what neural networks are, gradient descent, backpropagation, and how learning works. Complements Chapters 2 and 3.
- URL: 3blue1brown.com (Neural Networks playlist)
Stanford University Courses
Stanford offers several courses with publicly available lecture recordings and materials.
CS336: Language Modeling from Scratch (Spring 2025 and Spring 2026, taught by Percy Liang and Tatsunori Hashimoto). The most hands-on university course on LLMs. Students build an entire language model from scratch: data collection, tokenizer, Transformer implementation, training, and evaluation. The assignments require writing significantly more code than typical ML courses. Lecture recordings are available on YouTube.
- URL: cs336.stanford.edu
CS224N: Natural Language Processing with Deep Learning (taught by Christopher Manning). The standard graduate NLP course. Covers word vectors, dependency parsing, Transformers, pre-training, and modern NLP applications. Lecture videos from multiple years are available on YouTube.
- URL: web.stanford.edu/class/cs224n/
CS229S: Systems for Machine Learning (taught by various faculty). Focuses on performance efficiency and scalability of deep learning systems, with emphasis on Transformer architectures and LLMs. Relevant to Chapters 14 and 24.
- URL: cs229s.stanford.edu
MIT 6.S191: Introduction to Deep Learning
Taught by Alexander Amini and Ava Amini. A fast-paced “boot camp” style course covering deep learning fundamentals, including neural networks, sequence models, generative models, and reinforcement learning. Lecture recordings are posted on YouTube each year. The 2025 edition includes a lecture on language models and new frontiers.
- URL: introtodeeplearning.com
DeepLearning.AI Short Courses
Founded by Andrew Ng, DeepLearning.AI offers free short courses (typically 1-2 hours each) built in partnership with companies like OpenAI, Anthropic, and Google. These are practical, code-first introductions to specific topics.
Notable courses relevant to this book:
“ChatGPT Prompt Engineering for Developers” (with Isa Fulford from OpenAI). Covers prompting best practices. Relevant to Chapter 17.
“LangChain for LLM Application Development” (with Harrison Chase). Covers building applications with LLMs. Relevant to Chapter 23.
“Open Source Models with Hugging Face”. Covers using open-weight models for text generation, summarization, and more. Relevant to Chapter 25.
- URL: deeplearning.ai/short-courses/
Hugging Face Courses
Hugging Face offers two free, self-paced courses:
The LLM Course (formerly the NLP Course, expanded in 2025). Covers using the Transformers library, tokenizers, datasets, and the Hugging Face Hub. Includes hands-on Colab notebooks. Relevant to Chapters 4, 5, 25, and 28.
- URL: huggingface.co/learn/llm-course
The smol-course. A focused course on fine-tuning and aligning small language models. Relevant to Chapter 28.
- URL: github.com/huggingface/smol-course
F.3 Books
These books complement this one by going deeper on specific topics or approaching the material from a different angle.
| Book | Author(s) | Publisher | Year | Focus | Relevant Chapters |
|---|---|---|---|---|---|
| Build a Large Language Model (From Scratch) | Sebastian Raschka | Manning | 2024 | Step-by-step implementation of a GPT-like model in PyTorch, from tokenizer to fine-tuning. The closest companion to Chapter 27 of this book. | 3, 14, 27, 28 |
| Build a Reasoning Model (From Scratch) | Sebastian Raschka | Manning | 2026 (estimated summer) | Extends the above with reasoning capabilities: RLVR, GRPO, and test-time compute. Currently in MEAP (early access). | 15, 16 |
| Build an AI Agent (From Scratch) | Jungjun Hur, Younghee Song | Manning | 2026 (estimated summer) | Covers building AI agents from scratch: the ReAct loop, MCP tool integration, agentic RAG, memory modules, planning, reflection, and multi-agent coordination. Currently in MEAP (early access). | 23, 29 |
| Build a Multi-Agent System (from Scratch) | Val Andrei Fajardo | Manning | 2026 (estimated summer) | Covers building multi-agent systems from the ground up: the agent loop, tool orchestration via MCP, human-in-the-loop patterns, memory modules, and Agent2Agent (A2A) protocol compatibility. Currently in MEAP (early access). | 23, 29 |
| AI Engineering: Building Applications with Foundation Models | Chip Huyen | O’Reilly | 2024 | Covers the full stack of building AI applications: evaluation, RAG, agents, deployment, and monitoring. Focuses on the engineering side rather than the model internals. | 23, 24, 29 |
| LLM Engineer’s Handbook | Paul Iusztin, Maxime Labonne | Packt | 2024 | End-to-end guide from model training to production deployment. Covers fine-tuning, serving, and building LLM applications. | 24, 25, 28 |
Note on the Manning “From Scratch” series: As of March 2026, three Manning books in this series are in the Early Access Program (MEAP) with estimated publication dates of summer 2026. Sebastian Raschka’s Build a Reasoning Model (From Scratch) covers RLVR and GRPO in detail. Jungjun Hur and Younghee Song’s Build an AI Agent (From Scratch) covers the ReAct loop, MCP integration, and agentic RAG. Val Andrei Fajardo’s Build a Multi-Agent System (from Scratch) covers multi-agent orchestration, MCP, and the A2A protocol. Fajardo is a former founding engineer at LlamaIndex and a researcher at the Vector Institute for AI.
F.4 Blogs and Newsletters
The LLM field moves too fast for books alone. These blogs and newsletters are the best way to stay current. They are listed in order of accessibility, from most beginner-friendly to most research-focused.
For Staying Current
Simon Willison’s Weblog (simonwillison.net). The single best source for tracking what is happening in the LLM ecosystem, week by week. Willison covers new model releases, API changes, tool updates, and security implications with exceptional clarity and thoroughness. His annual “year in LLMs” reviews are essential reading. He also maintains an open-source LLM CLI tool (llm.datasette.io) for interacting with models from the command line.
Sebastian Raschka’s “Ahead of AI” (magazine.sebastianraschka.com). A Substack newsletter covering LLM research with a focus on practical implications. Raschka’s annual “State of LLMs” reviews provide excellent summaries of the year’s most important developments. His curated research paper lists are a good starting point for deeper reading.
For Technical Deep Dives
Jay Alammar’s Blog (jalammar.github.io). Famous for “The Illustrated Transformer,” which uses step-by-step visualizations to explain how attention, embeddings, and the full Transformer architecture work. Also covers GPT-2, GPT-3, BERT, and language model interpretability. The visualizations in these posts are among the most widely referenced in the field.
Lilian Weng’s “Lil’Log” (lilianweng.github.io). Comprehensive, deeply researched survey posts on topics like attention mechanisms, the Transformer family, hallucinations, reward hacking, and test-time compute. Each post is essentially a mini literature review with clear explanations. Her “Why We Think” (May 2025) on test-time compute, “Extrinsic Hallucinations in LLMs” (July 2024), and “Transformer Family Version 2.0” (January 2023) are particularly relevant to this book. Weng left OpenAI in November 2024 after seven years, most recently as VP of Research and Safety; the blog remains an invaluable archive.
Cameron R. Wolfe’s “Deep (Learning) Focus” (cameronrwolfe.substack.com). A Substack newsletter (over 60,000 subscribers) that picks a single topic per edition and covers it in depth, with clear explanations of the relevant papers. Wolfe is a Senior Research Scientist at Netflix. His posts on decoder-only Transformers, Mixture-of-Experts, and rubric-based rewards for RL are excellent companions to Chapters 10, 12, and 15 of this book.
For Research Tracking
Maxime Labonne’s LLM Course (github.com/mlabonne/llm-course). A curated GitHub repository (over 75,000 stars) organizing LLM resources into three tracks: fundamentals, the LLM scientist, and the LLM engineer. Includes roadmaps, Colab notebooks, and links to papers and tools. Updated regularly.
Hugging Face Papers (huggingface.co/papers). A daily feed of the most discussed ML papers, with community summaries and links to model implementations. Useful for tracking what the research community considers important.
F.5 Open-Source Tools for Hands-On Learning
Reading about LLMs is useful. Running them is better. These tools let you experiment with real models on your own hardware.
Running Models Locally
Ollama (ollama.com). The simplest way to run open-weight models locally. A single command (ollama run qwen3:8b) downloads and runs a model. Supports GGUF quantized models, KV cache quantization (Q8_0, Q4_0 via the OLLAMA_KV_CACHE_TYPE environment variable), and GPU acceleration. Covered in Chapter 25.
llama.cpp (github.com/ggml-org/llama.cpp). The C/C++ inference engine that powers Ollama and many other local inference tools. Supports CPU and GPU inference with GGUF quantized models. If you want to understand what happens under the hood when you run a model locally, start here. In February 2026, creator Georgi Gerganov and the founding ggml.ai team joined Hugging Face as full-time employees, bringing the local inference layer and the model distribution layer under one roof. The project remains fully open-source.
LM Studio (lmstudio.ai). A desktop application for running open-weight models with a chat interface. Provides a GUI for model selection, quantization options, and parameter tuning. Good for experimentation without command-line tools.
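Once a model is running under Ollama, it can also be called from a script through the local HTTP API. The sketch below is a minimal example, assuming an Ollama server is listening on its default port (11434) and that the qwen3:8b model mentioned above has already been pulled. The helper names build_request and generate are illustrative and not part of any Ollama client library.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "qwen3:8b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "qwen3:8b", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, calling generate("Explain the KV cache in one sentence.") returns the model's reply as a string. Separating request construction from sending keeps the payload easy to inspect without a live server.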
Autonomous Experimentation
autoresearch (github.com/karpathy/autoresearch). Released by Karpathy on March 6, 2026 (announced on X on March 7). A 630-line Python script that implements a minimal autonomous research lab running on a single GPU: you give an AI agent training code, and it runs experiments overnight, modifying hyperparameters and architecture choices, keeping improvements and discarding regressions. Built on top of nanochat, Karpathy’s minimal full-stack LLM training codebase. Hit 30,307 GitHub stars in its first week, making it one of the fastest-growing repositories in GitHub history. Demonstrates the “autoresearch loop” pattern of autonomous ML experimentation. Relevant to Chapters 27 and 29.
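The core of the pattern described above, propose a change, measure it, keep it only if it helps, fits in a few lines. The sketch below is a toy illustration and not code from the autoresearch repository: a random perturbation stands in for the AI agent, and a hypothetical closed-form function (toy_validation_loss) stands in for an actual training run.

```python
import random


def toy_validation_loss(config: dict) -> float:
    """Stand-in for a real training run; lowest at lr=0.01, width=256."""
    return (config["lr"] - 0.01) ** 2 * 1e4 + ((config["width"] - 256) / 256) ** 2


def propose_change(config: dict, rng: random.Random) -> dict:
    """Stand-in for the agent: randomly perturb one setting."""
    candidate = dict(config)
    if rng.random() < 0.5:
        candidate["lr"] = max(1e-5, candidate["lr"] * rng.choice([0.5, 0.8, 1.25, 2.0]))
    else:
        candidate["width"] = max(32, candidate["width"] + rng.choice([-64, -32, 32, 64]))
    return candidate


def experiment_loop(start: dict, steps: int, seed: int = 0):
    """Greedy loop: keep a candidate only if it lowers the loss."""
    rng = random.Random(seed)
    best, best_loss = start, toy_validation_loss(start)
    history = [best_loss]
    for _ in range(steps):
        candidate = propose_change(best, rng)
        loss = toy_validation_loss(candidate)
        if loss < best_loss:  # keep improvements
            best, best_loss = candidate, loss
        # otherwise the regression is discarded and the previous best stays
        history.append(best_loss)
    return best, history


start = {"lr": 0.1, "width": 64}
best_config, loss_history = experiment_loop(start, steps=50)
print(f"start loss {loss_history[0]:.3f} -> best loss {loss_history[-1]:.3f}")
```

Because regressions are never accepted, the recorded best loss can only stay flat or fall. A real system replaces the random proposals with agent-written code edits and the toy function with a timed training run.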
Serving Models in Production
vLLM (docs.vllm.ai). The most widely used open-source LLM serving engine. Implements PagedAttention for efficient KV cache management, achieving 2-4x throughput over naive serving. Supports tensor parallelism, prefix caching, LoRA adapter serving, and FP8 KV cache on Hopper/Ada GPUs. Covered in Chapters 18, 24, and 28.
SGLang (github.com/sgl-project/sglang). A serving framework built around RadixAttention for automatic KV cache reuse, achieving up to 6.4x throughput improvement. Includes a frontend language for programming LLM interactions. Covered in Chapters 19 and 24.
TensorRT-LLM (github.com/NVIDIA/TensorRT-LLM). NVIDIA’s optimized inference library for their GPUs. Achieves the highest raw throughput on NVIDIA hardware (up to 10,000 tokens/second on H100 with FP8). Covered in Chapter 24.
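The virtual-memory idea behind PagedAttention can be illustrated with a toy allocator. The sketch below is not vLLM's implementation or API: the class PagedKVCache and its methods are hypothetical, and it tracks only block bookkeeping, with no tensors, sharing, or copy-on-write. It shows the one property that matters for intuition: each sequence receives fixed-size blocks on demand rather than one large contiguous reservation.

```python
class PagedKVCache:
    """Toy block allocator mapping each sequence to a list of physical blocks."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids not in use
        self.block_tables = {}  # sequence id -> list of physical block ids
        self.lengths = {}       # sequence id -> number of tokens stored

    def append_token(self, seq_id: str) -> int:
        """Reserve room for one more token; return the block that holds it."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:  # no block yet, or the last one is full
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1
        return table[-1]

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("a")  # 3 tokens need 2 blocks
cache.append_token("b")      # 1 token needs 1 block
print(cache.block_tables)    # {'a': [0, 1], 'b': [2]}
cache.free("a")
print(sorted(cache.free_blocks))  # [0, 1, 3]
```

Memory wasted per sequence is bounded by one partly filled block, and blocks freed by a finished request are immediately reusable by any other request.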
Fine-Tuning
Unsloth (github.com/unslothai/unsloth). Advertises roughly 2x faster fine-tuning with about 70% less VRAM. Supports LoRA, QLoRA, and GRPO on consumer GPUs (RTX 40/50 series). Compatible with Qwen3, Qwen 3.5, LLaMA 4, DeepSeek, and GPT-OSS models. Covered in Chapters 25 and 28.
TRL (Transformer Reinforcement Learning) (github.com/huggingface/trl). Hugging Face’s library for training language models with reinforcement learning. Supports SFT, DPO, PPO, and GRPO trainers. The standard tool for alignment training. Covered in Chapters 15, 17, 25, and 28.
PEFT (Parameter-Efficient Fine-Tuning) (github.com/huggingface/peft). Hugging Face’s library implementing LoRA, QLoRA, DoRA, and other parameter-efficient methods. Integrates with the Transformers library. Covered in Chapters 25 and 28.
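To see why the low-rank methods in these libraries are cheap, it helps to count parameters for a single weight matrix. The sketch below assumes a hypothetical 4096 x 4096 projection and a LoRA rank of 8; both numbers are chosen for illustration and do not come from any particular model. It compares the full matrix against the pair of low-rank factors that LoRA trains in its place.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_out * d_in


def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the low-rank pair B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out = d_in = 4096  # hypothetical square projection matrix
rank = 8             # hypothetical LoRA rank

full = full_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
print(f"full: {full:,}  lora: {lora:,}  reduction: {full / lora:.0f}x")
# full: 16,777,216  lora: 65,536  reduction: 256x
```

The per-matrix saving here is smaller than the headline figure in the LoRA paper, which also reflects adapting only a subset of the weight matrices in a much larger model.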
Building Agents
OpenAI Agents SDK (github.com/openai/openai-agents-python). Released March 11, 2025; version 0.12.5 as of March 19, 2026. Despite the name, it is provider-agnostic and supports 100+ LLMs. Includes tool calling, handoffs, guardrails, human-in-the-loop, sessions, and realtime/voice agents. Covered in Chapters 23 and 29.
Claude Agent SDK (docs.claude.com/en/docs/claude-code/sdk). Originally released as the Claude Code SDK in February 2025, rebranded to Claude Agent SDK on September 29, 2025 to reflect its broader capabilities beyond coding. The SDK exposes the same agent harness that powers Claude Code as a library for building custom agents. Supports subagents, sessions, tool use, and human-in-the-loop workflows. Covered in Chapters 23 and 29.
MCP Python SDK (github.com/modelcontextprotocol/python-sdk). The official SDK for building Model Context Protocol servers and clients. Version 1.26.0 as of January 24, 2026. Supports Streamable HTTP transport (replacing SSE since March 2025). Covered in Chapters 23 and 29.
Google ADK (Agent Development Kit) (github.com/google/adk-python). Released April 9, 2025 at Google Cloud NEXT. A framework for building multi-agent systems with tool use. Covered in Chapters 23 and 29.
F.6 Keeping Up After This Book
The resources above will serve you well, but the field will continue to evolve. Here is a practical strategy for staying current:
- Weekly: Read Simon Willison’s blog and check Hugging Face Papers for new releases.
- Monthly: Read Sebastian Raschka’s “Ahead of AI” newsletter and Cameron Wolfe’s “Deep (Learning) Focus” for research summaries.
- Quarterly: Check the Appendix E model comparison table against current model releases. New models will have been released since this book’s publication.
- When a major model drops: Read the technical report (usually on arXiv), then check the blogs above for accessible explanations.
- When you want to go deeper on a topic: Start with the relevant chapter of this book, then follow the paper citations in that chapter’s sources section.
The most important habit is not reading more, but building more. Pick a model, pick a task, and build something. The tools in Section F.5 make this easier than it has ever been.
F.7 Key Takeaways
The 17 foundational papers in Section F.1 define the core concepts behind every model discussed in this book. Start with “Attention Is All You Need” (2017) and work forward chronologically.
Andrej Karpathy’s videos are the single best free resource for building intuition about how LLMs work. His “Deep Dive into LLMs like ChatGPT” (February 2025, ~3.5 hours) covers the full training pipeline in one sitting. His “Software Is Changing (Again)” keynote (June 2025) provides essential context on how LLMs are reshaping software development.
Stanford CS336 (Language Modeling from Scratch) is the most rigorous hands-on university course. Students build an entire language model from data collection through evaluation. The course is being offered again in Spring 2026.
Sebastian Raschka’s books are the closest companions to this one: Build a Large Language Model (From Scratch) (2024) for implementation and Build a Reasoning Model (From Scratch) (estimated summer 2026) for reasoning and RL. For agents and tool use, see Jungjun Hur and Younghee Song’s Build an AI Agent (From Scratch) and Val Andrei Fajardo’s Build a Multi-Agent System (from Scratch), both estimated summer 2026.
Chip Huyen’s AI Engineering (2024) covers the application and deployment side that this book touches on in Chapters 23, 24, and 29 but does not cover exhaustively.
For staying current, Simon Willison’s blog and Sebastian Raschka’s newsletter are the two most consistently valuable sources in the field.
For hands-on experimentation, Ollama (for running models locally), Unsloth (for fine-tuning), the OpenAI Agents SDK and Claude Agent SDK (for building agents), and Karpathy’s autoresearch (for autonomous ML experimentation) are the fastest paths from reading to building.
The field moves fast, but the fundamentals are stable. The Transformer architecture (2017), attention mechanism, tokenization, embeddings, and the pre-train/fine-tune/align pipeline have remained the core framework for nearly a decade. Understanding these deeply, which is what this book aims to help you do, gives you the foundation to evaluate any new development.
Appendix E provides a detailed comparison table of every frontier model referenced in this book, with specifications, pricing, and context windows as of March 2026.
Sources: All URLs, publication dates, and facts in this appendix are verified via web search as of March 20, 2026. Key sources include: “Attention Is All You Need” citation count from Wikipedia (en.wikipedia.org/wiki/Attention_Is_All_You_Need, citing over 173,000 as of 2025). Karpathy’s “Neural Networks: Zero to Hero” playlist at karpathy.ai/zero-to-hero.html (confirmed from karpathy.ai). Karpathy’s “Let’s build GPT” video views (~7 million, confirmed from summify.io). Karpathy’s “Deep Dive into LLMs like ChatGPT” duration 3 hours 31 minutes (confirmed from anfalmushtaq.com, medium.com). Karpathy’s microgpt.py released February 11, 2026, 243 lines (confirmed from blockchain.news, analyticsvidhya.com, generativeai.pub, github.com/karpathy gist; Karpathy’s own blog at karpathy.github.io/2026/02/12/microgpt describes it as “200 lines” counting core algorithm only). Karpathy’s “Software Is Changing (Again)” keynote at YC AI Startup School, San Francisco, June 17, 2025, ~40 minutes (confirmed from medium.com/@kansm, gigazine.net, latent.space, summify.io showing “40 min video” and 2.3M+ views). Karpathy’s autoresearch released March 6, 2026 (repo creation), announced on X March 7, 2026, 630 lines, 30,307 stars in first week (confirmed from simplenews.ai, rywalker.com, forbes.com, blockchain.news, launchberg.com, theneuron.ai, jangwook.net). Karpathy biography: founding member of OpenAI, former Director of AI at Tesla, founder of Eureka Labs (confirmed from en.wikipedia.org/wiki/Andrej_Karpathy: “Karpathy is a founding member of the artificial intelligence research group OpenAI”). Karpathy’s “How I Use LLMs” duration ~2 hours 11 minutes (confirmed from blockchain.news). 
FlashAttention-4 authors: Ted Zadouri, Markus Hoehnerbach, Jay Shah, Timmy Liu, Vijay Thakkar, Tri Dao; submitted March 5, 2026; 1,613 TFLOPs/s BF16 on B200 per arXiv abstract (confirmed from arxiv.org/abs/2603.05451; the companion blog at tridao.me/blog/2026/flash4 reports “up to 1605 TFLOPs/s” which may reflect a different benchmark configuration; the arXiv paper is the primary source; also confirmed from blog.ai.princeton.edu). 3Blue1Brown Transformer series at 3blue1brown.com/lessons/gpt (confirmed from 3blue1brown.com). Stanford CS336 taught by Percy Liang and Tatsunori Hashimoto, offered Spring 2025 and Spring 2026 (confirmed from cs336.stanford.edu showing cs336-spr2526-staff email, online.stanford.edu, classcentral.com). Stanford CS224N taught by Christopher Manning (confirmed from web.stanford.edu/class/cs224n). MIT 6.S191 taught by Alexander Amini and Ava Amini, 2025 edition includes language models lecture (confirmed from introtodeeplearning.com, classcentral.com, mitadmissions.org). DeepLearning.AI short courses including “ChatGPT Prompt Engineering for Developers” with Isa Fulford and Andrew Ng (confirmed from deeplearning.ai/short-courses). Hugging Face LLM Course expanded from NLP Course in 2025 (confirmed from huggingface.co/blog/llm-course, huggingface.co/learn/llm-course). Hugging Face smol-course (confirmed from github.com/huggingface/smol-course). Sebastian Raschka, Build a Large Language Model (From Scratch), Manning, 2024 (confirmed from manning.com, amazon.com, goodreads.com). Sebastian Raschka, Build a Reasoning Model (From Scratch), Manning, estimated summer 2026 MEAP (confirmed from manning.com/books/build-a-reasoning-model-from-scratch, devtalk.com, forthcomingbooks.com). 
Jungjun Hur and Younghee Song, Build an AI Agent (From Scratch), Manning, estimated summer 2026 MEAP, ISBN 9781633434615 (confirmed from manning.com/books/build-an-ai-agent-from-scratch; authors confirmed from manning.com: Jungjun Hur is an AI and data engineer, author of Practical AI Application Development Using LLMs; Younghee Song is an AI consultant at PwC). Val Andrei Fajardo, Build a Multi-Agent System (from Scratch), Manning, estimated summer 2026 MEAP, ISBN 9781633434660 (confirmed from manning.com/books/build-a-multi-agent-system-from-scratch; Fajardo is a former founding engineer at LlamaIndex and researcher at the Vector Institute for AI). Chip Huyen, AI Engineering: Building Applications with Foundation Models, O’Reilly, 2024 (confirmed from oreilly.com/library/view/ai-engineering/9781098166298, huyenchip.com/books). Paul Iusztin and Maxime Labonne, LLM Engineer’s Handbook, Packt, 2024, 4.8-4.9 rating (confirmed from packtpub.com). Simon Willison’s Weblog at simonwillison.net (confirmed from simonwillison.net). Simon Willison’s LLM CLI tool at llm.datasette.io, version 0.27+ with tool calling support (confirmed from simonw.substack.com, simonwillison.net). Sebastian Raschka’s “Ahead of AI” newsletter at magazine.sebastianraschka.com (confirmed from magazine.sebastianraschka.com/about). Jay Alammar’s blog at jalammar.github.io, “The Illustrated Transformer” (confirmed from jalammar.github.io, techplanet.today). Lilian Weng’s “Lil’Log” at lilianweng.github.io; latest post “Why We Think” May 1, 2025 on test-time compute; “Reward Hacking in Reinforcement Learning” November 28, 2024; “Extrinsic Hallucinations in LLMs” July 7, 2024 (confirmed from lilianweng.github.io). Lilian Weng left OpenAI in November 2024 after seven years, most recently as VP of Research and Safety, last day November 15 (confirmed from techcrunch.com, cryptonewsland.com, gigazine.net, newsbytesapp.com).
Cameron R. Wolfe’s “Deep (Learning) Focus” at cameronrwolfe.substack.com, over 60,000 subscribers, Senior Research Scientist at Netflix (confirmed from cameronrwolfe.substack.com/about, substack.com guest lecture post citing 60K subscribers). Maxime Labonne’s LLM Course at github.com/mlabonne/llm-course, over 75,000 stars (confirmed from thenextgentechinsider.com citing 75K stars, github.com/mlabonne). Claude Agent SDK rebranded from Claude Code SDK on September 29, 2025 (confirmed from anthropic.com/engineering/building-agents-with-the-claude-agent-sdk, docs.claude.com/en/docs/claude-code/sdk/migration-guide, digitalapplied.com, myaiexp.com). OpenAI Agents SDK version 0.12.5 released March 19, 2026 (confirmed from pypi.org/project/openai-agents). MCP Python SDK version 1.26.0 released January 24, 2026 (confirmed from pypi.org/project/mcp). Google ADK released April 9, 2025 at Google Cloud NEXT (confirmed from developers.googleblog.com, c-sharpcorner.com). ggml.ai team (Georgi Gerganov and founding team) joined Hugging Face on February 20, 2026; llama.cpp remains fully open-source (confirmed from jangwook.net, enclaveai.app, blog.ngxson.com, techplanet.today). vLLM paper (PagedAttention): arXiv:2309.06180 submitted September 12, 2023, presented at SOSP 2023 (confirmed from arxiv.org/abs/2309.06180; the vLLM system was open-sourced in June 2023, but the paper was submitted in September). All arXiv paper IDs verified against arxiv.org. All foundational paper dates and venues cross-referenced with prior chapter source citations.