5-Day Free Course · Computer Architecture

How CPUs Work: From Transistors to Execution

Every program you write runs on transistors. This course traces the path from silicon switches to executing instructions — logic gates, the ALU, registers, memory hierarchy, pipelining, and the CPU optimizations that make modern code fast or slow.

5 days self-paced

Free forever

Text + external video refs

No signup required

25+ code examples

How This Course Works

No custom videos. On purpose.

This is a text-first course that links out to the best supporting material on the internet instead of trying to replace it. The goal is to make this the best course on computer architecture and systems programming you can find — even without producing a single minute of custom video.

Practitioner-tested, not vendor marketing

This course is built by people who ship production computer systems for a living. It reflects how things actually work on real projects — not how the documentation describes them.

Code you can run, not demos you can watch

Every day has working code snippets you can paste into your editor and run right now. The emphasis is on understanding what each line does, not memorizing syntax.

Links to the canonical sources

Instead of shooting videos that go stale in six months, Precision AI Academy links to the definitive open-source implementations, official documentation, and the best conference talks on the topic.

Completes in 5 one-hour sessions

Each day is designed to finish in about an hour of focused reading plus hands-on work. You can do the whole course over a week of lunch breaks. No calendar commitment, no live classes, no quizzes.

Syllabus

The 5 Days

Each day stands alone. Read them in order for the full picture, or jump straight to the day that answers the question you have today.

01 · Day One

Transistors and Logic Gates

How MOSFET transistors implement NOT, AND, OR, and NAND gates. How gates combine into adders, multiplexers, and latches. The building blocks of every digital circuit.
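As a rough illustration of how gates combine into an adder, the sketch below models a NAND gate as a Python function and builds NOT, AND, OR, XOR, and a one-bit full adder from it alone. The function names are hypothetical and chosen for this example; real gates are transistor circuits, and this only mirrors their truth tables.

```python
# Illustrative sketch: all names are hypothetical, not taken from the course.

def nand(a: int, b: int) -> int:
    """NAND of two bits: the output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1

def not_gate(a: int) -> int:
    # Tying both NAND inputs together inverts the signal.
    return nand(a, a)

def and_gate(a: int, b: int) -> int:
    # AND is NAND followed by an inverter.
    return not_gate(nand(a, b))

def or_gate(a: int, b: int) -> int:
    # De Morgan: a OR b == NOT(NOT a AND NOT b).
    return nand(not_gate(a), not_gate(b))

def xor_gate(a: int, b: int) -> int:
    # Classic four-NAND XOR construction.
    shared = nand(a, b)
    return nand(nand(a, shared), nand(b, shared))

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add three bits and return (sum_bit, carry_out)."""
    partial = xor_gate(a, b)
    sum_bit = xor_gate(partial, carry_in)
    carry_out = or_gate(and_gate(a, b), and_gate(partial, carry_in))
    return sum_bit, carry_out

if __name__ == "__main__":
    # Print the full truth table of the adder.
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, "->", full_adder(a, b, c))
```

Chaining several `full_adder` calls, each feeding its carry into the next, gives a multi-bit ripple-carry adder.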

transistors · logic gates · NAND · digital circuits

02 · Day Two

Arithmetic Logic Unit

How the ALU performs addition, subtraction, bitwise operations, and comparison. The carry-lookahead adder and why hardware addition isn’t as simple as it looks.
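To make the carry-lookahead idea concrete, here is a minimal sketch of a 4-bit adder that derives every carry directly from "generate" and "propagate" signals instead of waiting for the previous bit. The function name and the default 4-bit width are assumptions made for this example. In hardware all carries are evaluated in parallel; the Python loops only evaluate the same flattened expressions one after another.

```python
# Illustrative sketch: function name and 4-bit width are assumptions.

def carry_lookahead_add(a: int, b: int, carry_in: int = 0, width: int = 4):
    """Return (sum modulo 2**width, carry_out) using lookahead carries."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    # generate: this bit position creates a carry by itself.
    g = [x & y for x, y in zip(a_bits, b_bits)]
    # propagate: this bit position passes an incoming carry along.
    p = [x ^ y for x, y in zip(a_bits, b_bits)]

    carries = [carry_in]
    for i in range(width):
        # c[i+1] = g[i] | p[i]g[i-1] | ... | p[i]..p[1]g[0] | p[i]..p[0]c[0]
        carry = 0
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            carry |= term
        term = carry_in
        for k in range(i + 1):
            term &= p[k]
        carry |= term
        carries.append(carry)

    # Each sum bit is the propagate signal XOR the carry into that bit.
    sum_bits = [p[i] ^ carries[i] for i in range(width)]
    total = sum(bit << i for i, bit in enumerate(sum_bits))
    return total, carries[width]

if __name__ == "__main__":
    print(carry_lookahead_add(0b1011, 0b0110))  # 11 + 6 in 4 bits
```

Note that no carry expression depends on another computed carry, only on `g`, `p`, and `carry_in`. That independence is what lets hardware avoid the bit-by-bit ripple delay, at the cost of wider gates.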

ALU · adder · carry · bitwise operations

03 · Day Three

Registers and Memory

CPU registers, SRAM caches (L1/L2/L3), DRAM main memory, and virtual memory. The memory hierarchy: speed vs size vs cost tradeoffs that dominate CPU performance.
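One way to see why access patterns matter is a toy cache model. The sketch below simulates a direct-mapped cache and counts misses for row-major versus column-major traversal of a 64×64 array of 8-byte elements. The cache geometry (64-byte lines, 64 lines) and all names are assumptions chosen for illustration; real L1 caches are typically set-associative and backed by further levels.

```python
# Illustrative sketch: cache geometry and names are assumptions.

LINE_BYTES = 64   # bytes per cache line
NUM_LINES = 64    # lines in this toy direct-mapped cache (4 KiB total)
ELEM_BYTES = 8    # size of one array element

def count_misses(addresses):
    """Simulate a direct-mapped cache and return the number of misses."""
    tags = [None] * NUM_LINES
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES   # which memory line holds this byte
        index = line % NUM_LINES    # the cache slot that line maps to
        tag = line // NUM_LINES     # identifies which line occupies the slot
        if tags[index] != tag:
            tags[index] = tag       # evict the old line and load the new one
            misses += 1
    return misses

def traversal(rows, cols, row_major):
    """Byte addresses touched when walking a rows x cols array."""
    if row_major:
        order = ((r, c) for r in range(rows) for c in range(cols))
    else:
        order = ((r, c) for c in range(cols) for r in range(rows))
    return [(r * cols + c) * ELEM_BYTES for r, c in order]

if __name__ == "__main__":
    print("row-major misses:   ", count_misses(traversal(64, 64, True)))
    print("column-major misses:", count_misses(traversal(64, 64, False)))
```

Under these assumptions, row-major order misses once per cache line, while column-major order strides across lines that keep evicting each other. The toy model reports several times more misses for the same amount of work.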

registers · cache · DRAM · memory hierarchy

04 · Day Four

The Instruction Cycle

Fetch, decode, execute, write-back. How the program counter advances, how opcodes map to micro-operations, and what actually happens when you call a function.
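The cycle described above can be sketched as a small interpreter loop. The instruction set here (LOADI, ADD, CALL, RET, HALT) is hypothetical and invented for this example, not a real ISA. It models a function call as "push the return address, then jump", which is roughly what a stack-based call such as x86 `CALL` does.

```python
# Illustrative sketch: this tiny instruction set is hypothetical.

def run(program, max_steps=1000):
    """Execute a list of instruction tuples and return the register file."""
    regs = [0] * 4   # four general-purpose registers
    stack = []       # holds return addresses
    pc = 0           # program counter

    for _ in range(max_steps):
        instr = program[pc]   # fetch
        op, *args = instr     # decode
        pc += 1               # by default, fall through to the next instruction

        # execute and write back
        if op == "LOADI":
            rd, imm = args
            regs[rd] = imm
        elif op == "ADD":
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        elif op == "CALL":
            (target,) = args
            stack.append(pc)  # save the return address
            pc = target       # jump to the function body
        elif op == "RET":
            pc = stack.pop()  # resume after the matching CALL
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("step limit reached without HALT")

PROGRAM = [
    ("LOADI", 0, 5),    # 0: r0 = 5
    ("CALL", 3),        # 1: call the function at address 3
    ("HALT",),          # 2: stop
    ("ADD", 0, 0, 0),   # 3: function body: r0 = r0 + r0
    ("RET",),           # 4: return to the caller
]

if __name__ == "__main__":
    print(run(PROGRAM))
```

The program counter is incremented before the instruction executes, so the address pushed by `CALL` already points at the instruction after the call.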

fetch-decode-execute · opcodes · program counter · function calls

05 · Day Five

Modern CPU Features

Pipelining, branch prediction, out-of-order execution, and SIMD instructions. Why these optimizations exist, how to write code that benefits from them, and how they create security vulnerabilities like Spectre.
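To illustrate why predictable branches help, the sketch below simulates a 2-bit saturating-counter branch predictor, a common textbook scheme, on two outcome sequences: one that resembles a threshold test over sorted data and one that is pseudo-random. The function name, the sequences, and the seed are assumptions for this example; predictors in real CPUs are considerably more elaborate.

```python
# Illustrative sketch: names, sequences, and the seed are assumptions.
import random

def count_mispredictions(outcomes):
    """Run a 2-bit saturating counter over a sequence of branch outcomes."""
    counter = 2  # states 0-1 predict "not taken", states 2-3 predict "taken"
    misses = 0
    for taken in outcomes:
        predicted_taken = counter >= 2
        if predicted_taken != taken:
            misses += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

# A threshold test over sorted data: a long run of one outcome, then the other.
sorted_like = [False] * 500 + [True] * 500

# The same kind of test over shuffled data: outcomes look random.
rng = random.Random(0)
random_like = [rng.random() < 0.5 for _ in range(1000)]

if __name__ == "__main__":
    print("sorted-like mispredictions:", count_mispredictions(sorted_like))
    print("random-like mispredictions:", count_mispredictions(random_like))
```

The sorted-like sequence mispredicts only near the point where the outcome changes, while the random sequence mispredicts on a large share of branches. On a pipelined CPU each misprediction discards partially executed work, which is why the same loop can run faster over sorted input.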

pipelining · branch prediction · out-of-order · SIMD

Supporting Videos

The best external videos on this topic.

Instead of shooting our own videos, Precision AI Academy links to the best deep-dives already on YouTube. Watch them alongside the course. All external, all free, all from builders who ship this stuff.

YouTube · Search

How Computers Work — Crash Course

The canonical visual series explaining logic gates, memory, and the CPU from first principles.

YouTube · Search

CPU Architecture Explained

Instruction pipelining, branch prediction, out-of-order execution, and cache hierarchies in modern x86 and ARM CPUs.

YouTube · Search

Memory Hierarchy and Caching

L1/L2/L3 caches, cache miss penalties, cache-friendly data structures, and why memory access patterns dominate performance.

YouTube · Search

SIMD and Vectorization

How SIMD instructions process multiple data elements simultaneously and how compilers auto-vectorize loops.

Open-Source References

Read the source. Every line.

The best way to understand any technology is to read the production-grade implementations that prove it works. Together, these repositories cover material from every day of this course.

github.com/nicowillis/nand2tetris

Nand2Tetris Projects

Companion projects for the From NAND to Tetris course. Build a CPU from logic gates in a hardware simulator. The practical complement to Days 1-3.

github.com/llvm/llvm-project

LLVM

The compiler infrastructure that turns high-level languages into CPU instructions. Reading the backend code generators shows how compilers target specific CPU features.

github.com/google/cpu_features

cpu_features

Google’s library for detecting CPU features at runtime. Shows which SIMD extensions (SSE4, AVX2, NEON) are available on the current CPU.

github.com/uops-info/uops-info.github.io

uops.info

Instruction latency and throughput data for x86 CPUs. The reference for Day 5 performance optimization.

Who This Is For

Three kinds of people read this.

Developers Wondering Why Their Code Is Slow

Cache misses, branch mispredictions, and SIMD under-utilization are invisible until you understand the hardware. This course makes them visible.

Systems Programmers and Embedded Engineers

You write code close to the metal. This course fills in the architecture knowledge that makes hardware behavior predictable.

CS Students in Architecture Courses

The lecture slides are abstract. This course pairs the theory with clear explanations and external resources that make the concepts concrete.

Want to Go Deeper in Person?

The 2-day in-person Precision AI Academy bootcamp covers computer architecture and systems programming hands-on. 5 U.S. cities. $1,490. 40 seats max. June–October 2026 (Thu–Fri).