Combine tables, count rows, sum revenue, and compute averages. This is what data analysis actually looks like — and SQL makes it effortless.
A customer revenue report — total spend per customer, number of orders, average order size, and which city generates the most revenue. All from the database you built yesterday.
Aggregate functions collapse many rows into a single value. These five cover 90% of everything you'll do with numbers in SQL.
-- How many orders total? SELECT COUNT(*) AS total_orders FROM orders; -- Total revenue from completed orders SELECT SUM(amount) AS total_revenue FROM orders WHERE status = 'completed'; -- Average order value SELECT AVG(amount) AS avg_order FROM orders; -- Largest and smallest single orders SELECT MAX(amount) AS biggest_order, MIN(amount) AS smallest_order FROM orders;
COUNT(*) vs COUNT(column): COUNT(*) counts all rows. COUNT(column) counts only rows where that column is NOT NULL. Use COUNT(*) unless you're specifically checking for null values.
GROUP BY splits your table into groups and runs the aggregate function on each group. This is the core of almost every analytics query.
-- Total orders per customer_id SELECT customer_id, COUNT(*) AS num_orders, SUM(amount) AS total_spent, AVG(amount) AS avg_order FROM orders GROUP BY customer_id ORDER BY total_spent DESC; -- Revenue breakdown by order status SELECT status, COUNT(*) AS count, SUM(amount) AS total FROM orders GROUP BY status;
The rule: any column in your SELECT that is NOT inside an aggregate function must appear in GROUP BY. Break this rule and you'll get an error.
WHERE filters rows before grouping. HAVING filters groups after aggregation. You need both when you want "customers who spent more than $10,000."
-- Customers with more than $10,000 in completed orders SELECT customer_id, SUM(amount) AS total_spent FROM orders WHERE status = 'completed' -- filter rows first GROUP BY customer_id HAVING SUM(amount) > 10000 -- then filter groups ORDER BY total_spent DESC;
A JOIN links two tables on a shared column (usually an ID). Your orders table has a customer_id. The customers table has an id. A JOIN connects them so you can see customer names next to their orders.
Returns only rows that have a match in both tables. Most common type.
-- Orders with customer names (not just IDs) SELECT c.name, c.city, o.order_date, o.amount, o.status FROM orders o INNER JOIN customers c ON o.customer_id = c.id ORDER BY o.amount DESC; -- Revenue per customer name (combining JOIN + GROUP BY) SELECT c.name, c.city, COUNT(*) AS orders, SUM(o.amount) AS total_revenue FROM orders o INNER JOIN customers c ON o.customer_id = c.id WHERE o.status = 'completed' GROUP BY c.name, c.city ORDER BY total_revenue DESC;
Table aliases (o and c) are just shorthand. Writing orders o means "call the orders table 'o' in this query." It saves typing and makes complex queries readable.
-- All customers, even those with no orders (NULL for order fields) SELECT c.name, COUNT(o.id) AS order_count, COALESCE(SUM(o.amount), 0) AS total_spent FROM customers c LEFT JOIN orders o ON c.id = o.customer_id GROUP BY c.name ORDER BY total_spent DESC;
COALESCE(value, 0) replaces NULL with 0. Useful whenever a LEFT JOIN might produce NULLs in your aggregate columns.
Day 3 is the jump from good to great — subqueries, CTEs, and window functions that most analysts never learn.
Day 3: Subqueries, CTEs & Window FunctionsBefore moving on, make sure you can answer these without looking: