The intermediate techniques that separate junior analysts from senior ones. Write powerful, readable queries that would impress any data team.
An advanced sales analysis — rank customers by spend, show each order's percentage of total revenue, and compute month-over-month growth. Queries that would make a data engineer nod in approval.
A subquery is a SELECT statement nested inside another query. It lets you use the result of one query as input to another.
-- Orders above the average order value SELECT * FROM orders WHERE amount > ( SELECT AVG(amount) FROM orders ); -- Customers who have placed at least one order SELECT * FROM customers WHERE id IN ( SELECT DISTINCT customer_id FROM orders ); -- Each order and its % of total revenue SELECT id, amount, ROUND( amount * 100.0 / (SELECT SUM(amount) FROM orders), 2 ) AS pct_of_total FROM orders ORDER BY pct_of_total DESC;
A CTE (Common Table Expression) is a named subquery defined at the top of your query with WITH. It makes complex queries readable by breaking them into named steps.
-- Step 1: summarize revenue per customer WITH customer_revenue AS ( SELECT c.name, c.segment, SUM(o.amount) AS total, COUNT(*) AS orders FROM customers c JOIN orders o ON c.id = o.customer_id WHERE o.status = 'completed' GROUP BY c.name, c.segment ), -- Step 2: find the total across all customers grand_total AS ( SELECT SUM(total) AS grand FROM customer_revenue ) -- Step 3: combine and show share SELECT cr.name, cr.segment, cr.total, ROUND(cr.total * 100.0 / gt.grand, 1) AS revenue_share_pct FROM customer_revenue cr, grand_total gt ORDER BY cr.total DESC;
CTEs are just for readability. The database executes them the same as a nested subquery. But readable SQL is maintainable SQL — especially when your queries grow to 100+ lines.
Window functions let you compute aggregations without collapsing rows. You get the aggregate value on every row, alongside the original data. This is what makes ranking, running totals, and comparisons possible.
-- Rank each order by amount (1 = highest) SELECT id, customer_id, amount, RANK() OVER (ORDER BY amount DESC) AS rank FROM orders; -- Running total of revenue over time SELECT order_date, amount, SUM(amount) OVER ( ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS running_total FROM orders WHERE status = 'completed' ORDER BY order_date; -- Row number per customer (their 1st order, 2nd order, etc.) SELECT customer_id, order_date, amount, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date ) AS order_number FROM orders;
PARTITION BY is like GROUP BY inside the window — it resets the calculation for each group. Without it, the window function runs across the entire result set.
-- Each order vs the previous order (growth / decline) SELECT order_date, amount, LAG(amount) OVER (ORDER BY order_date) AS prev_amount, amount - LAG(amount) OVER (ORDER BY order_date) AS change FROM orders WHERE status = 'completed' ORDER BY order_date;
Day 4 shows you how to use Claude and ChatGPT to generate, explain, and optimize SQL — the way modern data teams actually work.
Day 4: AI-Assisted SQLBefore moving on, make sure you can answer these without looking: