A sales data loader that reads a CSV, calculates total and average revenue by region, filters for top performers, and prints a clean summary — all in about 15 lines of Python. We'll compare this to doing the same thing in Excel.
A sales data loader that reads a CSV, calculates total and average revenue by region, filters for top performers, and prints a clean summary — all in about 15 lines of Python. We'll compare this to doing the same thing in Excel.
Excel is great. It's visual, fast to start, and everyone knows it. But it has three problems that Python solves:
Problem 1: Scale. Excel slows to a crawl above 100K rows. Python handles millions of rows in seconds.
Problem 2: Reproducibility. When your boss asks "how did you get this number?" in Excel, you trace 14 formulas. In Python, you show them the code.
Problem 3: Automation. Excel requires a human every time. Python can run the same analysis on every new dataset without you.
This course is not about replacing your Excel skills. It's about adding a new layer on top of them. Your domain knowledge of what the data means is irreplaceable — Python just removes the grunt work.
Go to python.org/downloads and download Python 3.11 or later. Run the installer. On Windows, check the box that says "Add Python to PATH" before clicking Install.
Verify it worked by opening your terminal (Command Prompt on Windows, Terminal on Mac) and typing:
python --version
You should see something like Python 3.11.4. If it says "command not found," restart your terminal and try again.
In your terminal, run:
pip install pandas
This installs pandas — the Python library that works like Excel but faster and scriptable. You only do this once.
Download VS Code from code.visualstudio.com — free. Install the Python extension when prompted. This is where you'll write your scripts for the rest of the course.
We'll create a simple sales CSV to work with. Create a new folder called data-analysis-course, open it in VS Code, and create a file called sales.csv with this content:
region,rep,product,revenue,units West,Alice,Software,42000,14 East,Bob,Hardware,38000,9 West,Carol,Software,51000,17 North,Dave,Services,29000,22 East,Eve,Software,67000,21 South,Frank,Hardware,33000,11 North,Grace,Services,44000,31 South,Heidi,Software,58000,19 West,Ivan,Hardware,27000,8 East,Judy,Services,72000,48
Create a new file in VS Code called analyze.py. Here's the full script — we'll walk through each part:
import pandas as pd # Load the data df = pd.read_csv("sales.csv") # See the shape of the data print(f"Loaded {len(df)} rows and {len(df.columns)} columns") print(df.head()) # First 5 rows # Total revenue by region by_region = df.groupby("region")["revenue"].sum().sort_values(ascending=False) print("\n--- Revenue by Region ---") print(by_region) # Top performers (revenue > $50K) top_reps = df[df["revenue"] > 50000][["rep", "region", "revenue"]] print("\n--- Top Performers ---") print(top_reps.to_string(index=False)) # Summary stats print(f"\nTotal revenue: ${df['revenue'].sum():,}") print(f"Average deal: ${df['revenue'].mean():,.0f}") print(f"Best region: {by_region.index[0]}")
Run it from your terminal:
python analyze.py
What just happened: You loaded a dataset, grouped it by region, filtered for top performers, and calculated summary stats — in about 15 lines. The equivalent Excel workflow involves pivot tables, filters, and manual number-checking. The Python version runs in under a second and works on any size dataset.
import pandas as pd — Loads the pandas library. The pd is just a shorthand name everyone uses.
pd.read_csv("file.csv") — Reads any CSV file into a DataFrame. A DataFrame is like an Excel sheet you can program.
df.groupby("column")["other_column"].sum() — Group rows by a category and aggregate. Same as a pivot table, one line.
df[df["column"] > value] — Filter rows by a condition. Same as Excel's filter, but scriptable.
df["column"].sum() — Sum a column. Works on millions of rows without crashing.
The foundations from today carry directly into Day 2. In the next session the focus shifts to Day 2 — building directly on everything covered here.
Before moving on, verify you can answer these without looking:
Live Bootcamp
Learn this in person — 2 days, 5 cities
Thu–Fri sessions in Denver, Los Angeles, New York, Chicago, and Dallas. $1,490 per seat. June–October 2026.
Reserve Your Seat →