Day 01 Foundations

ETL: Extract, Transform, Load — Data Engineering in 5 Days

~1 hour Hands-on Precision AI Academy

Today's Objective

By the end of this lesson, you will understand the core concepts of ETL: Extract, Transform, Load — Data Engineering in 5 Days and be able to apply them in a real project.

python
python
# etl_pipeline.py
import csv
import sqlite3
from datetime import datetime
from pathlib import Path

# Extract
def extract(filepath: str) -> list[dict]:
    with open(filepath, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f))

# Transform
def transform(records: list[dict]) -> list[dict]:
    clean = []
    for r in records:
        # Skip rows with missing required fields
        if not r.get('email') or not r.get('name'):
            continue
        clean.append({
            'name': r['name'].strip().title(),
            'email': r['email'].strip().lower(),
            'signup_date': datetime.strptime(
                r.get('date', '2026-01-01'), '%Y-%m-%d').date(),
            'active': r.get('status', 'active') == 'active',
        })
    return clean

# Load
def load(records: list[dict], db_path: str) -> int:
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute('''CREATE TABLE IF NOT EXISTS users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT, email TEXT UNIQUE, signup_date TEXT, active INTEGER
    )''')
    cur.executemany('''INSERT OR IGNORE INTO users
        (name, email, signup_date, active) VALUES
        (:name, :email, :signup_date, :active)''', records)
    conn.commit()
    count = cur.rowcount
    conn.close()
    return count

if __name__ == '__main__':
    rows = extract('users.csv')
    clean = transform(rows)
    n = load(clean, 'warehouse.db')
    print(f'Loaded {n} new records')
Tip: Always validate your data after loading. Run a COUNT(*) and compare it to your source record count to catch silent failures.
01

Exercise: Build an ETL for Sales Data

  1. Download a sample CSV from Kaggle (sales or e-commerce)
  2. Write extract() to read all rows
  3. Write transform() to clean dates, types, and nulls
  4. Write load() to insert into SQLite
  5. Log rows extracted, rows after cleaning, and rows loaded
02

Day 1 Summary