Day 3 of 5

dbt: Transform Data in the Warehouse

dbt (data build tool) lets you express transformations as plain SQL SELECT statements while it handles materialisation, dependency ordering, documentation, and testing. It turns your warehouse into a reliable, version-controlled transformation layer.

```bash
pip install dbt-core dbt-duckdb   # or dbt-bigquery, dbt-snowflake
dbt init my_project
cd my_project
dbt debug     # verify the warehouse connection
```
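The staging model reads from {{ source('raw', 'orders') }}, so that source must be declared before dbt can resolve it. A minimal sketch of the declaration (the schema and table names here are illustrative; match them to your own raw data):

```yaml
# models/staging/sources.yml — declare raw tables so source() can resolve them
# (the 'raw' schema and 'orders' table are placeholders for your warehouse)
version: 2

sources:
  - name: raw
    schema: raw          # schema where the raw tables live
    tables:
      - name: orders
```

With this in place, {{ source('raw', 'orders') }} compiles to raw.orders, and dbt can track source freshness and lineage for it.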
```sql
-- models/staging/stg_orders.sql
-- Staging model: clean and rename raw columns
{{ config(materialized='view') }}

SELECT
    id::VARCHAR AS order_id,
    customer_id,
    LOWER(status) AS status,
    created_at::DATE AS order_date,
    total_amount::DECIMAL(10,2) AS amount
FROM {{ source('raw', 'orders') }}
WHERE id IS NOT NULL
```
```sql
-- models/marts/orders_daily.sql
-- Mart model: business-level aggregate
{{ config(materialized='table') }}

WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
    WHERE status = 'completed'
),
daily AS (
    SELECT
        order_date,
        COUNT(*) AS order_count,
        SUM(amount) AS revenue,
        AVG(amount) AS avg_order_value
    FROM orders
    GROUP BY order_date
)
SELECT * FROM daily
ORDER BY order_date
```
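Instead of repeating config() in every model file, materialisations can be set once per folder in dbt_project.yml; a per-model config() call overrides these defaults. A sketch, assuming the project name from dbt init above:

```yaml
# dbt_project.yml (excerpt) — folder-level materialisation defaults
models:
  my_project:
    staging:
      +materialized: view    # staging models stay as lightweight views
    marts:
      +materialized: table   # marts are built as tables for fast reads
```

This keeps model files focused on the SQL and makes the project-wide convention (views for staging, tables for marts) visible in one place.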
```yaml
# models/staging/schema.yml — documentation and tests
version: 2

models:
  - name: stg_orders
    description: "Cleaned orders from the raw source"
    columns:
      - name: order_id
        description: "Unique order ID"
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['pending', 'completed', 'cancelled']
```
Tip: Run dbt build to build all models and run all tests in one command, in dependency order. Chaining dbt run && dbt test achieves the same result in two steps.

Exercise: Build Your First dbt Project

  1. Install dbt-duckdb and create a project
  2. Add a sources.yml that points to a CSV file
  3. Write a staging model that cleans the raw data
  4. Write a marts model that aggregates the staged data
  5. Add unique and not_null tests to every primary key
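For step 2, the dbt-duckdb adapter can read a CSV file directly when the source table sets an external_location in its meta block. A sketch under that assumption (external_location is dbt-duckdb-specific, and the file path is a placeholder):

```yaml
# models/staging/sources.yml — a CSV source for dbt-duckdb
# (external_location is a dbt-duckdb feature; the path below is a placeholder)
version: 2

sources:
  - name: raw
    tables:
      - name: orders
        meta:
          external_location: "data/orders.csv"
```

Models that select from {{ source('raw', 'orders') }} will then read the CSV through DuckDB, with no separate load step.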

Day 3 Summary
