SQL Script Isolated Testing
Overview
The SQL query (balance_history_2024_q1.sql) is tested in complete isolation using DuckDB, an in-memory SQL database. This approach:
✅ No Data Lake Connection Required - Tests run entirely locally
✅ Fast Execution - In-memory database, no network overhead
✅ Reproducible - Same test data every time
✅ Portable - Works on any machine with Python
✅ CI/CD Friendly - No AWS credentials or infrastructure needed
Test Architecture
┌─────────────────────────────────────────┐
│ Test Runner (pytest) │
│ │
│ ┌──────────────────────────────────┐ │
│ │ DuckDB (In-Memory Database) │ │
│ │ - schema.sql (table definition) │ │
│ │ - test_data.sql (sample data) │ │
│ └──────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ balance_history_2024_q1.sql │ │
│ │ (query under test) │ │
│ └──────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────┐ │
│ │ expected_output.csv │ │
│ │ (validation) │ │
│ └──────────────────────────────────┘ │
└─────────────────────────────────────────┘
Test Data
The test uses synthetic test data from tests/test_data.sql:
- 3 accounts: ACC001, ACC002, ACC003
- 20 transactions across 2024
- Covers edge cases:
- Multiple transactions per month
- Missing months (NULL balances)
- Different account activity patterns
Running Tests
Quick Start (Docker - Recommended)
# Run tests in Docker (no setup required!)
make test
This will:
- Build Docker image with all dependencies
- Run tests in isolated container
- Clean up automatically
Manual Execution (Local)
# Option 1: Use Makefile with virtual environment
make test-isolated
# Option 2: Manual setup
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
pytest tests/test_balance_query.py -v
Test Coverage
✅ Syntax Validation
- Validates SQL syntax is correct
- Adapts query for DuckDB (TO_CHAR → STRFTIME)
✅ Logic Validation
- Verifies month-end balance calculation
- Tests last transaction selection within month
- Validates account×month spine generation
✅ Edge Cases
- Missing months: Accounts with no transactions return NULL
- Multiple transactions: Picks last transaction of month
- Tie-breaking: Uses
idas tie-breaker when dates are equal
✅ Expected Output
- Compares results against
expected_output.csv - Validates all accounts and months
- Checks NULL handling
Test Files
| File | Purpose |
|---|---|
tests/test_balance_query.py | Test suite (pytest) |
tests/test_data.sql | Synthetic test data |
tests/expected_output.csv | Expected query results |
schema.sql | Table schema definition |
balance_history_2024_q1.sql | Query under test |
Why Isolation?
Problem: Data Lake Dependency
- Silver layer data requires:
- AWS credentials
- S3 access
- Parquet file reading
- Partition discovery
- Complex setup
Solution: In-Memory Testing
- DuckDB: Fast, in-memory SQL engine
- Test Data: Synthetic data that mirrors real structure
- No Infrastructure: Runs anywhere Python runs
- Fast: Milliseconds instead of seconds/minutes
Test Data Structure
The test data mirrors the Silver layer structure:
CREATE TABLE transactions (
id INT PRIMARY KEY,
account_id VARCHAR(100),
amount NUMERIC(16,2),
new_balance NUMERIC(16,2),
tx_date TIMESTAMP
);
Query Adaptations
The SQL query uses standard ANSI SQL, but DuckDB requires minor adaptations:
-- Original (Presto/Athena)
TO_CHAR(am.month_end, 'YYYY-MM')
-- DuckDB adaptation
STRFTIME(am.month_end, '%Y-%m')
These adaptations are handled automatically in the test suite.
Integration with CI/CD
The isolated tests integrate seamlessly with CI/CD:
Option 1: Docker (Recommended)
# .github/workflows/ci.yml
sql-validation:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Run SQL tests in Docker
run: |
cd tasks/03_sql
make test
Option 2: Direct Python
sql-validation:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.10'
- name: Install dependencies
run: pip install -r tasks/03_sql/requirements.txt
- name: Run SQL tests
run: |
cd tasks/03_sql
pytest tests/ -v
No AWS credentials needed! ✅
Dockerized for consistency! ✅
Benefits
- Fast Feedback - Tests run in milliseconds
- No Dependencies - No AWS, S3, or data lake setup
- Reproducible - Same results every time
- Portable - Works on any machine
- CI/CD Ready - No infrastructure setup required
- Cost Effective - No AWS charges for testing
Limitations
⚠️ Note: These tests validate SQL logic, not data lake integration.
For integration testing with actual Silver layer data:
- Requires AWS credentials
- Needs S3 access to Silver layer
- Requires Parquet file reading
- More complex setup
Recommendation: Use isolated tests for development and CI/CD. Use integration tests only for final validation before deployment.
Example Test Output
tests/test_balance_query.py::test_balance_query_syntax_valid PASSED
tests/test_balance_query.py::test_balance_query_returns_expected_results PASSED
tests/test_balance_query.py::test_query_handles_missing_months PASSED
tests/test_balance_query.py::test_query_picks_last_transaction_of_month PASSED
======================== 4 passed in 0.15s ========================
Summary
✅ SQL tests run in complete isolation
✅ No data lake connection required
✅ Fast, reproducible, portable
✅ Perfect for CI/CD integration
The SQL query logic is thoroughly tested without any external dependencies!
Related Documentation
Task 3 Documentation
- Testing Documentation - SQL testing overview
- SQL Breakdown - Query structure explanation with high-level algorithm
- SQL Breakdown & Flow - Visual representation and breakdown of query flow
- SQL Query - The actual SQL file
Related Tasks
- Data Lake Architecture - Architecture this query runs on
- ETL Pipeline - What creates the data this query reads
Technical Documentation
- Testing Guide - Comprehensive testing documentation
- Unified Testing Convention - Testing standards across all tasks