# Testing & Validation Guide

## Overview

This document describes the comprehensive test suite for validating both Python and SQL code in the Ohpen case study implementation.

## Python Tests (Task 1 - ETL)

### Location

`tasks/01_data_ingestion_transformation/tests/`

### Test Files
- `test_etl.py` - Unit tests for validation logic:
  - Valid data handling
  - Invalid currency detection
  - Null value handling
  - Malformed timestamp detection
  - Partition column generation
  - Run ID in quarantine
  - Empty DataFrame edge case
  - All rows quarantined edge case
  - Deterministic behavior
- `test_integration.py` - End-to-end integration tests:
  - Full ETL workflow with temp files
  - Parquet partition structure validation
  - Success marker validation
  - Quarantine data validation
  - Deterministic output verification
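The shape of the validation unit tests above can be sketched as follows. The `validate()` helper and the currency whitelist here are hypothetical stand-ins for the real functions in `src/`, whose names and signatures may differ; plain dicts are used instead of a DataFrame to keep the sketch dependency-free:

```python
# Hypothetical sketch of the validation unit tests; validate() and
# VALID_CURRENCIES stand in for the real helpers in src/.
VALID_CURRENCIES = {"EUR", "USD", "GBP"}

def validate(row):
    """Return True if the row passes basic quality checks."""
    if row.get("amount") is None:      # null value handling
        return False
    if row.get("currency") not in VALID_CURRENCIES:  # invalid currency
        return False
    return True

def test_invalid_rows_are_quarantined():
    rows = [
        {"amount": 10.0, "currency": "EUR"},   # valid
        {"amount": 5.0, "currency": "XXX"},    # invalid currency
        {"amount": None, "currency": "USD"},   # null amount
    ]
    valid = [r for r in rows if validate(r)]
    quarantined = [r for r in rows if not validate(r)]
    assert len(valid) == 1
    assert len(quarantined) == 2
```

The real suite additionally covers malformed timestamps, partition columns, and run IDs, which follow the same valid/quarantined split.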
### Running Python Tests

```bash
cd /home/stephen/projects/ohpen-case-2026/tasks/01_data_ingestion_transformation

# Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt

# Run linting
ruff check src/ tests/

# Run all tests
export PYTHONPATH=$PYTHONPATH:$(pwd)
pytest tests/ -v

# Run specific test file
pytest tests/test_etl.py -v
pytest tests/test_integration.py -v
```
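The end-to-end pattern used by `test_integration.py` (temp directory, partitioned output, success marker) can be sketched like this. `run_etl()` is a toy stand-in for the real entry point, and JSON lines replace Parquet so the sketch needs no third-party libraries:

```python
# Hypothetical sketch of the integration-test pattern: run the pipeline
# against a temp dir, then assert on the partition layout and marker file.
import json
import tempfile
from pathlib import Path

def run_etl(records, out_dir: Path):
    """Toy stand-in: write one file per (year, month) partition plus a marker."""
    for rec in records:
        year, month = rec["ts"][:4], rec["ts"][5:7]
        part = out_dir / f"year={year}" / f"month={month}"
        part.mkdir(parents=True, exist_ok=True)
        with open(part / "data.json", "a") as f:
            f.write(json.dumps(rec) + "\n")
    (out_dir / "_SUCCESS").touch()  # success marker

def test_partition_layout():
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        run_etl([{"ts": "2024-01-15", "amount": 1.0}], out)
        assert (out / "year=2024" / "month=01" / "data.json").exists()
        assert (out / "_SUCCESS").exists()
```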
## Script Tests (Workspace & Task Scripts)

### Location

- `tests/scripts/` - Tests for workspace-level scripts
- `tasks/05_communication_documentation/tests/` - Tests for task-level scripts

### Test Files

- `tests/scripts/test_create_submission_bundles.py` - Tests for the submission bundle creation script:
  - Script existence and permissions
  - Bash syntax validation
  - Directory structure creation
  - File copying logic
  - ZIP file generation
  - Error handling
- `tasks/05_communication_documentation/tests/test_compile_deliverables.py` - Tests for the PDF compilation script:
  - Script syntax validation
  - File reference checks
  - Pandoc integration
  - Error handling
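The bash syntax validation mentioned above is typically a one-liner around `bash -n`, which parses a script without executing it. A minimal sketch (the helper name is hypothetical; the real tests may wrap this differently):

```python
# Sketch of a bash syntax check: `bash -n` parses without executing,
# so a non-zero return code indicates a syntax error.
import subprocess

def bash_syntax_ok(script_path: str) -> bool:
    result = subprocess.run(
        ["bash", "-n", script_path],
        capture_output=True,  # keep stderr out of test output
    )
    return result.returncode == 0
```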
### Running Script Tests

```bash
# From project root
cd /home/stephen/projects/ohpen-case-2026

# Run all script tests
pytest tests/scripts/ tasks/05_communication_documentation/tests/test_compile_deliverables.py -v

# Or use Makefile
make test-scripts

# Run specific test
pytest tests/scripts/test_create_submission_bundles.py -v
```
## SQL Tests (Task 3 - Balance Query)

### Location

`tasks/03_sql/tests/`

### Test Files

- `test_balance_query.py` - DuckDB-based SQL tests:
  - SQL syntax validation
  - Expected results verification
  - Missing months handling (NULL values)
  - Last transaction of month selection
- `test_data.sql` - Sample data from Appendix A
- `expected_output.csv` - Expected query results
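The missing-months behavior these tests verify rests on the account × month-spine pattern: cross join accounts with a calendar spine, then LEFT JOIN the transactions so absent months surface as NULL rather than disappearing. A dependency-free sketch using stdlib `sqlite3` (the real tests use DuckDB; table and column names here are hypothetical):

```python
# Sketch of the month-spine pattern against in-memory sqlite3;
# the real tests run the actual query through DuckDB.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (account_id TEXT, month TEXT, balance REAL)")
conn.execute("INSERT INTO tx VALUES ('A1', '2024-01', 100.0)")
conn.execute("INSERT INTO tx VALUES ('A1', '2024-03', 250.0)")  # no Feb row

rows = conn.execute("""
    WITH months(month) AS (VALUES ('2024-01'), ('2024-02'), ('2024-03')),
         accounts AS (SELECT DISTINCT account_id FROM tx)
    SELECT a.account_id, m.month, t.balance
    FROM accounts a
    CROSS JOIN months m                       -- account x month spine
    LEFT JOIN tx t                            -- missing month -> NULL balance
        ON t.account_id = a.account_id AND t.month = m.month
    ORDER BY m.month
""").fetchall()
# February comes back as ('A1', '2024-02', None) instead of being dropped.
```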
### Running SQL Tests

```bash
cd /home/stephen/projects/ohpen-case-2026/tasks/03_sql

# Install dependencies
pip install -r requirements.txt

# Run SQL linting
sqlfluff lint balance_history_2024_q1.sql --dialect postgres

# Run SQL unit tests
pytest tests/ -v
```
## CI/CD Integration

The GitHub Actions workflow (`.github/workflows/ci.yml`) automatically runs all tests.

### Jobs

- `python-validation`:
  - Ruff linting on source and test files
  - All pytest unit and integration tests
- `sql-validation`:
  - SQLFluff linting on SQL queries
  - DuckDB-based SQL unit tests

### Triggering CI

```bash
git add .
git commit -m "Add comprehensive test suite"
git push origin master
```
## Test Coverage
### Python ETL Coverage
- ✅ Validation logic (nulls, currency, timestamp)
- ✅ Partition generation (year/month extraction)
- ✅ Quarantine mechanism
- ✅ Success marker creation
- ✅ Edge cases (empty, all invalid)
- ✅ Deterministic behavior (backfill safety)
- ✅ End-to-end workflow
- ✅ Metadata enrichment (row_hash, attempt_count)
- ✅ Loop prevention (duplicate detection, attempt limits)
- ✅ Circuit breaker logic
- ✅ Condemned layer handling
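The determinism and `row_hash` coverage above hinges on hashing a canonical serialization of each row, so that re-running a backfill yields byte-identical identifiers. A minimal sketch, assuming a hypothetical `row_hash` helper (the real ETL module's implementation may differ):

```python
# Sketch of a determinism check on a hypothetical row_hash helper.
import hashlib
import json

def row_hash(row: dict) -> str:
    """Stable hash: serialize with sorted keys so dict order cannot vary."""
    payload = json.dumps(row, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def test_row_hash_is_deterministic():
    a = {"id": 1, "currency": "EUR"}
    b = {"currency": "EUR", "id": 1}  # same content, different key order
    assert row_hash(a) == row_hash(b)
```

The same stable hash also underpins the duplicate detection used for loop prevention: a row that has already been quarantined hashes to the same value on the next attempt.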
### Script Coverage
- ✅ Bash script syntax validation
- ✅ Script existence and permissions
- ✅ Directory creation logic
- ✅ File copying operations
- ✅ Error handling
- ✅ Output validation
### SQL Query Coverage
- ✅ Syntax validation (PostgreSQL/ANSI dialect)
- ✅ Expected output verification
- ✅ NULL handling for missing months
- ✅ Last transaction selection logic
- ✅ Cross join (account × month spine)
## Testing Strategy Decisions
### Why DuckDB instead of real PostgreSQL/Aurora
- **Speed**: DuckDB runs in-memory, tests complete in milliseconds
- **Portability**: No external dependencies, works in any CI environment
- **Sufficient**: Validates SQL syntax and logic (the case study requirement)
- **Pragmatic**: Avoids over-engineering for a case study submission
### Why Temp Files instead of Testcontainers for Python
For this case study, we use temporary directories + mocking instead of Testcontainers (Docker-in-Docker with real S3).
**Rationale:**
- **Fast**: Tests run in <2 seconds vs 10+ seconds with container startup
- **Simple**: Fewer dependencies, easier to debug
- **Validates core logic**: Partition generation, validation, quarantine handling
- **Interview scope**: Demonstrates testing best practices without over-engineering
### When to use Testcontainers (production context)
- Testing S3-specific features (multipart uploads, versioning, lifecycle policies)
- Integration tests requiring exact AWS SDK behavior
- End-to-end workflows with Glue, Lambda, Step Functions
- Testing against multiple database engines simultaneously
**Trade-off acknowledged**: Our tests validate transformation logic but not S3 API edge cases. For production deployment at Ohpen, we would add Testcontainers-based integration tests as a second layer.
## Benefits for Interview
This test suite demonstrates:
1. **Production Readiness**: Comprehensive validation before deployment
2. **Data Quality**: Automated checks prevent data corruption
3. **Backfill Safety**: Determinism tests ensure reproducible outputs
4. **Best Practices**: Linting, unit tests, integration tests, CI/CD
5. **Documentation**: Tests serve as executable examples
6. **Pragmatic Engineering**: Chose the right tool for the scope (DuckDB, temp files) while knowing when to upgrade (Testcontainers for production)
## Quick Validation
### Option 1: Docker (Recommended)
Run all tests in isolated Docker environment:
```bash
cd /home/stephen/projects/ohpen-case-2026
# Run all tests (Python + SQL)
make test
# Or run specific tests
make test-python # Python ETL tests only
make test-sql # SQL query tests only
make test-scripts # Script tests only (runs locally, not in Docker)
```
### Option 2: Local Environment
If you prefer to run tests locally:
```bash
cd /home/stephen/projects/ohpen-case-2026
# Python tests
cd tasks/01_data_ingestion_transformation
pip install -r requirements.txt -r requirements-dev.txt
ruff check src/ tests/
pytest tests/ -v
# SQL tests
cd ../03_sql
pip install -r requirements.txt
sqlfluff lint balance_history_2024_q1.sql --dialect postgres
pytest tests/ -v
# Script tests
cd ../..
pip install pytest
pytest tests/scripts/ tasks/05_communication_documentation/tests/test_compile_deliverables.py -v
```

Expected output: All tests passing ✅