# Unified Testing Convention

## Overview

This document defines the unified testing convention used across all tasks in the Ohpen case study. All tasks follow the same structure, conventions, and testing patterns for consistency and maintainability.

## Directory Structure

All tasks follow this standardized structure:
```text
tasks/
├── 01_data_ingestion_transformation/
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── conftest.py          # Shared fixtures
│   │   ├── pytest.ini           # Pytest configuration
│   │   ├── test_*.py            # Test files
│   │   └── utils/               # Test utilities (optional)
│   ├── reports/                 # Test reports directory
│   ├── requirements.txt         # Runtime dependencies
│   ├── requirements-dev.txt     # Test dependencies
│   ├── Makefile                 # Test commands
│   └── src/                     # Source code
│
├── 03_sql/
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── conftest.py
│   │   ├── pytest.ini
│   │   ├── test_*.py
│   │   └── test_data.sql        # Test data (SQL-specific)
│   ├── reports/
│   ├── requirements.txt
│   └── Makefile
│
└── 04_devops_cicd/
    ├── tests/
    │   ├── __init__.py
    │   ├── conftest.py
    │   ├── pytest.ini
    │   └── test_*.py
    ├── reports/
    ├── requirements.txt
    └── Makefile
```
## File Conventions

### 1. Test Files

**Naming:** `test_*.py`

**Examples:**

- `test_etl.py` - ETL unit tests
- `test_workflow_syntax.py` - Workflow syntax tests
- `test_balance_query.py` - SQL query tests
- `test_integration.py` - Integration tests
**Structure:**

```python
"""Test description."""
import pytest


@pytest.mark.unit
def test_something():
    """Test description."""
    # Test implementation
    assert condition
```
### 2. Configuration Files

#### pytest.ini

**Location:** `tasks/{task}/tests/pytest.ini`

**Required sections:**

- `[pytest]` - Main configuration
- `markers` - Test markers/categories
- `addopts` - Default pytest options
- `testpaths` - Test discovery paths

**Standard markers:**

- `unit` - Unit tests
- `integration` - Integration tests
- `slow` - Slow-running tests
- Task-specific markers (e.g., `workflow`, `terraform`, `syntax`)

**Standard output options:**
```ini
addopts =
    -v
    --tb=short
    --strict-markers
    --html=reports/test_report.html
    --self-contained-html
    --json-report
    --json-report-file=reports/test_report.json
    --json-report-summary
```
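Putting the required sections together, a complete `pytest.ini` might look like the following sketch. The `testpaths` value and marker descriptions are illustrative, and the `--html` and `--json-report` options assume the `pytest-html` and `pytest-json-report` plugins are installed:

```ini
[pytest]
; pytest.ini lives inside tests/, so discovery starts here
testpaths = .
markers =
    unit: Unit tests
    integration: Integration tests
    slow: Slow-running tests
addopts =
    -v
    --tb=short
    --strict-markers
    --html=reports/test_report.html
    --self-contained-html
    --json-report
    --json-report-file=reports/test_report.json
    --json-report-summary
```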
#### conftest.py

**Location:** `tasks/{task}/tests/conftest.py`

**Purpose:**

- Shared pytest fixtures
- Test configuration
- Path setup

**Standard fixtures** (a minimal `conftest.py` is sketched below):

- Task-specific fixtures (e.g., `workflow_file`, `terraform_content`)
- Shared utilities
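A minimal sketch, assuming the sibling `src/` directory from the structure above; the fixture name and the file it points at are illustrative:

```python
"""Shared pytest fixtures and path setup."""
import sys
from pathlib import Path

import pytest

# Make the task's src/ package importable from the tests
TASK_ROOT = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(TASK_ROOT / "src"))


@pytest.fixture
def workflow_file() -> Path:
    """Path to the workflow file under test (hypothetical location)."""
    return TASK_ROOT / ".github" / "workflows" / "ci.yml"
```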
### 3. Requirements Files

#### requirements.txt

**Location:** `tasks/{task}/requirements.txt`

**Contains:**

- Runtime (production) dependencies only

#### requirements-dev.txt (optional)

**Location:** `tasks/{task}/requirements-dev.txt`

**Contains** (a sample file is shown below):

- Test dependencies (`pytest`, `pytest-html`, etc.)
- Development tools (`ruff`, `mypy`, etc.)
- Metrics collection (`psutil`, etc.)
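As an illustration, a `requirements-dev.txt` covering the tooling named above (plus `pytest-json-report`, implied by the `--json-report` options) might contain the following; versions are left unpinned here, but pinning them is advisable in practice:

```text
pytest
pytest-html
pytest-json-report
ruff
mypy
psutil
```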
### 4. Makefile

**Location:** `tasks/{task}/Makefile`

**Standard targets:**
```makefile
.PHONY: test clean help

help:
	@echo "Testing Commands:"
	@echo "  make test  - Run all tests"
	@echo "  make clean - Clean test artifacts"

test:
	@echo "🧪 Running tests..."
	pytest tests/ -v --tb=short

clean:
	@echo "🧹 Cleaning test artifacts..."
	find . -type d -name "__pycache__" -exec rm -r {} + 2>/dev/null || true
	find . -type f -name "*.pyc" -delete 2>/dev/null || true
	find . -type d -name ".pytest_cache" -exec rm -r {} + 2>/dev/null || true
	rm -rf reports/ 2>/dev/null || true
```
**Task-specific targets** (one is sketched below):

- Task 1: `test-pandas`, `test-spark`, `test-with-metrics`
- Task 3: `test-isolated`, `test-docker`
- Task 4: `test-workflow`, `test-terraform`, `test-integration`
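Such targets typically just wrap pytest with a marker filter. A minimal sketch for Task 4's `test-workflow`, assuming the `workflow` marker registered in `pytest.ini`:

```makefile
.PHONY: test-workflow

test-workflow:
	@echo "🧪 Running workflow tests..."
	pytest tests/ -v -m workflow
```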
## Test Report Structure

### Reports Directory

**Location:** `tasks/{task}/reports/`

**Standard reports:**

- `test_report.json` - Pytest JSON report (machine-readable)
- `test_report.html` - HTML report (human-readable)
- `test_metrics.json` - Metrics report (if metrics enabled)
- `TEST_METRICS.md` - Metrics summary (if metrics enabled)
### Report Generation

All tests automatically generate:

- **JSON report** - For programmatic analysis (see the parsing sketch below)
- **HTML report** - For visual inspection
- **Console output** - For immediate feedback
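As an example of programmatic analysis, the JSON report can be summarized with a few lines of Python. This sketch assumes the `pytest-json-report` schema, which nests aggregate counts under a `summary` key; `.get()` defaults guard against missing fields:

```python
"""Print a pass/fail summary from a pytest JSON report."""
import json
from pathlib import Path

report = json.loads(Path("reports/test_report.json").read_text())

# pytest-json-report puts aggregate counts under "summary"
summary = report.get("summary", {})
print(f"collected: {summary.get('collected', 0)}")
print(f"passed:    {summary.get('passed', 0)}")
print(f"failed:    {summary.get('failed', 0)}")
```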
**Metrics collection (Task 1 only, optional):**

- Time metrics (duration, CPU time)
- Memory metrics (RSS, peak, delta)
- System metrics (load, threads, FDs)
- Spark metrics (if applicable)
- S3 metrics (if applicable)
## Testing Patterns

### 1. Unit Tests

**Purpose:** Test individual functions/modules in isolation

**Pattern:**

```python
@pytest.mark.unit
def test_function_name():
    """Test description."""
    # Arrange
    input_data = ...

    # Act
    result = function_under_test(input_data)

    # Assert
    assert result == expected
```
### 2. Integration Tests

**Purpose:** Test complete workflows end-to-end

**Pattern:**

```python
import tempfile


@pytest.mark.integration
def test_full_workflow():
    """Test complete workflow."""
    # Setup
    with tempfile.TemporaryDirectory() as tmpdir:
        # Execute workflow
        result = run_workflow(tmpdir)

        # Verify outputs
        assert result.success
        assert output_files_exist(tmpdir)
```
### 3. Syntax/Structure Tests

**Purpose:** Validate configuration files (YAML, SQL, Terraform)

**Pattern:**

```python
import yaml


@pytest.mark.syntax
def test_yaml_valid(workflow_file):
    """Test YAML syntax."""
    with open(workflow_file, 'r') as f:
        yaml.safe_load(f)  # Should not raise
```
### 4. Validation Tests

**Purpose:** Verify business logic and validation rules

**Pattern:**

```python
@pytest.mark.validation
def test_validation_rule():
    """Test validation logic."""
    invalid_data = {...}

    result = validate(invalid_data)

    assert result.has_errors
    assert 'expected_error' in result.errors
```
## Running Tests

### Standard Commands

**All tasks:**

```bash
cd tasks/{task}
make test     # Run all tests
make clean    # Clean artifacts
```
**Task-specific:**

```bash
# Task 1
make test-pandas
make test-spark
make test-with-metrics

# Task 3
make test-docker
make test-isolated

# Task 4
make test-workflow
make test-terraform
make test-integration
```
### Direct Pytest

```bash
# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_etl.py -v

# Run with markers
pytest tests/ -v -m unit
pytest tests/ -v -m "not slow"

# Run with coverage
pytest tests/ --cov=src --cov-report=html
```
## Test Categories by Task

### Task 1: Data Ingestion & Transformation

**Test files:**

- `test_etl.py` - Pandas ETL unit tests
- `test_etl_spark.py` - PySpark ETL unit tests
- `test_integration.py` - End-to-end integration tests
- `test_validator.py` - Validation logic tests
- `test_metadata.py` - Metadata enrichment tests
- `test_loop_prevention.py` - Loop prevention tests
- `test_s3_operations.py` - S3 operation tests
- `test_load_spark.py` - Load/performance tests
- `test_edge_cases_spark.py` - Edge case tests

**Markers:** `unit`, `integration`, `slow`, `load`
### Task 3: SQL Query

**Test files:**

- `test_balance_query.py` - SQL query validation tests

**Test data:**

- `test_data.sql` - Synthetic test data
- `expected_output.csv` - Expected query results

**Markers:** `syntax`, `logic`, `edge_cases`
### Task 4: CI/CD

**Test files:**

- `test_workflow_syntax.py` - Workflow YAML validation
- `test_terraform.py` - Terraform configuration validation
- `test_workflow_integration.py` - Integration consistency tests

**Markers:** `syntax`, `structure`, `workflow`, `terraform`, `integration`
## Best Practices

### 1. Test Organization

- One test file per module (e.g., `test_etl.py` for `etl.py`)
- Group related tests in the same file
- Use descriptive test names (`test_validation_invalid_currency`, not `test_1`)
### 2. Test Isolation

- Each test should be independent (no shared state)
- Use fixtures for setup/teardown (see the sketch below)
- Clean up resources (temp files, mocks)
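A yield fixture keeps setup and teardown in one place. A minimal sketch; the `staging_dir` fixture is illustrative, and for this particular case pytest's built-in `tmp_path` fixture already covers it:

```python
import shutil
import tempfile
from pathlib import Path

import pytest


@pytest.fixture
def staging_dir():
    """Create an isolated staging directory, removed after the test."""
    path = Path(tempfile.mkdtemp())
    yield path            # Setup: hand the directory to the test
    shutil.rmtree(path)   # Teardown: always clean up


def test_writes_output(staging_dir):
    (staging_dir / "out.txt").write_text("ok")
    assert (staging_dir / "out.txt").exists()
```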
### 3. Test Data

- Use synthetic test data (not production data)
- Keep test data minimal (only what's needed)
- Store test data in the `tests/` directory
### 4. Assertions

- Use descriptive assertions with messages
- Test one thing per test (single responsibility)
- Use appropriate assertion types (`assert`, `pytest.raises`, etc.); an example follows
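For instance, an assertion message and `pytest.raises` side by side; the `validate_currency` function is a toy stand-in for each task's real validation logic:

```python
import pytest


def validate_currency(code: str) -> str:
    """Toy validator used only to illustrate the assertion styles."""
    if code not in {"EUR", "USD"}:
        raise ValueError(f"unknown currency: {code}")
    return code


def test_valid_currency_passes():
    result = validate_currency("EUR")
    assert result == "EUR", f"expected 'EUR' to validate, got {result!r}"


def test_unknown_currency_raises():
    # pytest.raises checks both the exception type and its message
    with pytest.raises(ValueError, match="unknown currency"):
        validate_currency("XXX")
```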
### 5. Markers

- Use markers consistently across tasks
- Mark slow tests with `@pytest.mark.slow`
- Mark integration tests with `@pytest.mark.integration`
### 6. Documentation

- Document test purpose in docstrings
- Explain complex test logic with comments
- Keep test names descriptive (they serve as documentation)
## Metrics Collection (Task 1)

Task 1 includes automatic metrics collection.

**Metrics collected:**

- **Time:** duration, CPU time, timestamps
- **Memory:** RSS, peak, delta, VMS, shared
- **CPU:** CPU time, CPU percent, user/system time
- **System:** load average, thread count, open file descriptors
- **Spark:** executor memory, job/stage metrics (if applicable)
- **S3:** operation counts, bytes, latency (if applicable)
**Usage:**

```bash
cd tasks/01_data_ingestion_transformation
make test-with-metrics
```

**Reports:**

- `reports/test_metrics.json` - Machine-readable metrics
- `reports/TEST_METRICS.md` - Human-readable summary
## CI/CD Integration

All tasks are tested in GitHub Actions (Task 4 workflow).

**Jobs:**

- `python-validation` - Task 1 Pandas tests
- `pyspark-validation` - Task 1 PySpark tests
- `sql-validation` - Task 3 SQL tests
- (Task 4 tests can be added to CI)

**Workflow:** `.github/workflows/ci.yml` (one job is sketched below)
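A sketch of what one such job might look like, assuming an Ubuntu runner and the Makefile convention above; the action versions, Python version, and step names are illustrative:

```yaml
name: ci
on: [push, pull_request]

jobs:
  sql-validation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r tasks/03_sql/requirements.txt
      - name: Run Task 3 SQL tests
        run: make -C tasks/03_sql test
```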
## Summary

### Unified Structure

- ✅ Same directory structure across all tasks
- ✅ Same file naming conventions (`test_*.py`)
- ✅ Same configuration files (`pytest.ini`, `conftest.py`)
- ✅ Same Makefile patterns (`make test`, `make clean`)
- ✅ Same report structure (`reports/test_report.json`)
- ✅ Same test markers (`unit`, `integration`, etc.)

### Task-Specific Customization

- ✅ Task-specific test files (e.g., `test_workflow_syntax.py`)
- ✅ Task-specific markers (e.g., `workflow`, `terraform`)
- ✅ Task-specific Makefile targets (e.g., `test-pandas`)
- ✅ Optional features (e.g., metrics collection in Task 1)
## Quick Reference

| Task | Test Command | Key Test Files |
|---|---|---|
| Task 1 | `make test` | `test_etl.py`, `test_integration.py` |
| Task 3 | `make test` | `test_balance_query.py` |
| Task 4 | `make test` | `test_workflow_syntax.py`, `test_terraform.py` |

| File | Purpose | Location |
|---|---|---|
| `pytest.ini` | Pytest configuration | `tests/pytest.ini` |
| `conftest.py` | Shared fixtures | `tests/conftest.py` |
| `Makefile` | Test commands | `Makefile` |
| `requirements.txt` | Dependencies | `requirements.txt` |
| `test_report.json` | Test results | `reports/test_report.json` |
**Last Updated:** 2026-01-23