Unified Testing Convention

Overview

This document defines the unified testing convention used across all tasks in the Ohpen case study. All tasks follow the same structure, conventions, and testing patterns for consistency and maintainability.


Directory Structure

All tasks follow this standardized structure:

tasks/
├── 01_data_ingestion_transformation/
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── conftest.py          # Shared fixtures
│   │   ├── pytest.ini           # Pytest configuration
│   │   ├── test_*.py            # Test files
│   │   └── utils/               # Test utilities (optional)
│   ├── reports/                 # Test reports directory
│   ├── requirements.txt         # Runtime dependencies
│   ├── requirements-dev.txt     # Test dependencies
│   ├── Makefile                 # Test commands
│   └── src/                     # Source code
│
├── 03_sql/
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── conftest.py
│   │   ├── pytest.ini
│   │   ├── test_*.py
│   │   └── test_data.sql        # Test data (SQL-specific)
│   ├── reports/
│   ├── requirements.txt
│   └── Makefile
│
└── 04_devops_cicd/
    ├── tests/
    │   ├── __init__.py
    │   ├── conftest.py
    │   ├── pytest.ini
    │   └── test_*.py
    ├── reports/
    ├── requirements.txt
    └── Makefile

File Conventions

1. Test Files

Naming: test_*.py

Examples:

  • test_etl.py - ETL unit tests
  • test_workflow_syntax.py - Workflow syntax tests
  • test_balance_query.py - SQL query tests
  • test_integration.py - Integration tests

Structure:

"""Test description."""
import pytest

@pytest.mark.unit
def test_something():
"""Test description."""
# Test implementation
assert condition

2. Configuration Files

pytest.ini

Location: tasks/{task}/tests/pytest.ini

Required settings:

  • [pytest] - Main configuration section (all keys below live under it)
  • markers - Test markers/categories
  • addopts - Default pytest options
  • testpaths - Test discovery paths

Standard markers:

  • unit - Unit tests
  • integration - Integration tests
  • slow - Slow-running tests
  • Task-specific markers (e.g., workflow, terraform, syntax)

Standard output options:

addopts =
    -v
    --tb=short
    --strict-markers
    --html=reports/test_report.html
    --self-contained-html
    --json-report
    --json-report-file=reports/test_report.json
    --json-report-summary
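
Putting these settings together, a complete pytest.ini for a task might look like the following minimal sketch (marker descriptions and testpaths are illustrative and should be adapted per task):

[pytest]
testpaths = .
markers =
    unit: Unit tests
    integration: Integration tests
    slow: Slow-running tests
addopts =
    -v
    --tb=short
    --strict-markers
    --html=reports/test_report.html
    --self-contained-html
    --json-report
    --json-report-file=reports/test_report.json
    --json-report-summary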

conftest.py

Location: tasks/{task}/tests/conftest.py

Purpose:

  • Shared pytest fixtures
  • Test configuration
  • Path setup

Standard fixtures:

  • Task-specific fixtures (e.g., workflow_file, terraform_content)
  • Shared utilities
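
A minimal conftest.py following these conventions might look like this sketch (the workflow_file fixture name is taken from the example above; the exact paths are assumptions and differ per task):

"""Shared fixtures and path setup for this task's tests."""
import sys
from pathlib import Path

import pytest

# Make the task's src/ package importable when running pytest from tests/.
TASK_ROOT = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(TASK_ROOT / "src"))


@pytest.fixture
def workflow_file():
    """Path to the workflow file under test (location is an assumption)."""
    return TASK_ROOT.parents[1] / ".github" / "workflows" / "ci.yml"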

3. Requirements Files

requirements.txt

Location: tasks/{task}/requirements.txt

Contains:

  • Runtime dependencies only
  • Production dependencies

requirements-dev.txt (optional)

Location: tasks/{task}/requirements-dev.txt

Contains:

  • Test dependencies (pytest, pytest-html, etc.)
  • Development tools (ruff, mypy, etc.)
  • Metrics collection (psutil, etc.)
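
An illustrative requirements-dev.txt based on the tools mentioned above (pin versions as appropriate for the task):

pytest
pytest-html
pytest-json-report
ruff
mypy
psutil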

4. Makefile

Location: tasks/{task}/Makefile

Standard targets:

.PHONY: test clean help

help:
	@echo "Testing Commands:"
	@echo "  make test   - Run all tests"
	@echo "  make clean  - Clean test artifacts"

test:
	@echo "🧪 Running tests..."
	pytest tests/ -v --tb=short

clean:
	@echo "🧹 Cleaning test artifacts..."
	find . -type d -name "__pycache__" -exec rm -r {} + 2>/dev/null || true
	find . -type f -name "*.pyc" -delete 2>/dev/null || true
	find . -type d -name ".pytest_cache" -exec rm -r {} + 2>/dev/null || true
	rm -rf reports/ 2>/dev/null || true

Task-specific targets:

  • Task 1: test-pandas, test-spark, test-with-metrics
  • Task 3: test-isolated, test-docker
  • Task 4: test-workflow, test-terraform, test-integration
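
A task-specific target typically just narrows the pytest invocation to a file or marker. As a sketch, Task 1's Pandas and Spark targets might be wired up like this (exact file names and options are assumptions):

.PHONY: test-pandas test-spark

test-pandas:
	@echo "🧪 Running Pandas ETL tests..."
	pytest tests/test_etl.py -v --tb=short

test-spark:
	@echo "🧪 Running PySpark ETL tests..."
	pytest tests/test_etl_spark.py -v --tb=short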

Test Report Structure

Reports Directory

Location: tasks/{task}/reports/

Standard reports:

  • test_report.json - Pytest JSON report (machine-readable)
  • test_report.html - HTML report (human-readable)
  • test_metrics.json - Metrics report (if metrics enabled)
  • TEST_METRICS.md - Metrics summary (if metrics enabled)

Report Generation

All tests automatically generate:

  1. JSON report - For programmatic analysis
  2. HTML report - For visual inspection
  3. Console output - For immediate feedback
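
For programmatic analysis, the JSON report can be loaded directly. A minimal sketch, assuming the pytest-json-report output format with a top-level summary object:

"""Print a one-line summary from the machine-readable test report."""
import json
from pathlib import Path

report = json.loads(Path("reports/test_report.json").read_text())
summary = report.get("summary", {})
print(
    f"passed={summary.get('passed', 0)} "
    f"failed={summary.get('failed', 0)} "
    f"total={summary.get('total', 0)}"
)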

Metrics collection (Task 1 only, optional):

  • Time metrics (duration, CPU time)
  • Memory metrics (RSS, peak, delta)
  • System metrics (load, threads, FDs)
  • Spark metrics (if applicable)
  • S3 metrics (if applicable)

Testing Patterns

1. Unit Tests

Purpose: Test individual functions/modules in isolation

Pattern:

@pytest.mark.unit
def test_function_name():
    """Test description."""
    # Arrange
    input_data = ...

    # Act
    result = function_under_test(input_data)

    # Assert
    assert result == expected

2. Integration Tests

Purpose: Test complete workflows end-to-end

Pattern:

import tempfile


@pytest.mark.integration
def test_full_workflow():
    """Test complete workflow."""
    # Setup: isolated temporary working directory
    with tempfile.TemporaryDirectory() as tmpdir:
        # Execute workflow
        result = run_workflow(tmpdir)

        # Verify outputs
        assert result.success
        assert output_files_exist(tmpdir)

3. Syntax/Structure Tests

Purpose: Validate configuration files (YAML, SQL, Terraform)

Pattern:

import yaml


@pytest.mark.syntax
def test_yaml_valid(workflow_file):
    """Test YAML syntax."""
    with open(workflow_file, 'r') as f:
        yaml.safe_load(f)  # Should not raise

4. Validation Tests

Purpose: Verify business logic and validation rules

Pattern:

@pytest.mark.validation
def test_validation_rule():
    """Test validation logic."""
    invalid_data = {...}
    result = validate(invalid_data)
    assert result.has_errors
    assert 'expected_error' in result.errors

Running Tests

Standard Commands

All tasks:

cd tasks/{task}
make test # Run all tests
make clean # Clean artifacts

Task-specific:

# Task 1
make test-pandas
make test-spark
make test-with-metrics

# Task 3
make test-docker
make test-isolated

# Task 4
make test-workflow
make test-terraform
make test-integration

Direct Pytest

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_etl.py -v

# Run with markers
pytest tests/ -v -m unit
pytest tests/ -v -m "not slow"

# Run with coverage
pytest tests/ --cov=src --cov-report=html

Test Categories by Task

Task 1: Data Ingestion & Transformation

Test files:

  • test_etl.py - Pandas ETL unit tests
  • test_etl_spark.py - PySpark ETL unit tests
  • test_integration.py - End-to-end integration tests
  • test_validator.py - Validation logic tests
  • test_metadata.py - Metadata enrichment tests
  • test_loop_prevention.py - Loop prevention tests
  • test_s3_operations.py - S3 operation tests
  • test_load_spark.py - Load/performance tests
  • test_edge_cases_spark.py - Edge case tests

Markers:

  • unit, integration, slow, load

Task 3: SQL Query

Test files:

  • test_balance_query.py - SQL query validation tests

Test data:

  • test_data.sql - Synthetic test data
  • expected_output.csv - Expected query results

Markers:

  • syntax, logic, edge_cases

Task 4: CI/CD

Test files:

  • test_workflow_syntax.py - Workflow YAML validation
  • test_terraform.py - Terraform configuration validation
  • test_workflow_integration.py - Integration consistency tests

Markers:

  • syntax, structure, workflow, terraform, integration

Best Practices

1. Test Organization

  • One test file per module (e.g., test_etl.py for etl.py)
  • Group related tests in the same file
  • Use descriptive test names (e.g., test_validation_invalid_currency, not test_1)

2. Test Isolation

  • Each test should be independent (no shared state)
  • Use fixtures for setup/teardown
  • Clean up resources (temp files, mocks)
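
For example, pytest's built-in tmp_path fixture gives each test its own temporary directory, so no cleanup code is needed (a minimal sketch):

import pytest

@pytest.mark.unit
def test_writes_output_in_isolation(tmp_path):
    """tmp_path is unique per test, keeping file-based tests independent."""
    output_file = tmp_path / "output.csv"
    output_file.write_text("id,amount\n1,10.00\n")
    assert output_file.exists()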

3. Test Data

  • Use synthetic test data (not production data)
  • Keep test data minimal (only what's needed)
  • Store test data in tests/ directory

4. Assertions

  • Use descriptive assertions with messages
  • Test one thing per test (single responsibility)
  • Use appropriate assertion types (assert, pytest.raises, etc.)
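
A short sketch of both styles:

import pytest

def test_descriptive_assertion():
    result = 2 + 2
    # The message is shown when the assertion fails.
    assert result == 4, f"expected 4, got {result}"

def test_invalid_input_raises():
    # pytest.raises asserts that the expected exception is raised.
    with pytest.raises(ValueError):
        int("not a number")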

5. Markers

  • Use markers consistently across tasks
  • Mark slow tests with @pytest.mark.slow
  • Mark integration tests with @pytest.mark.integration

6. Documentation

  • Document test purpose in docstrings
  • Explain complex test logic with comments
  • Keep test names descriptive (they serve as documentation)

Metrics Collection (Task 1)

Task 1 includes automatic metrics collection:

Metrics collected:

  • Time: duration, CPU time, timestamps
  • Memory: RSS, peak, delta, VMS, shared
  • CPU: CPU time, CPU percent, user/system time
  • System: load average, thread count, open file descriptors
  • Spark: executor memory, job/stage metrics (if applicable)
  • S3: operation counts, bytes, latency (if applicable)
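
As an illustration only (not the task's actual collector), per-test time and memory figures can be gathered with psutil via a fixture like this hypothetical collect_metrics:

"""Illustrative per-test metrics fixture using psutil (hypothetical)."""
import time

import psutil
import pytest


@pytest.fixture
def collect_metrics(request):
    """Report wall-clock duration and RSS memory delta for the test that uses it."""
    proc = psutil.Process()
    start_rss = proc.memory_info().rss
    start = time.perf_counter()
    yield
    duration = time.perf_counter() - start
    rss_delta = proc.memory_info().rss - start_rss
    print(f"{request.node.name}: {duration:.3f}s, RSS delta {rss_delta / 1024:.0f} KiB")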

Usage:

cd tasks/01_data_ingestion_transformation
make test-with-metrics

Reports:

  • reports/test_metrics.json - Machine-readable metrics
  • reports/TEST_METRICS.md - Human-readable summary

CI/CD Integration

All tasks are tested in GitHub Actions (Task 4 workflow):

Jobs:

  • python-validation - Task 1 Pandas tests
  • pyspark-validation - Task 1 PySpark tests
  • sql-validation - Task 3 SQL tests
  • (Task 4 tests can be added to CI)

Workflow: .github/workflows/ci.yml


Summary

Unified Structure

Same directory structure across all tasks
Same file naming conventions (test_*.py)
Same configuration files (pytest.ini, conftest.py)
Same Makefile patterns (make test, make clean)
Same report structure (reports/test_report.json)
Same test markers (unit, integration, etc.)

Task-Specific Customization

  • Task-specific test files (e.g., test_workflow_syntax.py)
  • Task-specific markers (e.g., workflow, terraform)
  • Task-specific Makefile targets (e.g., test-pandas)
  • Optional features (e.g., metrics collection in Task 1)


Quick Reference

Task    | Test Command | Key Test Files
--------|--------------|--------------------------------------------
Task 1  | make test    | test_etl.py, test_integration.py
Task 3  | make test    | test_balance_query.py
Task 4  | make test    | test_workflow_syntax.py, test_terraform.py

File             | Purpose              | Location
-----------------|----------------------|--------------------------
pytest.ini       | Pytest configuration | tests/pytest.ini
conftest.py      | Shared fixtures      | tests/conftest.py
Makefile         | Test commands        | Makefile
requirements.txt | Dependencies         | requirements.txt
test_report.json | Test results         | reports/test_report.json

Last Updated: 2026-01-23

© 2026 Stephen Adei · CC BY 4.0