Unified Testing Convention

Overview

This document defines the unified testing convention used across all tasks in the Ohpen case study. All tasks follow the same structure, conventions, and testing patterns for consistency and maintainability.


Directory Structure

All tasks follow this standardized structure:

tasks/
├── 01_data_ingestion_transformation/
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── conftest.py          # Shared fixtures
│   │   ├── pytest.ini           # Pytest configuration
│   │   ├── test_*.py            # Test files
│   │   └── utils/               # Test utilities (optional)
│   ├── reports/                 # Test reports directory
│   ├── requirements.txt         # Runtime dependencies
│   ├── requirements-dev.txt     # Test dependencies
│   ├── Makefile                 # Test commands
│   └── src/                     # Source code
│
├── 03_sql/
│   ├── tests/
│   │   ├── __init__.py
│   │   ├── conftest.py
│   │   ├── pytest.ini
│   │   ├── test_*.py
│   │   └── test_data.sql        # Test data (SQL-specific)
│   ├── reports/
│   ├── requirements.txt
│   └── Makefile
│
└── 04_devops_cicd/
    ├── tests/
    │   ├── __init__.py
    │   ├── conftest.py
    │   ├── pytest.ini
    │   └── test_*.py
    ├── reports/
    ├── requirements.txt
    └── Makefile

File Conventions

1. Test Files

Naming: test_*.py

Examples:

  • test_etl.py - ETL unit tests
  • test_workflow_syntax.py - Workflow syntax tests
  • test_balance_query.py - SQL query tests
  • test_integration.py - Integration tests

Structure:

"""Test description."""
import pytest

@pytest.mark.unit
def test_something():
"""Test description."""
# Test implementation
assert condition

2. Configuration Files

pytest.ini

Location: tasks/{task}/tests/pytest.ini

Required settings:

  • [pytest] - Main configuration section (all keys below live under it)
  • markers - Test markers/categories
  • addopts - Default pytest options
  • testpaths - Test discovery paths

Standard markers:

  • unit - Unit tests
  • integration - Integration tests
  • slow - Slow-running tests
  • Task-specific markers (e.g., workflow, terraform, syntax)

Standard output options:

addopts =
    -v
    --tb=short
    --strict-markers
    --html=reports/test_report.html
    --self-contained-html
    --json-report
    --json-report-file=reports/test_report.json
    --json-report-summary
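
Putting these settings together, a complete pytest.ini for a task might look like the following minimal sketch (marker descriptions and testpaths are illustrative and should be adapted per task):

[pytest]
testpaths = .
markers =
    unit: Unit tests
    integration: Integration tests
    slow: Slow-running tests
addopts =
    -v
    --tb=short
    --strict-markers
    --html=reports/test_report.html
    --self-contained-html
    --json-report
    --json-report-file=reports/test_report.json
    --json-report-summary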

conftest.py

Location: tasks/{task}/tests/conftest.py

Purpose:

  • Shared pytest fixtures
  • Test configuration
  • Path setup

Standard fixtures:

  • Task-specific fixtures (e.g., workflow_file, terraform_content)
  • Shared utilities
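
A minimal conftest.py following these conventions might look like this sketch (the workflow_file fixture name is taken from the example above; the exact paths are assumptions and differ per task):

"""Shared fixtures and path setup for this task's tests."""
import sys
from pathlib import Path

import pytest

# Make the task's src/ package importable when running pytest from tests/.
TASK_ROOT = Path(__file__).resolve().parents[1]
sys.path.insert(0, str(TASK_ROOT / "src"))


@pytest.fixture
def workflow_file():
    """Path to the workflow file under test (location is an assumption)."""
    return TASK_ROOT.parents[1] / ".github" / "workflows" / "ci.yml"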

3. Requirements Files

requirements.txt

Location: tasks/{task}/requirements.txt

Contains:

  • Runtime dependencies only
  • Production dependencies

requirements-dev.txt (optional)

Location: tasks/{task}/requirements-dev.txt

Contains:

  • Test dependencies (pytest, pytest-html, etc.)
  • Development tools (ruff, mypy, etc.)
  • Metrics collection (psutil, etc.)
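
An illustrative requirements-dev.txt based on the tools mentioned above (pin versions as appropriate for the task):

pytest
pytest-html
pytest-json-report
ruff
mypy
psutil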

4. Makefile

Location: tasks/{task}/Makefile

Standard targets:

.PHONY: test clean help

help:
	@echo "Testing Commands:"
	@echo "  make test   - Run all tests"
	@echo "  make clean  - Clean test artifacts"

test:
	@echo "🧪 Running tests..."
	pytest tests/ -v --tb=short

clean:
	@echo "🧹 Cleaning test artifacts..."
	find . -type d -name "__pycache__" -exec rm -r {} + 2>/dev/null || true
	find . -type f -name "*.pyc" -delete 2>/dev/null || true
	find . -type d -name ".pytest_cache" -exec rm -r {} + 2>/dev/null || true
	rm -rf reports/ 2>/dev/null || true

Task-specific targets:

  • Task 1: test-pandas, test-spark, test-with-metrics
  • Task 3: test-isolated, test-docker
  • Task 4: test-workflow, test-terraform, test-integration
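
A task-specific target typically just narrows the pytest invocation to a file or marker. As a sketch, Task 1's Pandas and Spark targets might be wired up like this (exact file names and options are assumptions):

.PHONY: test-pandas test-spark

test-pandas:
	@echo "🧪 Running Pandas ETL tests..."
	pytest tests/test_etl.py -v --tb=short

test-spark:
	@echo "🧪 Running PySpark ETL tests..."
	pytest tests/test_etl_spark.py -v --tb=short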

Test Report Structure

Reports Directory

Location: tasks/{task}/reports/

Standard reports:

  • test_report.json - Pytest JSON report (machine-readable)
  • test_report.html - HTML report (human-readable)
  • test_metrics.json - Metrics report (if metrics enabled)
  • TEST_METRICS.md - Metrics summary (if metrics enabled)

Report Generation

All tests automatically generate:

  1. JSON report - For programmatic analysis
  2. HTML report - For visual inspection
  3. Console output - For immediate feedback
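
For programmatic analysis, the JSON report can be loaded directly. A minimal sketch, assuming the pytest-json-report output format with a top-level summary object:

"""Print a one-line summary from the machine-readable test report."""
import json
from pathlib import Path

report = json.loads(Path("reports/test_report.json").read_text())
summary = report.get("summary", {})
print(
    f"passed={summary.get('passed', 0)} "
    f"failed={summary.get('failed', 0)} "
    f"total={summary.get('total', 0)}"
)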

Metrics collection (Task 1 only, optional):

  • Time metrics (duration, CPU time)
  • Memory metrics (RSS, peak, delta)
  • System metrics (load, threads, FDs)
  • Spark metrics (if applicable)
  • S3 metrics (if applicable)

Testing Patterns

1. Unit Tests

Purpose: Test individual functions/modules in isolation

Pattern:

@pytest.mark.unit
def test_function_name():
    """Test description."""
    # Arrange
    input_data = ...

    # Act
    result = function_under_test(input_data)

    # Assert
    assert result == expected

2. Integration Tests

Purpose: Test complete workflows end-to-end

Pattern:

import tempfile


@pytest.mark.integration
def test_full_workflow():
    """Test complete workflow."""
    # Setup: isolated temporary working directory
    with tempfile.TemporaryDirectory() as tmpdir:
        # Execute workflow
        result = run_workflow(tmpdir)

        # Verify outputs
        assert result.success
        assert output_files_exist(tmpdir)

3. Syntax/Structure Tests

Purpose: Validate configuration files (YAML, SQL, Terraform)

Pattern:

import yaml


@pytest.mark.syntax
def test_yaml_valid(workflow_file):
    """Test YAML syntax."""
    with open(workflow_file, 'r') as f:
        yaml.safe_load(f)  # Should not raise

4. Validation Tests

Purpose: Verify business logic and validation rules

Pattern:

@pytest.mark.validation
def test_validation_rule():
    """Test validation logic."""
    invalid_data = {...}
    result = validate(invalid_data)
    assert result.has_errors
    assert 'expected_error' in result.errors

Running Tests

Standard Commands

All tasks:

cd tasks/{task}
make test # Run all tests
make clean # Clean artifacts

Task-specific:

# Task 1
make test-pandas
make test-spark
make test-with-metrics

# Task 3
make test-docker
make test-isolated

# Task 4
make test-workflow
make test-terraform
make test-integration

Direct Pytest

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_etl.py -v

# Run with markers
pytest tests/ -v -m unit
pytest tests/ -v -m "not slow"

# Run with coverage
pytest tests/ --cov=src --cov-report=html

Test Categories by Task

Task 1: Data Ingestion & Transformation

Test files:

  • test_etl.py - Pandas ETL unit tests
  • test_etl_spark.py - PySpark ETL unit tests
  • test_integration.py - End-to-end integration tests
  • test_validator.py - Validation logic tests
  • test_metadata.py - Metadata enrichment tests
  • test_loop_prevention.py - Loop prevention tests
  • test_s3_operations.py - S3 operation tests
  • test_load_spark.py - Load/performance tests
  • test_edge_cases_spark.py - Edge case tests

Markers:

  • unit, integration, slow, load

Task 3: SQL Query

Test files:

  • test_balance_query.py - SQL query validation tests

Test data:

  • test_data.sql - Synthetic test data
  • expected_output.csv - Expected query results

Markers:

  • syntax, logic, edge_cases

Task 4: CI/CD

Test files:

  • test_workflow_syntax.py - Workflow YAML validation
  • test_terraform.py - Terraform configuration validation
  • test_workflow_integration.py - Integration consistency tests

Markers:

  • syntax, structure, workflow, terraform, integration

Best Practices

1. Test Organization

  • One test file per module (e.g., test_etl.py for etl.py)
  • Group related tests in the same file
  • Use descriptive test names (e.g., test_validation_invalid_currency, not test_1)

2. Test Isolation

  • Each test should be independent (no shared state)
  • Use fixtures for setup/teardown
  • Clean up resources (temp files, mocks)
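
For example, pytest's built-in tmp_path fixture gives each test its own temporary directory, so no cleanup code is needed (a minimal sketch):

import pytest

@pytest.mark.unit
def test_writes_output_in_isolation(tmp_path):
    """tmp_path is unique per test, keeping file-based tests independent."""
    output_file = tmp_path / "output.csv"
    output_file.write_text("id,amount\n1,10.00\n")
    assert output_file.exists()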

3. Test Data

  • Use synthetic test data (not production data)
  • Keep test data minimal (only what's needed)
  • Store test data in tests/ directory

4. Assertions

  • Use descriptive assertions with messages
  • Test one thing per test (single responsibility)
  • Use appropriate assertion types (assert, pytest.raises, etc.)
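
A short sketch of both styles:

import pytest

def test_descriptive_assertion():
    result = 2 + 2
    # The message is shown when the assertion fails.
    assert result == 4, f"expected 4, got {result}"

def test_invalid_input_raises():
    # pytest.raises asserts that the expected exception is raised.
    with pytest.raises(ValueError):
        int("not a number")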

5. Markers

  • Use markers consistently across tasks
  • Mark slow tests with @pytest.mark.slow
  • Mark integration tests with @pytest.mark.integration

6. Documentation

  • Document test purpose in docstrings
  • Explain complex test logic with comments
  • Keep test names descriptive (they serve as documentation)

Metrics Collection (Task 1)

Task 1 includes automatic metrics collection:

Metrics collected:

  • Time: duration, CPU time, timestamps
  • Memory: RSS, peak, delta, VMS, shared
  • CPU: CPU time, CPU percent, user/system time
  • System: load average, thread count, open file descriptors
  • Spark: executor memory, job/stage metrics (if applicable)
  • S3: operation counts, bytes, latency (if applicable)
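
As an illustration only (not the task's actual collector), per-test time and memory figures can be gathered with psutil via a fixture like this hypothetical collect_metrics:

"""Illustrative per-test metrics fixture using psutil (hypothetical)."""
import time

import psutil
import pytest


@pytest.fixture
def collect_metrics(request):
    """Report wall-clock duration and RSS memory delta for the test that uses it."""
    proc = psutil.Process()
    start_rss = proc.memory_info().rss
    start = time.perf_counter()
    yield
    duration = time.perf_counter() - start
    rss_delta = proc.memory_info().rss - start_rss
    print(f"{request.node.name}: {duration:.3f}s, RSS delta {rss_delta / 1024:.0f} KiB")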

Usage:

cd tasks/01_data_ingestion_transformation
make test-with-metrics

Reports:

  • reports/test_metrics.json - Machine-readable metrics
  • reports/TEST_METRICS.md - Human-readable summary

CI/CD Integration

All tasks are tested in GitHub Actions (Task 4 workflow):

Jobs:

  • python-validation - Task 1 Pandas tests
  • pyspark-validation - Task 1 PySpark tests
  • sql-validation - Task 3 SQL tests
  • (Task 4 tests can be added to CI)

Workflow: .github/workflows/ci.yml


Summary

Unified Structure

Same directory structure across all tasks
Same file naming conventions (test_*.py)
Same configuration files (pytest.ini, conftest.py)
Same Makefile patterns (make test, make clean)
Same report structure (reports/test_report.json)
Same test markers (unit, integration, etc.)

Task-Specific Customization

  • Task-specific test files (e.g., test_workflow_syntax.py)
  • Task-specific markers (e.g., workflow, terraform)
  • Task-specific Makefile targets (e.g., test-pandas)
  • Optional features (e.g., metrics collection in Task 1)


Quick Reference

Task    | Test Command | Key Test Files
--------|--------------|--------------------------------------------
Task 1  | make test    | test_etl.py, test_integration.py
Task 3  | make test    | test_balance_query.py
Task 4  | make test    | test_workflow_syntax.py, test_terraform.py

File             | Purpose              | Location
-----------------|----------------------|--------------------------
pytest.ini       | Pytest configuration | tests/pytest.ini
conftest.py      | Shared fixtures      | tests/conftest.py
Makefile         | Test commands        | Makefile
requirements.txt | Dependencies         | requirements.txt
test_report.json | Test results         | reports/test_report.json

Last Updated: 2026-01-23

© 2026 Stephen Adei · CC BY 4.0