
Appendix: CI/CD Complete Reference

This appendix contains all CI/CD-related reference materials: testing guides and workflow details. For the main workflow documentation, see CI/CD Workflow.



Part 1: CI/CD Testing Guide

id: APPENDIX_H_CI_CD_TESTING title: Appendix H - CI/CD Testing Guide sidebar_position: 8

Overview

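Workflows can and should be tested before they are merged. This guide covers four approaches to testing your GitHub Actions workflows: running them locally with act, validating syntax with actionlint, pushing to a test branch on GitHub, and running individual steps by hand.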

Why Test CI/CD?

  • Catch issues early - Find workflow problems before pushing to GitHub
  • Faster feedback - Test locally without waiting for GitHub Actions
  • Cost savings - Avoid consuming GitHub Actions minutes during development
  • Confidence - Ensure workflows work correctly before deployment

Testing Approaches

1. Local Testing with act

act is a tool that runs GitHub Actions workflows locally in Docker containers.

Installation

# Linux
curl https://raw.githubusercontent.com/nektos/act/master/install.sh | sudo bash

# macOS
brew install act

# Windows (with Chocolatey)
choco install act-cli

Prerequisites: Docker must be installed and running.

Quick Start

# List all workflows and jobs
cd tasks/04_devops_cicd
act -l

# Run all workflows (push event)
act push

# Run specific job
act push -j python-validation

# Run with verbose output
act push --verbose

Using the Test Script

We provide a convenient test script:

# Make script executable
chmod +x tasks/04_devops_cicd/scripts/test_ci_workflow.sh

# Run the test script
./tasks/04_devops_cicd/scripts/test_ci_workflow.sh

The script will:

  1. Check if act is installed
  2. Check if Docker is running
  3. List available jobs
  4. Let you choose which job to test
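The first two checks in that list can be sketched in Python. This is not the contents of test_ci_workflow.sh, only an illustration of the same preflight logic. The function names are hypothetical, and the `which` and `run` parameters exist so the checks can be exercised without act or Docker installed.

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def act_installed(which: Callable[[str], Optional[str]] = shutil.which) -> bool:
    """Return True if an `act` executable is found on PATH."""
    return which("act") is not None


def docker_running(run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Return True if `docker info` exits with status 0."""
    try:
        result = run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except (FileNotFoundError, PermissionError):
        # The docker binary is missing or cannot be executed.
        return False
    return result.returncode == 0


def preflight(act_ok: bool, docker_ok: bool) -> List[str]:
    """Collect readable problems; an empty list means act can be started."""
    problems = []
    if not act_ok:
        problems.append("act is not installed")
    if not docker_ok:
        problems.append("Docker is not running")
    return problems
```

Calling `preflight(act_installed(), docker_running())` on a developer machine returns an empty list when both prerequisites are met.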

Limitations

  • Secrets: Must be provided manually (pass -s NAME=value or --secret-file to act)
  • AWS Services: Cannot test AWS-specific steps (S3, Glue) without credentials
  • GitHub API: Some actions require GitHub API access

Example: Testing with Secrets

# Create secrets file
cat > .secrets <<EOF
AWS_ACCESS_KEY_ID=test
AWS_SECRET_ACCESS_KEY=test
EOF

# Run with secrets
act push --secret-file .secrets

2. Workflow Syntax Validation

Validate YAML syntax and workflow structure:

# Using actionlint (recommended)
chmod +x tasks/04_devops_cicd/scripts/validate_workflow_syntax.sh
./tasks/04_devops_cicd/scripts/validate_workflow_syntax.sh

Or manually:

# Install actionlint
# Linux: see https://github.com/rhysd/actionlint (docs/install.md)
# macOS: brew install actionlint

# Validate workflow
actionlint tasks/04_devops_cicd/.github/workflows/ci.yml

3. Manual Testing on GitHub

For full integration testing, push to a test branch:

# Create test branch
git checkout -b test/ci-workflow

# Make a small change (e.g., add a comment)
echo "# Test CI" >> tasks/04_devops_cicd/.github/workflows/ci.yml

# Commit and push
git add .
git commit -m "test: CI workflow"
git push origin test/ci-workflow

# Check GitHub Actions tab for results

Cleanup:

git checkout main
git branch -D test/ci-workflow
git push origin --delete test/ci-workflow

4. Unit Testing Workflow Steps

Test individual workflow steps in isolation:

Example: Test Python Setup

# Test Python setup step locally
python3 -m venv test-venv
source test-venv/bin/activate
pip install --upgrade pip
pip install -r tasks/01_data_ingestion_transformation/requirements.txt
pip install -r tasks/01_data_ingestion_transformation/requirements-dev.txt

# Test linting
cd tasks/01_data_ingestion_transformation
ruff check src/ tests/

# Test unit tests
pytest tests/test_etl.py tests/test_integration.py -v

Example: Test PySpark Setup

# Test Java setup
java -version # Should be Java 17

# Test PySpark installation
python3 -c "import pyspark; print(pyspark.__version__)"

# Test PySpark tests
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
pytest tests/test_etl_spark.py -v

Testing Strategy

What to Test

  1. Workflow Syntax - YAML is valid
  2. Job Dependencies - Jobs run in correct order
  3. Step Execution - Each step completes successfully
  4. Environment Setup - Python, Java, dependencies install correctly
  5. Test Execution - Unit tests run and pass
  6. Linting - Code style checks pass
  7. ⚠️ AWS Integration - Requires AWS credentials (test separately)
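Item 2, job ordering, can be checked without running anything. The sketch below assumes the `needs:` relationships have already been read into a dict. The job graph shown is hypothetical: only `python-validation` appears in this guide, the other job names are made up for illustration. It uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# Hypothetical job graph: job name -> jobs listed under its `needs:` key.
JOB_NEEDS: Dict[str, List[str]] = {
    "python-validation": [],
    "pyspark-validation": ["python-validation"],
    "deploy": ["python-validation", "pyspark-validation"],
}


def execution_order(job_needs: Dict[str, List[str]]) -> List[str]:
    """Return one valid run order; raises graphlib.CycleError if jobs depend on each other."""
    return list(TopologicalSorter(job_needs).static_order())


def runs_before(job_needs: Dict[str, List[str]], first: str, second: str) -> bool:
    """Return True if `first` is scheduled ahead of `second` in the computed order."""
    order = execution_order(job_needs)
    return order.index(first) < order.index(second)
```

A cycle in `needs:` surfaces as an exception here, which is the same class of mistake GitHub rejects when the workflow is loaded.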

What's Hard to Test Locally

  • AWS Services (S3, Glue, Step Functions) - Require AWS credentials
  • GitHub API - Some actions need GitHub API access
  • Secrets Management - Must be provided manually
  • Matrix Builds - Can be slow locally

CI/CD Test Checklist

Before merging CI/CD changes:

  • Workflow YAML syntax is valid
  • All jobs can run locally (with act)
  • Python setup works (Python 3.10)
  • Java setup works (Java 17)
  • Dependencies install correctly
  • Linting passes (ruff, sqlfluff)
  • Unit tests pass (pytest)
  • Workflow runs on test branch
  • No secrets exposed in logs
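The last checklist item can be partly automated. A minimal sketch, assuming the secrets live in a KEY=VALUE file like the .secrets example earlier and that the job log has been saved as text. Both function names are hypothetical, and a plain substring search will miss secrets that were encoded or split across lines.

```python
from typing import Dict, List


def parse_secret_file(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    secrets = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        secrets[name.strip()] = value.strip()
    return secrets


def find_leaked_secrets(log_text: str, secrets: Dict[str, str]) -> List[str]:
    """Return the names of secrets whose values appear verbatim in the log."""
    leaked = []
    for name, value in secrets.items():
        # Skip empty values: "" is a substring of every string.
        if value and value in log_text:
            leaked.append(name)
    return sorted(leaked)
```

Placeholder values such as `test` will match ordinary log lines, so this check is only meaningful with distinctive secret values.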

Troubleshooting

act Issues

Problem: act can't find Docker

# Solution: Ensure Docker is running
docker info

Problem: act uses wrong image size

# Solution: Select image size on first run
act push
# Choose: micro, medium, or large

Problem: Workflow fails with "secrets not found"

# Solution: Provide secrets manually
act push --secret-file .secrets

Workflow Issues

Problem: Tests fail locally but pass on GitHub

  • Check Python version (should be 3.10)
  • Check Java version (should be 17)
  • Verify dependencies match requirements.txt
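The two version checks above can be scripted. This is a sketch with hypothetical helper names. It only parses text, so the `java -version` output has to be captured first (Java prints it to stderr, not stdout).

```python
import re
import sys
from typing import Optional, Sequence, Tuple


def python_matches(expected: Tuple[int, int] = (3, 10),
                   actual: Optional[Sequence[int]] = None) -> bool:
    """Compare major.minor of the running interpreter (or `actual`) with `expected`."""
    found = tuple(sys.version_info[:2]) if actual is None else tuple(actual)
    return found == tuple(expected)


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major version from `java -version` output, or None if not found."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and earlier report themselves as 1.x, for example "1.8.0_392".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major
```

One way to feed it is `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`, then compare the result with 17.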

Problem: Linting fails

# Run linting manually to see errors
cd tasks/01_data_ingestion_transformation
ruff check src/ tests/

Continuous Improvement

Add Workflow Tests to CI

You can even test your CI/CD workflows in CI! Add a workflow validation step:

# .github/workflows/validate-workflows.yml
name: Validate Workflows
on:
  pull_request:
    paths:
      - '.github/workflows/**'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Validate workflow syntax
        # Docker image published by the actionlint project; with no file
        # arguments it checks every workflow under .github/workflows/.
        uses: docker://rhysd/actionlint:latest
        with:
          args: -color

Summary

CI/CD testing is practical and recommended!

  • Use act for local testing
  • Validate syntax with actionlint
  • Test individual steps manually
  • Push to test branch for full integration
  • Test AWS-specific steps separately

Time Investment: ~10 minutes to set up, saves hours of debugging later!


Part 2: CI/CD Workflow Details

This section contains detailed information referenced in the CI/CD Workflow document.


id: APPENDIX_J_CI_CD_WORKFLOW_DETAILS title: Appendix J - CI/CD Workflow Details sidebar_position: 10

Appendix A: Failure Scenarios

Critical Rule: Failed runs never update _LATEST.json or current/ prefix.

Failure Types:

  1. ETL Job Failure: Non-zero exit, no _SUCCESS, no data written → Alert triggers, safe rerun
  2. Partial Write: Job crashes mid-execution → Partial files ignored, new run_id on rerun
  3. Validation Failure: Quarantine rate > threshold → Data Quality Team reviews, fixes source, reruns
  4. Circuit Breaker: More than 100 identical errors per hour → Pipeline halts, Platform Team investigates
  5. Schema Validation: Schema drift detected → Fail fast, update schema registry, rerun
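The circuit breaker in item 4 can be illustrated with a sliding one-hour window per error key. This is a sketch under assumptions, not the pipeline's implementation: the class name, the use of an error key to define "same error", and the explicit `now` timestamp (in seconds) are all choices made for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class CircuitBreaker:
    """Trips when one error key occurs more than `threshold` times within the window."""

    def __init__(self, threshold: int = 100, window_seconds: float = 3600.0) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        # error key -> timestamps of recent occurrences, oldest first
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, error_key: str, now: float) -> bool:
        """Record one error at time `now`; return True if the pipeline should halt."""
        events = self._events[error_key]
        events.append(now)
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        return len(events) > self.threshold
```

Passing the timestamp in, instead of reading the clock inside `record`, keeps the breaker deterministic under test.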

Safe Rerun: Each rerun uses new run_id, failed runs preserved for audit, only successful runs promoted.
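The safe-rerun rule and the critical rule above can be sketched on a local directory standing in for the S3 prefix. The `run_id=<id>/` layout, the run_id format, and the shape of `_LATEST.json` are assumptions made for the example. Only the `_SUCCESS` marker and the `_LATEST.json` name come from this appendix.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def new_run_id() -> str:
    """Build a unique run_id: UTC timestamp plus a short random suffix (assumed format)."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{uuid.uuid4().hex[:8]}"


def promote(base: Path, run_id: str) -> bool:
    """Point _LATEST.json at run_id only if that run wrote a _SUCCESS marker."""
    run_dir = base / f"run_id={run_id}"
    if not (run_dir / "_SUCCESS").exists():
        # Failed or partial run: leave _LATEST.json untouched, keep files for audit.
        return False
    tmp = base / "_LATEST.json.tmp"
    tmp.write_text(json.dumps({"run_id": run_id}))
    tmp.replace(base / "_LATEST.json")  # rename over the old pointer
    return True


def demo() -> tuple:
    """Simulate one failed run and one successful rerun in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        failed, good = new_run_id(), new_run_id()
        (base / f"run_id={failed}").mkdir()
        (base / f"run_id={good}").mkdir()
        (base / f"run_id={good}" / "_SUCCESS").touch()
        failed_promoted = promote(base, failed)
        pointer_after_failure = (base / "_LATEST.json").exists()
        good_promoted = promote(base, good)
        latest = json.loads((base / "_LATEST.json").read_text())["run_id"]
        return failed_promoted, pointer_after_failure, good_promoted, latest == good
```

In the real pipeline the human-review step sits between the `_SUCCESS` marker and promotion; the sketch covers only the marker check.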

Promotion Workflow: ETL writes to isolated run_id path → _SUCCESS marker → CloudWatch alarm → Human review (Domain Analyst + Platform Team) → Approval → Promote to production.

Appendix B: Infrastructure Details

Step Functions Orchestration:

  • RunETL State: Invokes Glue job synchronously, auto-retries (≤3 attempts, exponential backoff)
  • ValidateOutput State: Checks _SUCCESS marker, retries on eventual consistency
  • Error Handling: Catches failures, publishes CloudWatch metrics, logs execution details
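The retry behaviour of the RunETL state can be mimicked in plain Python to see what the backoff schedule looks like. The parameter names follow the Step Functions Retry fields (IntervalSeconds, MaxAttempts, BackoffRate), but the values are assumptions, as is reading "≤3 attempts" as three retries after the first try. `make_flaky` is a hypothetical helper for demonstration.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(max_attempts: int = 3, interval_seconds: float = 1.0,
                   backoff_rate: float = 2.0) -> List[float]:
    """Wait before each retry: interval, interval*rate, interval*rate^2, ..."""
    return [interval_seconds * backoff_rate ** i for i in range(max_attempts)]


def run_with_retries(task: Callable[[], T], max_attempts: int = 3,
                     interval_seconds: float = 1.0, backoff_rate: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> T:
    """Call task; on failure wait and retry up to max_attempts more times."""
    for delay in backoff_delays(max_attempts, interval_seconds, backoff_rate):
        try:
            return task()
        except Exception:
            sleep(delay)
    # Final attempt: any exception now propagates to the caller.
    return task()


def make_flaky(failures: int) -> Callable[[], str]:
    """Return a task that raises `failures` times, then returns "ok"."""
    state = {"left": failures}

    def task() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient failure")
        return "ok"

    return task
```

Injecting `sleep` lets the schedule be inspected without actually waiting.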

IAM Prefix-Scoped Permissions:

  • ETL Job: bronze/* (read), silver/* (write), quarantine/* (write)
  • Platform Team: bronze/*, silver/*, quarantine/* (read/write)
  • Domain Teams: silver/{domain}/* (write), gold/{domain}/* (read)
  • Business/Analysts: gold/* (read-only via Athena)
  • Compliance: bronze/*, quarantine/* (read-only for audit)
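The prefix scoping above can be expressed as a small lookup for reasoning about who may touch which key. This is an illustration only, not the IAM policy itself. The role keys and function names are hypothetical, and `fnmatchcase` stands in for IAM wildcard matching, where `*` also matches `/`.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

Permissions = Dict[str, List[str]]

# Prefix patterns taken from the list above; role keys are made-up identifiers.
PERMISSIONS: Dict[str, Permissions] = {
    "etl-job": {"read": ["bronze/*"], "write": ["silver/*", "quarantine/*"]},
    "platform-team": {
        "read": ["bronze/*", "silver/*", "quarantine/*"],
        "write": ["bronze/*", "silver/*", "quarantine/*"],
    },
    "analysts": {"read": ["gold/*"], "write": []},
    "compliance": {"read": ["bronze/*", "quarantine/*"], "write": []},
}


def domain_team(domain: str) -> Permissions:
    """Permissions for one domain team, scoped to its own silver/gold prefixes."""
    return {"read": [f"gold/{domain}/*"], "write": [f"silver/{domain}/*"]}


def is_allowed(perms: Permissions, action: str, key: str) -> bool:
    """Return True if `key` matches any pattern granted for `action`."""
    return any(fnmatchcase(key, pattern) for pattern in perms.get(action, []))
```

A table like this is handy for unit-testing the intended access matrix before it is encoded in Terraform.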

Appendix C: Monitoring Details

Volume Metrics: run_id, input_rows, valid_rows_count, quarantined_rows_count, condemned_rows_count

Quality Metrics: quarantine_rate, validation_failure_rate, error_type_distribution
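The appendix names these metrics without defining them. The sketch below assumes quarantine_rate is quarantined rows divided by input rows, and that error_type_distribution is the share of each error type among all errors. The 1% default threshold is the one listed under Key Rules; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def quarantine_rate(input_rows: int, quarantined_rows_count: int) -> float:
    """Fraction of input rows sent to quarantine (assumed definition)."""
    if input_rows <= 0:
        return 0.0
    return quarantined_rows_count / input_rows


def exceeds_threshold(rate: float, threshold: float = 0.01) -> bool:
    """Return True if the rate is above the per-dataset threshold (default 1%)."""
    return rate > threshold


def error_type_distribution(error_types: Iterable[str]) -> Dict[str, float]:
    """Share of each error type among all recorded errors."""
    counts = Counter(error_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}
```

For example, 50 quarantined rows out of 10,000 gives a rate of 0.5%, which stays under the default threshold, while 25 out of 1,000 (2.5%) would count as a validation failure.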

Loop Prevention: avg_attempt_count, duplicate_detection_rate, auto_condemnation_rate, circuit_breaker_triggers

Performance: rows_processed_per_run, duration_seconds, missing_partitions, runtime_anomalies

Alert Ownership:

  • P1 (Immediate): Job failures, infrastructure errors, circuit breaker, SLA breaches → Data Platform Team
  • P2 (2-4 hours): Quarantine spikes, validation failures, high attempt counts → Data Quality Team
  • P3 (8 hours): Volume anomalies → Domain Teams

Appendix D: Governance Details

Ownership Matrix (abbreviated):

  • Pipeline/CI/CD/Infrastructure: Data Platform Team
  • Validation Rules: Domain Teams (Silver) / Business (Gold)
  • Data Quality: Data Quality Team
  • Schema: Domain Teams (Silver) / Business (Gold) approve; Platform implements
  • Backfill: Platform executes; Domain/Business approves

Governance Workflows:

  • Schema Change: Request → Layer-based review (Domain/Business) → Platform feasibility → Approval → Implementation → Versioning → Validation → Promotion
  • Quality Issue: Alert → Data Quality triage → Source/Validation/Platform issue → Fix → Backfill approval → Reprocess → Validate → Promote
  • Backfill: Request → Layer-based approval → Platform assessment → Schedule → Execute → Validate → Promote

Key Rules:

  • Infrastructure changes via Terraform IaC and CI/CD only
  • Failed runs never update _LATEST.json or current/
  • Run isolation via run_id mandatory
  • Human approval required for Silver promotion and condemned data deletion
  • Quarantine rate thresholds configurable per dataset (default: 1%)
  • Schema changes versioned via schema_v for backward compatibility

© 2026 Stephen Adei. CC BY 4.0