Selected work
A set of production-style ingestion pipelines for federal and state transportation data, built to move large files from S3 through validation, transformation, and archive handling, and load them into Snowflake.
Beyond the scorecard app itself, the work spans multiple ETL pipelines for state crash data, federal inspection feeds, and federal crash datasets, each with repeatable ingestion flows, error handling, logging, and operational recovery.
What this involved
The state pipeline processes S3-hosted ZIP archives, extracts and validates crash files, detects duplicates against Snowflake using composite-key checks, and keeps file/folder tracking tables so runs can resume safely after failures.
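The duplicate-detection step described above can be sketched as a composite-key filter. This is a minimal illustration, not the pipeline's actual code: the field names (`report_number`, `crash_date`, `vehicle_unit`) are hypothetical, and `loaded_keys` stands in for the key set that the real pipeline would fetch from Snowflake before loading.

```python
def composite_key(record: dict) -> tuple:
    """Build the composite key used to spot already-loaded rows.
    Field names here are illustrative placeholders."""
    return (record["report_number"], record["crash_date"], record["vehicle_unit"])

def filter_new_records(records: list, loaded_keys: set) -> list:
    """Return only records whose composite key has not been seen.

    `loaded_keys` stands in for keys already present in the warehouse;
    newly accepted keys are added so the batch also dedupes itself.
    """
    fresh = []
    for rec in records:
        key = composite_key(rec)
        if key not in loaded_keys:
            fresh.append(rec)
            loaded_keys.add(key)  # prevent within-batch duplicates too
    return fresh
```

Checking keys in memory like this works for one batch; against a warehouse, the same idea is usually expressed as an anti-join or a `MERGE` on the composite key.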
The broader codebase also includes federal inspection and federal crash ingestion paths, with archive-bucket handling, temp-directory processing, spreadsheet and tabular file support, and summarized failure logging for operators.
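The temp-directory processing and summarized failure logging can be sketched as follows. This is an assumed shape, not the codebase's actual API: `handle_file` is a hypothetical per-file callback, and the summary is a simple dict rather than whatever structure the real operators see.

```python
import tempfile
import zipfile
from pathlib import Path

def process_archive(zip_path, handle_file):
    """Extract an archive into a throwaway temp dir, run handle_file on
    each member, and collect failures instead of aborting on the first
    bad file. Returns (processed_count, {file_name: error_message})."""
    failures = {}
    processed = 0
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        for member in sorted(Path(tmp).rglob("*")):
            if not member.is_file():
                continue
            try:
                handle_file(member)
                processed += 1
            except Exception as exc:
                # keep going; operators get one summary at the end
                failures[member.name] = str(exc)
    return processed, failures
```

Collecting failures per file, rather than raising on the first one, is what makes a single operator-facing summary possible at the end of a run.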
This is the kind of work that matters when pipelines have to be durable in production: structured logs, restartable runs, schema-aware imports, notification hooks, and deployment patterns that run on a managed server rather than only in a notebook.
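The restart behavior built on file-tracking tables can be illustrated with a small sketch. SQLite is used here purely as a local stand-in for the real tracking tables (which the pipelines keep in Snowflake), and `resumable_run` is a hypothetical name, not the project's API.

```python
import sqlite3

def init_tracking(conn):
    """Create the per-file tracking table if it does not exist."""
    conn.execute("""CREATE TABLE IF NOT EXISTS file_runs (
        file_name TEXT PRIMARY KEY,
        status TEXT NOT NULL DEFAULT 'pending')""")

def resumable_run(conn, files, process):
    """Process each file at most once across restarts: files already
    marked 'done' are skipped, failed ones are retried next run."""
    init_tracking(conn)
    for name in files:
        row = conn.execute(
            "SELECT status FROM file_runs WHERE file_name = ?", (name,)
        ).fetchone()
        if row and row[0] == "done":
            continue  # already completed in an earlier run
        conn.execute(
            "INSERT OR REPLACE INTO file_runs (file_name, status) "
            "VALUES (?, 'pending')", (name,))
        try:
            process(name)
            conn.execute(
                "UPDATE file_runs SET status = 'done' WHERE file_name = ?",
                (name,))
        except Exception:
            conn.execute(
                "UPDATE file_runs SET status = 'failed' WHERE file_name = ?",
                (name,))
        conn.commit()  # persist status so a crash mid-run is recoverable
```

Committing status per file is the design choice that makes restarts cheap: a rerun after a crash re-attempts only pending and failed files.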
Capabilities shown
Technology stack