You are responsible for the data pipeline that powers customer-facing operational dashboards for a utility analytics product. The current workflow refreshes visualizations after upstream meter, outage, and billing data lands, but users regularly see mismatched totals between dashboards and exported reports after reruns or late-arriving corrections. An internal audit has flagged the process as non-repeatable because the same reporting date can produce different results depending on job timing and partial failures. You need to redesign the workflow so dashboard updates are reliable, repeatable, and easy to recover without manual intervention.
| Component | Status / Technology |
|---|---|
| Source systems | AMI meter reads, outage events, billing extracts, asset reference data |
| Landing zone | Azure Data Lake Storage Gen2, mixed CSV/JSON/Parquet drops |
| Transformations | Azure Databricks notebooks with ad hoc merge logic |
| Warehouse | Snowflake serving curated reporting tables |
| Semantic layer | Itron Analytics dashboards reading Snowflake views |
| Orchestration | Apache Airflow 2.x with daily and hourly DAGs |
| Scale: ~1.2 TB/day across 4 source domains, 180M meter-read records/day, hourly outage updates, daily billing corrections, 15-minute dashboard freshness target, 2 years of retained history, and recurring backfills for corrected utility data. |
How would you redesign this pipeline so each dashboard refresh is deterministic, idempotent, and auditable even when upstream files arrive late, are corrected, or need to be replayed? Explain the end-to-end workflow you would build from ingestion through transformation, validation, publication, and rollback for the visualization layer.