You are supporting a BI platform for a subscription business where executives, analysts, and customer teams rely on shared KPI dashboards. Over the last year, metric definitions such as active customer, delinquency rate, and conversion have changed multiple times, and the same dashboard now shows different values depending on when and where it is queried. An internal audit found that historical charts were silently recalculated under new logic, breaking trust in reporting. You need a pipeline design that lets visualizations stay consistent with approved business definitions while still allowing metric logic to evolve.
| Component | Status / Technology |
|---|---|
| Source systems | PostgreSQL operational DBs, Salesforce, event logs in Amazon S3 |
| Ingestion | Fivetran into Snowflake raw schema |
| Transformations | dbt models with shared macros, no semantic versioning |
| Orchestration | Apache Airflow 2.x scheduled hourly and daily |
| BI serving | Curated Snowflake marts consumed by dashboards |
| Governance | Metric definitions tracked in Confluence and spreadsheets |
Scale: ~2.5TB raw data/day, 40+ source tables, 180 curated metrics, 600 dashboard tiles, hourly freshness target for operational KPIs, 3 years of historical restatement risk.
How would you redesign the pipeline so metric definition changes are versioned, testable, and auditable, while ensuring dashboards continue to render historically correct values and new definitions can be rolled out safely?