You are modernizing the analytics pipeline for a B2B SaaS platform whose finance, sales, and product teams rely on conflicting reports from different operational systems. The current warehouse exists, but it is treated as a passive storage layer rather than the governed source of truth for analytics. An executive escalation was triggered after monthly revenue and active-customer numbers differed across dashboards and board materials. You need to redesign the pipeline so the data warehouse becomes the trusted foundation for batch analytics, metric definitions, and downstream reporting.
| Component | Status / Technology |
|---|---|
| Source systems | PostgreSQL OLTP, Salesforce, Stripe, application event logs in cloud object storage |
| Ingestion | Nightly extracts and ad hoc CSV uploads |
| Warehouse | Snowflake with raw tables but limited transformation standards |
| Transformations | Mixed SQL scripts and spreadsheets |
| Orchestration | Cron jobs with manual reruns |
| BI serving | Dashboards query inconsistent tables directly |
Scale: ~250 GB/day of new raw data, 1.2 TB historical warehouse footprint, 40 source tables from PostgreSQL, 15 Salesforce objects, hourly freshness target for core business metrics, and 99.9% successful daily pipeline completion.
How would you redesign this Snowflake-based warehousing pipeline so it reliably ingests, models, and serves trusted analytics data, and why does the warehouse design matter for consistency, performance, and decision-making as the business scales?