You are launching a new multi-tenant application and need a database design that also supports reliable analytical pipelines from day one. Product and finance teams already expect near-real-time reporting on tenant activity, billing events, and operational SLAs, but leadership has escalated concerns after prior launches produced mismatched dashboard numbers and brittle backfills. You need an architecture that treats the application database, change capture, transformations, and serving models as one scalable pipeline rather than a standalone OLTP system. The goal is to support fast transactional workloads while making trusted data available in Snowflake with low latency and clear lineage.
| Component | Status / Technology |
|---|---|
| Application database | New PostgreSQL 15 deployment planned for primary writes |
| Event capture | No CDC yet; application emits limited webhook logs |
| Ingestion to warehouse | Nightly CSV exports loaded manually |
| Warehouse | Snowflake with a few ad hoc reporting tables |
| Transformations | SQL scripts, no versioned ELT framework |
| Orchestration | Basic cron jobs; no dependency management |
| Scale: 25K TPS peak on the app database, 150M row inserts/day, 20M row updates/day, 5 TB first-year operational data, 10-minute freshness target for core metrics, and occasional 30-day historical backfills. |
How would you design the end-to-end database and data pipeline architecture so the application can scale operationally while Snowflake remains the trusted analytical system for incremental loads, schema evolution, data quality, and replayable backfills?