Context
Northstar Health sells a SaaS platform for clinic operations and currently lands application data from PostgreSQL, Stripe, and Salesforce into Snowflake using nightly batch loads. The warehouse has grown organically with inconsistent table definitions, duplicated business logic, and slow downstream reporting, so the data team needs a clearer modeling approach as part of a modern ELT pipeline redesign.
You are asked to design the target data model and supporting pipeline for analytics use cases such as revenue reporting, appointment utilization, provider performance, and customer lifecycle analysis.
Scale Requirements
- Sources: PostgreSQL OLTP (~1,200 tables), Stripe API, Salesforce, S3 CSV partner feeds
- Volume: ~250M fact rows added per month, 4 TB compressed warehouse storage today
- Batch cadence: core models every 30 minutes; finance models hourly; full backfills on demand
- Latency target: source change to analytics-ready tables in < 45 minutes
- Consumers: 120 BI users, 25 scheduled dashboards, 15 downstream data products
Requirements
- Design a warehouse data model that separates raw, staging, and business-ready layers.
- Explain when you would use star schema, wide tables, or normalized intermediate models.
- Define how dimensions and facts should be modeled for appointments, invoices, payments, clinics, providers, and patients.
- Support slowly changing attributes for provider specialty, clinic region, and account owner changes.
- Ensure transformations are idempotent and can be rerun safely for partial failures or backfills.
- Include data quality checks for primary key uniqueness, referential integrity, null rates, and freshness.
- Describe orchestration, dependency management, and how model rebuilds are triggered after source loads.
- Provide a strategy for documenting lineage and making model semantics understandable to analysts.
Constraints
- Existing stack is AWS + Snowflake; avoid introducing more than one major new platform.
- Team size is 3 data engineers and 2 analytics engineers.
- HIPAA-sensitive fields must be masked or excluded from analytics models.
- Budget allows moderate Snowflake scaling, but full-table rebuilds every run are not acceptable.