Interview Guides

Model Analytics Warehouse for ELT

Easy

Pipelines

Context

Northstar Health sells a SaaS platform for clinic operations and currently lands application data from PostgreSQL, Stripe, and Salesforce into Snowflake using nightly batch loads. The warehouse has grown organically with inconsistent table definitions, duplicated business logic, and slow downstream reporting, so the data team needs a clearer modeling approach as part of a modern ELT pipeline redesign.

You are asked to design the target data model and supporting pipeline for analytics use cases such as revenue reporting, appointment utilization, provider performance, and customer lifecycle analysis.

Scale Requirements

Sources: PostgreSQL OLTP (~1,200 tables), Stripe API, Salesforce, S3 CSV partner feeds
Volume: ~250M fact rows added per month, 4 TB compressed warehouse storage today
Batch cadence: core models every 30 minutes; finance models hourly; full backfills on demand
Latency target: source change to analytics-ready tables in < 45 minutes
Consumers: 120 BI users, 25 scheduled dashboards, 15 downstream data products

Requirements

Design a warehouse data model that separates raw, staging, and business-ready layers.
Explain when you would use star schema, wide tables, or normalized intermediate models.
Define how dimensions and facts should be modeled for appointments, invoices, payments, clinics, providers, and patients.
Support slowly changing attributes for provider specialty, clinic region, and account owner changes.
Ensure transformations are idempotent and can be rerun safely for partial failures or backfills.
Include data quality checks for primary key uniqueness, referential integrity, null rates, and freshness.
Describe orchestration, dependency management, and how model rebuilds are triggered after source loads.
Provide a strategy for documenting lineage and making model semantics understandable to analysts.

Constraints

Existing stack is AWS + Snowflake; avoid introducing more than one major new platform.
Team size is 3 data engineers and 2 analytics engineers.
HIPAA-sensitive fields must be masked or excluded from analytics models.
Budget allows moderate Snowflake scaling, but full-table rebuilds every run are not acceptable.

Model Analytics Warehouse for ELT

Easy

Pipelines

Context

Scale Requirements

Sources: PostgreSQL OLTP (~1,200 tables), Stripe API, Salesforce, S3 CSV partner feeds
Volume: ~250M fact rows added per month, 4 TB compressed warehouse storage today
Batch cadence: core models every 30 minutes; finance models hourly; full backfills on demand
Latency target: source change to analytics-ready tables in < 45 minutes
Consumers: 120 BI users, 25 scheduled dashboards, 15 downstream data products

Requirements

Design a warehouse data model that separates raw, staging, and business-ready layers.
Explain when you would use star schema, wide tables, or normalized intermediate models.
Define how dimensions and facts should be modeled for appointments, invoices, payments, clinics, providers, and patients.
Support slowly changing attributes for provider specialty, clinic region, and account owner changes.
Ensure transformations are idempotent and can be rerun safely for partial failures or backfills.
Include data quality checks for primary key uniqueness, referential integrity, null rates, and freshness.
Describe orchestration, dependency management, and how model rebuilds are triggered after source loads.
Provide a strategy for documenting lineage and making model semantics understandable to analysts.

Constraints

Existing stack is AWS + Snowflake; avoid introducing more than one major new platform.
Team size is 3 data engineers and 2 analytics engineers.
HIPAA-sensitive fields must be masked or excluded from analytics models.
Budget allows moderate Snowflake scaling, but full-table rebuilds every run are not acceptable.

Your Answer

Model Analytics Warehouse for ELT

Easy

Pipelines

Context

Scale Requirements

Sources: PostgreSQL OLTP (~1,200 tables), Stripe API, Salesforce, S3 CSV partner feeds
Volume: ~250M fact rows added per month, 4 TB compressed warehouse storage today
Batch cadence: core models every 30 minutes; finance models hourly; full backfills on demand
Latency target: source change to analytics-ready tables in < 45 minutes
Consumers: 120 BI users, 25 scheduled dashboards, 15 downstream data products

Requirements

Design a warehouse data model that separates raw, staging, and business-ready layers.
Explain when you would use star schema, wide tables, or normalized intermediate models.
Define how dimensions and facts should be modeled for appointments, invoices, payments, clinics, providers, and patients.
Support slowly changing attributes for provider specialty, clinic region, and account owner changes.
Ensure transformations are idempotent and can be rerun safely for partial failures or backfills.
Include data quality checks for primary key uniqueness, referential integrity, null rates, and freshness.
Describe orchestration, dependency management, and how model rebuilds are triggered after source loads.
Provide a strategy for documenting lineage and making model semantics understandable to analysts.

Constraints

Existing stack is AWS + Snowflake; avoid introducing more than one major new platform.
Team size is 3 data engineers and 2 analytics engineers.
HIPAA-sensitive fields must be masked or excluded from analytics models.
Budget allows moderate Snowflake scaling, but full-table rebuilds every run are not acceptable.

Model Analytics Warehouse for ELT

Easy

Pipelines

Context

Scale Requirements

Sources: PostgreSQL OLTP (~1,200 tables), Stripe API, Salesforce, S3 CSV partner feeds
Volume: ~250M fact rows added per month, 4 TB compressed warehouse storage today
Batch cadence: core models every 30 minutes; finance models hourly; full backfills on demand
Latency target: source change to analytics-ready tables in < 45 minutes
Consumers: 120 BI users, 25 scheduled dashboards, 15 downstream data products

Requirements

Design a warehouse data model that separates raw, staging, and business-ready layers.
Explain when you would use star schema, wide tables, or normalized intermediate models.
Define how dimensions and facts should be modeled for appointments, invoices, payments, clinics, providers, and patients.
Support slowly changing attributes for provider specialty, clinic region, and account owner changes.
Ensure transformations are idempotent and can be rerun safely for partial failures or backfills.
Include data quality checks for primary key uniqueness, referential integrity, null rates, and freshness.
Describe orchestration, dependency management, and how model rebuilds are triggered after source loads.
Provide a strategy for documenting lineage and making model semantics understandable to analysts.

Constraints

Existing stack is AWS + Snowflake; avoid introducing more than one major new platform.
Team size is 3 data engineers and 2 analytics engineers.
HIPAA-sensitive fields must be masked or excluded from analytics models.
Budget allows moderate Snowflake scaling, but full-table rebuilds every run are not acceptable.