You’ve joined the data platform team at a large omnichannel retailer (web + 2,000 physical stores) processing ~10M order line items/day. The company is migrating from ad-hoc operational reporting to a governed analytics warehouse that powers executive dashboards (gross margin, promo performance), machine learning features (demand forecasting), and self-serve BI for hundreds of analysts.
A recurring design decision is whether to model the warehouse using a star schema (denormalized dimensions around a central fact table) or a snowflake schema (normalized dimensions broken into multiple related tables). The choice affects query simplicity, storage, data quality, and performance—especially for join-heavy BI queries and for maintaining slowly changing dimensions.
Explain the key differences between star schema and snowflake schema in the context of an analytics warehouse.
In your answer, cover:
Discuss trade-offs rather than claiming one is always superior. Use a realistic example (e.g., fact_sales joined to dim_product, dim_store, dim_date) and explain how the model affects a common metric query like “weekly revenue by brand and region.” Mention how the decision changes with scale (billions of fact rows), team maturity, and downstream tools (semantic layers, dbt models, BI extract engines).