Context
You’re joining the data platform team at a global e-commerce marketplace processing 10M+ orders/day across web and mobile. Finance and Sales Ops rely on a curated sales reporting layer for daily revenue, discounts, returns, and margin reporting by product category, region, and sales channel. The current operational schema is highly normalized and optimized for writes, but analysts are struggling with slow queries and inconsistent definitions (e.g., “net sales” vs “gross sales”).
Core Question
Design a data model (tables + keys + relationships) for a sales reporting system that supports common analytics queries such as:
- Daily/weekly/monthly revenue and units sold
- Revenue by product category, brand, and region
- Sales by channel (web/app/partner), campaign, and sales rep
- Net sales after discounts, refunds, and returns
In your answer, explain:
- Which tables you would create (fact vs dimension) and what each table represents.
- Primary keys and foreign keys for each table, including whether you’d use surrogate keys vs natural keys.
- The grain of your fact table(s) (e.g., order-level vs line-item-level) and why.
- How you would model slowly changing attributes (e.g., customer region changes, product category reclassification).
- How you would handle returns/refunds and discount allocation (line-level vs order-level) so that finance metrics reconcile.
Scope Guidance (what a strong answer includes)
- A clear statement of the grain and how it prevents double counting in joins/aggregations.
- A star schema (or snowflake where justified) with explicit 1-to-many and many-to-1 relationships.
- Concrete key choices (surrogate integer keys, unique constraints on natural identifiers, composite keys where appropriate).
- Practical considerations: late-arriving dimensions, deduplication, and how the model supports performant joins for BI tools.