Context
You’re working on the analytics platform for a large e-commerce marketplace (~10M orders/day, millions of monthly active buyers). Multiple teams consume curated datasets: Finance needs revenue reporting that must not undercount orders; Growth needs funnel metrics that must include users who didn’t convert; and Trust & Safety needs to find sellers with missing compliance documents. In this environment, choosing the wrong join type can silently drop rows, leading to incorrect KPIs, misallocated marketing spend, or even regulatory reporting errors.
Core Question
Explain the difference between an INNER JOIN and a LEFT JOIN in SQL, and when you would use each.
In your answer, address:
- Result set semantics: Which rows are kept vs removed when there is no match on the join key?
- NULL behavior: What values appear for columns from the right-hand table when there is no match?
- Business scenarios: Give at least two realistic examples from analytics/engineering work:
  - One where INNER JOIN is the correct choice (e.g., only orders with valid payments).
  - One where LEFT JOIN is the correct choice (e.g., all users with optional orders).
- Common pitfall: Explain how a WHERE clause that filters on a right-table column can accidentally turn a LEFT JOIN into an INNER JOIN, and how to avoid it.
- Validation approach: Describe how you would sanity-check row counts to ensure you didn’t unintentionally drop data.
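The points above can be illustrated with a small, self-contained sketch. It runs SQL through Python's built-in `sqlite3` module against an in-memory database. The `users` and `orders` tables, their columns, and the sample rows are hypothetical and chosen only for illustration; a production warehouse will have different schemas and may differ in dialect details.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
    INSERT INTO users  VALUES (1, 'ana'), (2, 'bo'), (3, 'cy');
    INSERT INTO orders VALUES (10, 1, 'paid'), (11, 1, 'refunded'), (12, 2, 'refunded');
""")

def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# INNER JOIN: filters to matches only, so user 3 (no orders) is removed.
inner_rows = run("""
    SELECT u.user_id, o.order_id
    FROM users u
    INNER JOIN orders o ON o.user_id = u.user_id
    ORDER BY u.user_id, o.order_id
""")

# LEFT JOIN: preserves all rows from the left table; user 3 is kept and
# the right-hand columns come back as NULL (None in Python).
left_rows = run("""
    SELECT u.user_id, o.order_id
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.user_id
    ORDER BY u.user_id, o.order_id
""")

# Pitfall: a WHERE filter on a right-table column rejects the NULL-extended
# rows, so the LEFT JOIN behaves like an INNER JOIN.
pitfall_rows = run("""
    SELECT u.user_id, o.order_id
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.user_id
    WHERE o.status = 'paid'
    ORDER BY u.user_id
""")

# One way to avoid it: move the right-table condition into the ON clause.
fixed_rows = run("""
    SELECT u.user_id, o.order_id
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.user_id AND o.status = 'paid'
    ORDER BY u.user_id
""")

# Validation: every left-table key should survive a LEFT JOIN.
left_total = run("SELECT COUNT(*) FROM users")[0][0]
kept_after_fix = len({row[0] for row in fixed_rows})
kept_after_pitfall = len({row[0] for row in pitfall_rows})

print(inner_rows)    # [(1, 10), (1, 11), (2, 12)]
print(left_rows)     # [(1, 10), (1, 11), (2, 12), (3, None)]
print(pitfall_rows)  # [(1, 10)]
print(fixed_rows)    # [(1, 10), (2, None), (3, None)]
print(left_total, kept_after_fix, kept_after_pitfall)  # 3 3 1
```

Comparing the count of distinct left keys before and after the join is a cheap check: here the pitfall query keeps only 1 of 3 users, which flags the accidental filter immediately. Note that the total row count can legitimately grow after a join when one left row matches several right rows (user 1 above), so comparing distinct keys is safer than comparing raw row counts.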
Scope Guidance (What a strong interview answer includes)
- Use precise language: “preserves all rows from the left table” vs “filters to matches only.”
- Include short SQL snippets demonstrating both joins.
- Mention how join choice impacts downstream aggregations (conversion rate, revenue, “missing” entity detection).
- Briefly touch on performance only if relevant (e.g., join order, filtering early), but focus primarily on correctness.
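To make the downstream-aggregation point concrete, here is a minimal sketch of how join choice changes a conversion rate and how a LEFT JOIN supports "missing" entity detection. The `visitors` and `purchases` tables and their rows are hypothetical, and the queries run on SQLite via Python's `sqlite3` module, so syntax may need adjusting for other engines.

```python
import sqlite3

# Hypothetical funnel data: 4 visitors, 2 of whom purchased.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visitors  (visitor_id INTEGER PRIMARY KEY);
    CREATE TABLE purchases (purchase_id INTEGER PRIMARY KEY, visitor_id INTEGER, amount REAL);
    INSERT INTO visitors  VALUES (1), (2), (3), (4);
    INSERT INTO purchases VALUES (100, 1, 20.0), (101, 1, 5.0), (102, 2, 10.0);
""")

RATE_SQL = """
    SELECT COUNT(DISTINCT p.visitor_id) * 1.0 / COUNT(DISTINCT v.visitor_id)
    FROM visitors v
    {join_type} JOIN purchases p ON p.visitor_id = v.visitor_id
"""

# LEFT JOIN keeps non-converters in the denominator: 2 buyers / 4 visitors.
left_rate = conn.execute(RATE_SQL.format(join_type="LEFT")).fetchone()[0]

# INNER JOIN drops non-converters first, so the rate is inflated: 2 / 2.
inner_rate = conn.execute(RATE_SQL.format(join_type="INNER")).fetchone()[0]

# "Missing" entity detection (anti-join): keep only the left rows that
# found no match, identified by a NULL in the right table's join key.
non_converters = conn.execute("""
    SELECT v.visitor_id
    FROM visitors v
    LEFT JOIN purchases p ON p.visitor_id = v.visitor_id
    WHERE p.visitor_id IS NULL
    ORDER BY v.visitor_id
""").fetchall()

print(left_rate)       # 0.5
print(inner_rate)      # 1.0
print(non_converters)  # [(3,), (4,)]
```

`COUNT(DISTINCT ...)` is used because visitor 1 has two purchases and would otherwise be counted twice after the join. The `IS NULL` filter in the last query is the one case where a WHERE condition on the right table is intentional: it selects exactly the unmatched rows.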