
At companies like Northstar Analytics, analysts often need to combine a current client list with historical revenue records to build reports or validate coverage. Interviewers use this question to check whether you understand how to match records correctly and preserve missing cases.
Explain how you would use SQL lookup logic to merge a client list with historical revenue data. In your answer, discuss:
LEFT JOIN versus INNER JOINThe interviewer is looking for a practical SQL explanation, not spreadsheet-specific lookup functions. You should describe the join strategy, null handling, and aggregation choices clearly, and mention how your approach changes depending on whether the goal is full client coverage or only matched revenue records.
A LEFT JOIN keeps every row from the client list and brings in matching revenue rows when they exist. This is the safest default when the client list is the primary reporting population and some clients may have no historical revenue.
SELECT c.client_id, c.client_name, r.revenue_date, r.amount
FROM clients c
LEFT JOIN revenue_history r
ON c.client_id = r.client_id;
An INNER JOIN returns only clients that have at least one matching revenue record. This is useful when the analysis should exclude clients with no revenue history, but it is the wrong choice if you need a complete client roster.
SELECT c.client_id, c.client_name, r.amount
FROM clients c
INNER JOIN revenue_history r
ON c.client_id = r.client_id;
When a client has no matching revenue row in a LEFT JOIN, the revenue columns are NULL. You often use COALESCE to convert missing aggregated revenue to 0 so reports are easier to read and downstream calculations behave predictably.
SELECT c.client_id, COALESCE(SUM(r.amount), 0) AS total_revenue
FROM clients c
LEFT JOIN revenue_history r
ON c.client_id = r.client_id
GROUP BY c.client_id;
Historical revenue usually contains multiple rows per client, so you often need to aggregate after the join. Grouping by client lets you calculate totals, averages, or latest revenue periods without losing the client-level view.
SELECT c.client_id, c.client_name, SUM(r.amount) AS total_revenue
FROM clients c
LEFT JOIN revenue_history r
ON c.client_id = r.client_id
GROUP BY c.client_id, c.client_name;
If either table has duplicate client identifiers or inconsistent keys, the join can multiply rows and inflate revenue totals. A strong answer should mention validating uniqueness, cleaning keys, or pre-aggregating revenue before joining when needed.
SELECT client_id, COUNT(*)
FROM clients
GROUP BY client_id
HAVING COUNT(*) > 1;