


A
Analysts often use Excel VLOOKUP to pull columns from one dataset into another using a shared key. In SQL, the equivalent operation is typically done with a join, and interviewers want to know both the syntax and the reasoning.
Explain how you would merge two datasets that share a common unique identifier using SQL instead of VLOOKUP.
Address these points:
INNER JOIN instead of LEFT JOINThe interviewer expects a practical explanation of how VLOOKUP maps to SQL joins, not a deep discussion of database internals. A strong answer should compare VLOOKUP behavior with LEFT JOIN and INNER JOIN, mention null handling, and include a simple PostgreSQL example.
A typical VLOOKUP keeps all rows from the main dataset and pulls matching columns from a second dataset. In SQL, LEFT JOIN does the same by returning every row from the left table and matching data from the right table when available.
SELECT a.customer_id, a.order_total, b.customer_name
FROM orders a
LEFT JOIN customers b
ON a.customer_id = b.customer_id;
The join condition connects rows from two tables using the common unique identifier, such as customer_id or employee_id. If the identifier is truly unique in the lookup table, each row from the main table maps to at most one matching row.
ON a.customer_id = b.customer_id
If you only want records that exist in both datasets, use INNER JOIN. This differs from VLOOKUP-style enrichment because unmatched rows from the main table are removed entirely instead of being retained with nulls.
SELECT a.customer_id, b.customer_name
FROM orders a
INNER JOIN customers b
ON a.customer_id = b.customer_id;
When a LEFT JOIN cannot find a matching row in the lookup table, the joined columns return NULL. This is similar to a failed VLOOKUP, except SQL usually leaves the result as NULL unless you explicitly replace it with COALESCE.
SELECT a.customer_id,
COALESCE(b.customer_name, 'Unknown') AS customer_name
FROM orders a
LEFT JOIN customers b
ON a.customer_id = b.customer_id;
A common mistake is assuming the lookup key is unique when it is not. In SQL, if the right table contains multiple rows for the same key, the join returns multiple matches and can unexpectedly multiply rows in the result.
SELECT a.customer_id, b.status
FROM orders a
LEFT JOIN customer_status_history b
ON a.customer_id = b.customer_id;