
At Uber, Operations Managers often need to evaluate supply performance across cities by combining trip activity with driver and city attributes. This requires joining a fact table with dimension tables in a way that preserves the right level of detail.
Explain how you would join trips, drivers, and cities tables to analyze supply performance in Uber. In your answer, discuss:
The interviewer is looking for a practical SQL-oriented explanation, not just definitions of joins. You should describe the grain of each table, how to structure the query, and how to produce city-level supply metrics such as active drivers, completed trips, and utilization.
In this setup, trips is typically the fact table because it stores event-level ride records, while drivers and cities are dimension tables that provide descriptive attributes. Understanding table grain is critical because aggregations should be built from the fact table and enriched with dimensions only after confirming the join keys.
SELECT t.trip_id, d.driver_id, c.city_name
FROM trips t
JOIN drivers d ON t.driver_id = d.driver_id
JOIN cities c ON t.city_id = c.city_id;
An INNER JOIN keeps only rows with matches in both tables, while a LEFT JOIN preserves all rows from the left table. For supply analysis, you often start from trips if you only care about trip activity, but start from drivers with LEFT JOIN if you need to include drivers who had zero trips.
SELECT d.driver_id, c.city_name, t.trip_id
FROM drivers d
LEFT JOIN trips t ON d.driver_id = t.driver_id
LEFT JOIN cities c ON d.city_id = c.city_id;
Supply metrics should be computed at a clear reporting grain, such as city-day or city-week. If you aggregate after joining tables with mismatched grain, you can inflate counts, so it is important to group by the reporting dimensions and use distinct counts when needed.
SELECT c.city_name, COUNT(DISTINCT t.driver_id) AS active_drivers, COUNT(*) AS trips
FROM trips t
JOIN cities c ON t.city_id = c.city_id
GROUP BY c.city_name;
Many operational metrics require counting subsets of rows, such as completed trips, canceled trips, or online drivers. CASE WHEN inside aggregate functions lets you compute multiple KPIs in one grouped query.
SELECT c.city_name,
COUNT(*) AS total_trips,
SUM(CASE WHEN t.status = 'completed' THEN 1 ELSE 0 END) AS completed_trips
FROM trips t
JOIN cities c ON t.city_id = c.city_id
GROUP BY c.city_name;
Double counting happens when a join multiplies rows, usually because one side is not unique at the join key. Before joining, confirm whether drivers has one row per driver and cities has one row per city, and use COUNT(DISTINCT ...) when the metric is entity-based rather than row-based.
SELECT c.city_name, COUNT(DISTINCT d.driver_id) AS drivers_seen
FROM trips t
JOIN drivers d ON t.driver_id = d.driver_id
JOIN cities c ON t.city_id = c.city_id
GROUP BY c.city_name;