In Lyft operations workflows, raw exports from tools like Lyft Console or support intake sheets often contain the same business field in two columns with inconsistent formatting. Interviewers want to see whether you can standardize messy values before combining them into a single usable column.
Explain how you would clean and merge two columns when they represent the same information but use different formats. For example, one column may contain phone numbers with punctuation while another stores digits only, or one column may have city names with inconsistent casing and whitespace.
Address these points:
Keep the answer practical and SQL-focused. The interviewer is not looking for a full ETL architecture discussion; they want a clear PostgreSQL-based approach using common cleaning functions and simple decision logic.
Before merging two columns, convert both into the same representation. In PostgreSQL, this often means trimming whitespace, lowercasing text, removing punctuation with regular expressions, and converting empty strings to NULL so comparisons behave consistently.
NULLIF(TRIM(LOWER(city_name)), '')
Once both columns are cleaned, define a deterministic rule for which value wins. A common approach is to prefer the primary source when present, otherwise fall back to the secondary source using COALESCE, or use CASE when one source is known to be more reliable under certain conditions.
COALESCE(clean_col_a, clean_col_b) AS merged_value
If both cleaned columns are populated but still differ, do not silently overwrite without thought. Flag those rows so operations teams can review them, or apply a documented business rule for resolving conflicts.
CASE WHEN clean_col_a IS NOT NULL AND clean_col_b IS NOT NULL AND clean_col_a <> clean_col_b THEN 'conflict' ELSE 'ok' END AS merge_status
Blank strings, strings containing only spaces, and NULL values should usually be treated consistently. Using NULLIF after TRIM prevents blank values from being incorrectly chosen during a merge.
NULLIF(TRIM(col_a), '')
After building the merged column, validate row counts, null rates, and conflict counts. You should also inspect samples of transformed rows to confirm the cleaning logic did not collapse distinct values incorrectly.
SELECT COUNT(*) AS total_rows, COUNT(merged_value) AS populated_rows FROM cleaned_data;