





Cross-system metric mismatches are common in analytics environments, especially when reporting in platforms such as AbbVie customer insight dashboards depends on multiple upstream data sources. Interviewers want to see whether you can move from “the numbers do not match” to a structured SQL-based diagnosis.
Describe how you would identify the root cause of a data discrepancy between two systems that should report the same customer metric. Explain how you would use SQL to compare row counts, key coverage, duplicates, null handling, transformation logic, and timing differences. You should also discuss how you would narrow the issue from aggregate mismatch to specific records and how you would validate whether the discrepancy comes from joins, filters, late-arriving data, or business-rule differences.
Answer at the level of a senior analyst: focus on a practical, repeatable debugging workflow, the SQL patterns you would use, and how you would communicate findings once you isolate the issue.
Start by confirming the discrepancy at the highest useful level, such as by date, channel, or customer segment. This tells you whether the issue is global or isolated and helps you reduce the search space before comparing individual rows.
SELECT report_date, COUNT(*) AS row_count, SUM(metric_value) AS total_value
FROM system_a_metrics
GROUP BY report_date;
A FULL OUTER JOIN on the business key is often the fastest way to find records present in one system but not the other. This distinguishes coverage issues from value mismatches and immediately surfaces whether the discrepancy is due to dropped or duplicated records.
SELECT COALESCE(a.customer_id, b.customer_id) AS customer_id,
a.metric_value AS a_value,
b.metric_value AS b_value
FROM system_a_metrics a
FULL OUTER JOIN system_b_metrics b
ON a.customer_id = b.customer_id
WHERE a.customer_id IS NULL
OR b.customer_id IS NULL
OR a.metric_value <> b.metric_value;
Two systems can disagree even when they contain the same keys if one table is stored at a finer grain or a join multiplies rows. Counting records per business key and comparing expected uniqueness is critical before trusting any aggregate comparison.
SELECT customer_id, COUNT(*) AS records_per_key
FROM system_b_metrics
GROUP BY customer_id
HAVING COUNT(*) > 1;
Many discrepancies come from inconsistent CASE logic, date filters, status filters, or null handling. Rebuilding the metric in SQL from raw inputs for both systems helps isolate whether the issue is in source data or in business-rule implementation.
SELECT customer_id,
CASE WHEN status = 'active' AND amount > 0 THEN amount ELSE 0 END AS normalized_value
FROM raw_events;
Late-arriving data, multiple updates per key, and snapshot timing often create mismatches between operational and reporting systems. CTEs and window functions such as ROW_NUMBER() help you compare the latest version of each record at a consistent cutoff time.
WITH ranked AS (
SELECT customer_id, updated_at, metric_value,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) AS rn
FROM system_a_history
)
SELECT customer_id, metric_value
FROM ranked
WHERE rn = 1;