Context
FinSight Analytics provides finance and operations reporting for mid-market retailers. A client reports that revenue, order counts, and refund totals differ between Salesforce dashboards, NetSuite exports, and Snowflake-based BI reports. The current stack uses Fivetran for ingestion into Snowflake, dbt for transformations, and Airflow for scheduled reconciliation jobs, but there is no consistent lineage, no freshness SLA enforcement, and no source-of-truth contract.
You are asked to design a pipeline and an investigation approach that pinpoint the first places to check when reporting outputs become inconsistent across systems, and then to harden the platform to prevent recurrence.
Scale Requirements
- Sources: Salesforce, NetSuite, Stripe, PostgreSQL app DB, S3 CSV drops
- Volume: 120M fact rows/day, 2 TB/day raw ingest
- Latency: Finance dashboards must be consistent within 30 minutes of source updates
- Historical retention: 3 years online in Snowflake, 7 years archived in S3
- Reconciliation scope: 250 client-facing metrics across 40 marts
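The 30-minute consistency requirement above implies a freshness check per source: compare each source's latest loaded watermark in Snowflake against the clock and flag breaches. A minimal sketch, assuming hypothetical source names and watermark timestamps (in practice these would come from load-audit metadata):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical SLA from the latency requirement: serving must reflect
# source updates within 30 minutes.
SLA = timedelta(minutes=30)

def freshness_breaches(watermarks, now):
    """Return sources whose latest loaded timestamp lags `now` beyond the SLA.

    `watermarks` maps source name -> max loaded-at timestamp observed
    in the warehouse (illustrative in-memory stand-in).
    """
    return sorted(
        name for name, loaded_at in watermarks.items()
        if now - loaded_at > SLA
    )

now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
watermarks = {
    "salesforce": now - timedelta(minutes=12),  # within SLA
    "netsuite":   now - timedelta(minutes=55),  # breach
    "stripe":     now - timedelta(minutes=31),  # breach
}
print(freshness_breaches(watermarks, now))  # -> ['netsuite', 'stripe']
```

In a real deployment the same comparison would run as a scheduled Airflow task against per-source load-audit tables, alerting before a stale source silently skews the marts.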
Requirements
- Design an ingestion and transformation pipeline that can isolate where metric drift begins: source extraction, load, transformation, or serving.
- Define the first checks you would perform when a client reports inconsistency, including freshness, schema drift, duplicate loads, join cardinality changes, and business logic divergence.
- Build reconciliation tables that compare record counts, checksums, and metric aggregates across systems by business date and source batch.
- Ensure transformations are idempotent and support backfills for corrected source data.
- Add data quality gates before publishing marts used by Looker and client exports.
- Provide monitoring, alerting, and failure recovery for delayed loads, broken models, and mismatched KPI thresholds.
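The reconciliation-table requirement above (record counts, checksums, and metric aggregates by business date and source batch) can be sketched as a batch fingerprint compared across two systems. All row shapes, column names, and the checksum scheme here are illustrative assumptions, not the client's actual schema:

```python
import hashlib

def batch_fingerprint(rows):
    """Summarize a batch of (order_id, amount) rows for one business date:
    row count, revenue aggregate, and an order-insensitive checksum."""
    checksum = 0
    total = 0.0
    for order_id, amount in rows:
        digest = hashlib.sha256(f"{order_id}|{amount:.2f}".encode()).hexdigest()
        checksum ^= int(digest[:16], 16)  # XOR makes the checksum order-insensitive
        total += amount
    return {"row_count": len(rows), "revenue": round(total, 2),
            "checksum": format(checksum, "016x")}

def reconcile(source_rows, target_rows):
    """Compare source vs. target fingerprints; return only the fields that differ."""
    s, t = batch_fingerprint(source_rows), batch_fingerprint(target_rows)
    return {k: (s[k], t[k]) for k in s if s[k] != t[k]}

source = [("o1", 100.00), ("o2", 50.00), ("o3", 25.00)]
target = [("o2", 50.00), ("o1", 100.00)]  # o3 dropped somewhere in the load
print(reconcile(source, target))
```

The empty-diff case localizes drift: if source-vs-raw matches but raw-vs-mart does not, the divergence began in transformation rather than extraction or load.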
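The idempotency and backfill requirement can be illustrated with a merge keyed on a business key, mimicking a Snowflake MERGE: rerunning the same batch or replaying a corrected batch converges to one state instead of duplicating rows. This is an in-memory stand-in with hypothetical column names:

```python
def merge_idempotent(target, batch):
    """Upsert `batch` rows into `target` keyed by order_id. Reruns and
    backfills of the same key converge rather than append duplicates."""
    for row in batch:
        target[row["order_id"]] = row  # last write for a key wins
    return target

target = {}
batch = [{"order_id": "o1", "amount": 100.0, "business_date": "2024-06-01"}]
merge_idempotent(target, batch)
merge_idempotent(target, batch)  # rerun of the same batch: no duplicates
corrected = [{"order_id": "o1", "amount": 90.0, "business_date": "2024-06-01"}]
merge_idempotent(target, corrected)  # backfill with corrected source data
print(len(target), target["o1"]["amount"])  # -> 1 90.0
```

In dbt this maps to an incremental model with a `unique_key`; the point is that replaying any slice of history is safe by construction.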
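The quality-gate requirement before publishing marts can be sketched as named checks that must all pass before a publish step runs; the check names and row shape below are assumptions for illustration:

```python
def run_gates(mart_rows, checks):
    """Run named predicate checks over a mart; return (passed, failures).
    The publish step proceeds only when `passed` is True."""
    failures = [name for name, check in checks.items() if not check(mart_rows)]
    return (not failures, failures)

rows = [
    {"order_id": "o1", "revenue": 100.0},
    {"order_id": "o2", "revenue": -5.0},  # bad row: negative revenue
]
checks = {
    "not_empty": lambda rs: len(rs) > 0,
    "unique_order_id": lambda rs: len({r["order_id"] for r in rs}) == len(rs),
    "non_negative_revenue": lambda rs: all(r["revenue"] >= 0 for r in rs),
}
ok, failed = run_gates(rows, checks)
print(ok, failed)  # -> False ['non_negative_revenue']
```

The same pattern maps onto dbt tests gating a blue/green swap: Looker and client exports only ever read the last table version that passed every gate.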
Constraints
- Must stay on AWS + Snowflake; no full platform rewrite
- Incremental budget limit: $18K/month
- SOX-sensitive finance data requires auditability and reproducible metric definitions
- Source APIs have rate limits and occasional late-arriving corrections
- Team size: 3 data engineers, 1 analytics engineer
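The late-arriving-corrections constraint above means a closed business date can change after the fact; one way to handle it is to diff a fresh extract against what was already loaded and queue only the changed or new keys for reprocessing. A minimal sketch with hypothetical keys and values:

```python
def detect_corrections(loaded, incoming):
    """Diff an already-loaded business date against a fresh extract.
    Returns keys whose values changed (late corrections) and keys that
    are entirely new (late arrivals). Shapes are illustrative."""
    changed = sorted(k for k in incoming if k in loaded and incoming[k] != loaded[k])
    new = sorted(k for k in incoming if k not in loaded)
    return {"changed": changed, "new": new}

loaded = {"o1": 100.0, "o2": 50.0}              # state in the warehouse
incoming = {"o1": 100.0, "o2": 45.0, "o3": 20.0}  # o2 corrected, o3 arrived late
print(detect_corrections(loaded, incoming))
# -> {'changed': ['o2'], 'new': ['o3']}
```

Reprocessing only the diffed keys keeps backfills small, which matters under the API rate limits and the $18K/month incremental budget.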