Context
In a fintech payments company processing 50M+ card transactions/day across multiple processors, you’re migrating from a legacy settlement system to a new warehouse model used for regulatory reporting (SOX), dispute operations, and revenue recognition. The migration requires dozens of field-level conversions: currency normalization, status code standardization, timestamp timezone rules, and PII handling. A single undocumented assumption (e.g., how to map a processor-specific decline code) can cause downstream metric drift, audit findings, or incorrect customer notifications.
Core question
How do you document conversion requirements and mapping rules for a SQL-based ETL/ELT pipeline so that engineers, analysts, and auditors can understand and validate the transformations?
Address the following:
- Requirements capture: What information do you collect from stakeholders (finance, risk, ops) and from the source systems?
- Source-to-target mapping spec: What columns/fields should a mapping document include (e.g., source field, target field, transformation logic, defaulting, null rules, data types, units, timezone, rounding, code mappings)?
- Rule expression: How do you represent rules that are conditional (e.g., CASE WHEN) or depend on multiple fields/joins?
- Edge cases & exceptions: How do you document unknown codes, late-arriving data, duplicate events, and conflicting sources of truth?
- Validation & sign-off: What data quality checks, reconciliation queries, and acceptance criteria do you attach to each mapping?
- Operationalization: How do you keep documentation in sync with SQL code (versioning, lineage, change management), and how do you make it discoverable?
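To make the rule-expression and edge-case points concrete, here is a minimal "docs as data" sketch: a processor-specific decline code is normalized via a mapping table (rather than a hard-coded CASE expression), with an explicit fallback for unknown codes and a data quality check that counts unmapped rows. All table, column, and code names below are invented for illustration; the SQL is run through Python's standard-library sqlite3 so the example is self-contained.

```python
import sqlite3

# Hypothetical mapping-table approach: the code mapping itself is data
# (code_map), so the documentation and the transformation share one source
# of truth. Names here are invented, not from any real processor.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE raw_txn (txn_id TEXT, processor TEXT, raw_status TEXT);
CREATE TABLE code_map (processor TEXT, raw_status TEXT, normalized_status TEXT);

INSERT INTO raw_txn VALUES
  ('t1', 'procA', '05'),   -- known decline code
  ('t2', 'procA', '00'),   -- known approval code
  ('t3', 'procB', 'ZZ');   -- code missing from the mapping

INSERT INTO code_map VALUES
  ('procA', '05', 'DECLINED_DO_NOT_HONOR'),
  ('procA', '00', 'APPROVED');
""")

# Transformation: LEFT JOIN + COALESCE so unmapped codes surface as
# 'UNKNOWN' instead of silently dropping or misclassifying rows.
rows = cur.execute("""
SELECT t.txn_id,
       COALESCE(m.normalized_status, 'UNKNOWN') AS normalized_status
FROM raw_txn t
LEFT JOIN code_map m
  ON m.processor = t.processor AND m.raw_status = t.raw_status
ORDER BY t.txn_id
""").fetchall()

# Attached DQ check: how many source rows have no mapping? A nonzero
# count fails the pipeline (or routes rows to an exception queue).
unknown_count = cur.execute("""
SELECT COUNT(*)
FROM raw_txn t
LEFT JOIN code_map m
  ON m.processor = t.processor AND m.raw_status = t.raw_status
WHERE m.normalized_status IS NULL
""").fetchone()[0]

print(rows)           # [('t1', 'DECLINED_DO_NOT_HONOR'), ('t2', 'APPROVED'), ('t3', 'UNKNOWN')]
print(unknown_count)  # 1
```

Because the mapping lives in a table, the same rows can drive generated documentation, audit evidence, and the unknown-code test, which is the core trade-off raised in the scope guidance below.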
Scope guidance (what the interviewer expects)
- A practical, repeatable approach (templates, data contracts, mapping tables, dbt-style docs, or a metadata catalog)
- Concrete examples of a mapping rule (e.g., status code normalization) and how it would be documented
- Trade-offs between “docs in a wiki” vs “docs as data” (tables + tests + generated docs)
- How your approach reduces ambiguity, supports audits, and prevents silent metric regressions
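A candidate might also sketch the validation and sign-off side: a reconciliation query comparing source and target on row count and total settled amount, with the acceptance criterion expressed as an executable check. The schema and tolerance below are assumptions for illustration, again using sqlite3 so the example runs standalone.

```python
import sqlite3

# Hypothetical reconciliation check attached to a mapping: the acceptance
# criterion is "target row count and settled amount exactly match the
# source". Table and column names are invented.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE legacy_settlement (txn_id TEXT, amount_cents INTEGER);
CREATE TABLE warehouse_settlement (txn_id TEXT, amount_cents INTEGER);
INSERT INTO legacy_settlement VALUES ('t1', 1000), ('t2', 2550);
INSERT INTO warehouse_settlement VALUES ('t1', 1000), ('t2', 2550);
""")

src_count, src_sum = cur.execute(
    "SELECT COUNT(*), SUM(amount_cents) FROM legacy_settlement").fetchone()
tgt_count, tgt_sum = cur.execute(
    "SELECT COUNT(*), SUM(amount_cents) FROM warehouse_settlement").fetchone()

# Acceptance criteria: exact row-count match and zero drift in cents.
# Storing amounts as integer cents sidesteps floating-point rounding.
passed = (src_count == tgt_count) and (src_sum == tgt_sum)
print(passed)  # True
```

Running such checks on every deploy turns the mapping document's acceptance criteria into a regression gate, which is one way to prevent the silent metric drift the prompt warns about.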