Context
You’re interviewing with the Risk & Identity org at Stripe, supporting a new feature called Merchant Risk Controls. Merchants can configure rules (e.g., “block payments from high-risk countries”, “require 3DS for cards above $X”, “allowlist customer IDs”) and Stripe applies these rules in real time during authorization.
The product team now needs historical reporting for merchants and internal auditors: “What rules were active on March 3rd at 10:15 UTC?”, “How many payments were declined due to rule version V?”, and “Show a month-over-month trend of declines by rule category.” This is not just analytics—Stripe must support auditability for disputes and regulatory inquiries. The current system stores only the latest rule configuration per merchant in a transactional DB and emits payment decision events to Kafka.
Today’s data platform is AWS-based: Kafka → S3 data lake (raw) → Snowflake for analytics. Transformations run primarily in dbt, orchestrated by Airflow. The team has had incidents where late-arriving events and replays caused double-counting, so the new design must be idempotent and support backfills.
Scale Requirements
- Merchants: ~3M active merchants
- Payments: average 25K events/sec, peak 120K events/sec (global traffic spikes)
- Rule changes: average 2 changes/day/merchant for the top 50K merchants; the long tail is sparse
- Latency:
  - Payment decision reporting freshness: < 10 minutes end-to-end
  - Rule configuration history availability: < 30 minutes
- Retention:
  - Raw immutable events: 7 years (audit requirement)
  - Curated analytics tables: 2+ years online; older data can be archived but must be recoverable
Data Characteristics
Source systems
- Rules Service (Postgres)
  - Stores current rule configuration per merchant.
  - Emits change events (CDC) when rules are created/updated/deleted.
- Payment Decision Stream (Kafka)
  - One event per authorization attempt with decision outcome and reason codes.
Common issues you must handle
- Out-of-order events: CDC events can arrive late relative to payment decisions.
- Replays: Kafka topics may be replayed for incident recovery.
- Schema evolution: rule definitions change (new fields, renamed fields, new rule types).
- PII: some rule conditions may include customer identifiers; strict access controls apply.
Example payloads
Rule change event (CDC-like):
| field | example |
|---|---|
| merchant_id | "m_123" |
| rule_id | "r_abc" |
| event_type | "UPSERT" |
| rule_payload | JSON blob (conditions, thresholds, metadata) |
| valid_from_ts | "2026-01-10T12:00:00Z" (business effective time) |
| emitted_ts | "2026-01-10T12:00:03Z" |
| version | 17 |
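The rule_payload blob is described only as conditions, thresholds, and metadata. A hypothetical shape, purely illustrative (none of these fields are specified by the source system), might look like:

```json
{
  "rule_type": "AMOUNT_THRESHOLD",
  "action": "REQUIRE_3DS",
  "conditions": {
    "currency": "USD",
    "amount_gte_minor_units": 50000
  },
  "metadata": {
    "created_by": "user_456",
    "description": "Require 3DS for cards above $500"
  }
}
```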
Payment decision event:
| field | example |
|---|---|
| payment_id | "pi_789" |
| merchant_id | "m_123" |
| auth_ts | "2026-01-10T12:00:02Z" |
| decision | "DECLINE" |
| decline_reason | "RISK_RULE" |
| rule_id_applied | "r_abc" (nullable) |
| rule_version_applied | 17 (nullable) |
| event_id | UUID |
Your Task
Design a schema + ELT pipeline that enables accurate point-in-time historical reporting of rule configurations and their impact on payment decisions.
Functional requirements
- Historical rule state: Support queries like “show the full rule configuration as-of timestamp T” for any merchant.
- Point-in-time attribution: For each payment decision, attribute the decision to the correct rule version that was effective at auth_ts (not at arrival time); see the as-of join sketch after this list.
- Idempotent loads: Reprocessing the same Kafka ranges or re-running Airflow DAGs must not duplicate facts; see the dbt incremental sketch after this list.
- Late-arriving data: Correctly handle late CDC events (up to 7 days late) and late payment events (up to 24 hours late).
- Backfills: Support backfilling a full month of history (e.g., after a bug fix) without taking the warehouse down.
- Schema evolution: Allow new rule types/fields without breaking downstream models; maintain an audit trail (see the VARIANT staging sketch after this list).
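A minimal sketch of the point-in-time attribution join, assuming an SCD2-style rule history table with effective_from_ts/effective_to_ts columns; stg_payment_decisions, dim_rule_version, and the effective_* columns are assumed names, not existing objects:

```sql
-- Sketch only: table and column names are assumptions.
select
    p.payment_id,
    p.merchant_id,
    p.auth_ts,
    p.decision,
    p.decline_reason,
    r.version       as rule_version_effective,
    r.rule_payload
from stg_payment_decisions p
left join dim_rule_version r
  on  p.merchant_id     = r.merchant_id
  and p.rule_id_applied = r.rule_id
  -- pick the version effective at authorization time, not at event arrival time
  and p.auth_ts >= r.effective_from_ts
  and p.auth_ts <  coalesce(r.effective_to_ts, '9999-12-31'::timestamp_ntz)
```

Because the join is driven by auth_ts against the history table, re-running it after late CDC rows land re-attributes the affected payments instead of leaving them stale.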
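One way to combine idempotency, late-arrival handling, and backfills in a single dbt incremental model; the source name, the 2-day lookback, and the backfill_start/backfill_end vars are assumptions for illustration, not existing project conventions:

```sql
-- Sketch only: source, column, and var names are assumed.
-- Merging on event_id makes Kafka replays and Airflow re-runs idempotent.
{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'merge',
        unique_key = 'event_id'
    )
}}

select
    event_id,
    payment_id,
    merchant_id,
    auth_ts,
    decision,
    decline_reason,
    rule_id_applied,
    rule_version_applied
from {{ source('raw', 'payment_decision_events') }}
where 1 = 1
{% if var('backfill_start', none) is not none %}
  -- targeted backfill window passed from Airflow, e.g.
  -- dbt run --vars '{backfill_start: "2026-01-01", backfill_end: "2026-02-01"}'
  and auth_ts >= '{{ var("backfill_start") }}'
  and auth_ts <  '{{ var("backfill_end") }}'
{% elif is_incremental() %}
  -- normal runs re-scan a 2-day trailing window to absorb payment events up to 24h late
  and auth_ts >= (select dateadd(day, -2, max(auth_ts)) from {{ this }})
{% endif %}
```

Because the merge key is event_id, reprocessing the same Kafka range or re-running a DAG over the same window rewrites the same rows rather than inserting duplicates.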
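For schema evolution, the raw rule_payload can land as an untyped VARIANT and be typed out only in staging. In the sketch below the extracted fields (rule_type, action, the amount threshold, created_by) are hypothetical and only show the pattern:

```sql
-- Sketch only: the source and the specific payload fields are assumptions.
select
    merchant_id,
    rule_id,
    version,
    event_type,
    valid_from_ts,
    emitted_ts,
    rule_payload,                                           -- full VARIANT kept for audit and reprocessing
    rule_payload:rule_type::string                          as rule_type,
    rule_payload:action::string                             as action,
    rule_payload:conditions.amount_gte_minor_units::number  as amount_threshold_minor_units,
    rule_payload:metadata.created_by::string                as created_by
from {{ source('raw', 'rule_change_events') }}
```

New rule types or fields then arrive without breaking the load; they only require new extractions (or downstream handling), while the stored payload preserves the full audit trail.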
Non-functional requirements
- Auditability: Changes must be traceable (who/what/when), and raw events must be immutable.
- Performance: Common merchant-level reports should run in < 5 seconds on Snowflake for a 30-day window (see the Snowflake DDL sketch after this list).
- Security & compliance: PII fields must be protected; enforce least privilege and support data deletion where applicable.
- Operational excellence: Monitoring, alerting, and runbooks for data delays, quality regressions, and cost spikes.
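As one illustration of the performance and PII points above; the table, column, and role names (fct_payment_decisions, dim_rule_version, customer_id_list, ROLE_RISK_AUDIT) are assumptions, not part of the spec:

```sql
-- Sketch only: object and role names are assumed.
-- Cluster the payment fact so single-merchant, 30-day reports prune micro-partitions well.
alter table fct_payment_decisions
  cluster by (merchant_id, to_date(auth_ts));

-- Mask customer identifiers embedded in rule conditions for everyone except audit roles.
create masking policy customer_id_mask as (val string) returns string ->
  case when current_role() in ('ROLE_RISK_AUDIT') then val else '****' end;

alter table dim_rule_version
  modify column customer_id_list set masking policy customer_id_mask;
```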
Constraints
- Must use existing stack: Kafka, S3, Snowflake, dbt, Airflow.
- No new paid vendor tools this quarter (open-source ok).
- Team is 5 data engineers; on-call load is high, so the design should minimize operational complexity.
- Some merchants are on EU residency: raw and curated data must be stored/processed in-region (assume separate Snowflake accounts or databases).
Interview Prompts (what you should cover)
- What tables would you create (raw, staging, curated)? What are the keys?
- Would you model rule history as SCD Type 2, event sourcing, or both? Why?
- How do you join payments to the correct rule version with late/out-of-order events?
- How do you implement incremental processing and backfills in dbt/Airflow?
- What data quality checks are mandatory (and where do they run)?
- How do you optimize for Snowflake query performance and cost?