You’re joining the Data Platform team at PayWave, a fintech that processes card payments and fraud signals for 25M monthly active cardholders across the US and EU. PayWave’s analytics and risk models depend on a lakehouse-style platform: raw and curated data lands in Amazon S3, and business-facing datasets are served from Snowflake. The company is under pressure from Finance to stop “surprise” cloud spend spikes, and from Risk/Compliance to prove that retention and deletion policies are consistently enforced.
Today, PayWave ingests four kinds of data: (1) payment authorization events from microservices, (2) device telemetry from mobile SDKs, (3) chargeback/dispute updates from external networks, and (4) reference data (merchant catalog, BIN tables). The platform currently runs a mix of Kafka → Spark Structured Streaming for near-real-time ingestion and Airflow → Spark batch for daily backfills and compaction. Storage costs have grown 2.5× in the last year due to schema evolution, duplicate events, and keeping “just in case” copies in multiple layers.
Leadership asks you to create an 18-month capacity plan and propose guardrails so the platform can scale without breaking SLAs or budget. You must account for the realities of production pipelines: late-arriving data, reprocessing/backfills, compaction, multiple table layers, and compliance retention.
| Layer | Location | Format | Notes |
|---|---|---|---|
| Raw landing | S3 s3://paywave-raw/ | JSON | Immutable, partitioned by event_date |
| Bronze | S3 s3://paywave-bronze/ | Parquet (Snappy) | Deduped + schema-normalized |
| Silver | S3 s3://paywave-silver/ | Parquet | Enriched, joins to reference data |
| Gold | Snowflake | Tables | Aggregates for dashboards + ML features |
| Orchestration | Airflow 2.x | DAGs | Backfills, compaction, dbt runs |
Design a capacity planning approach and produce a plan that Finance and Engineering can sign off on.
Provide an 18-month forecast for storage growth and monthly cost drivers. At minimum, include:
Explain what you would change (if anything) to reduce growth while keeping reliability:
Partitioning options (e.g., event_date, event_hour, merchant_id, region) and trade-offs.
Propose concrete guardrails to prevent runaway storage:
Describe how retention and deletion affect capacity planning:
Define the metrics and dashboards you’d build so the plan stays accurate over time:
Your answer will be evaluated on correctness of the capacity model, realism of assumptions, and whether the plan holds up under late data, backfills, and compliance requirements.
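As a starting point for the forecast, a capacity model can be sketched as a rolling retention window over compounding monthly ingest. The snippet below is a minimal illustration, not a prescribed method: the `Layer` fields, the `forecast` function, the 5% monthly growth rate, the overhead multiplier, and every volume in the usage example are hypothetical placeholders, and the model assumes each layer starts empty and that data is deleted exactly when it leaves the retention window.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    monthly_ingest_tb: float  # TB written in month 0 (hypothetical value)
    retention_months: int     # months of data kept before deletion
    overhead: float = 1.0     # multiplier for late data, backfills, compaction copies


def forecast(layer: Layer, months: int = 18, monthly_growth: float = 0.05) -> list[float]:
    """Return TB stored at the end of each month for one layer.

    Assumes the layer starts empty and that ingest compounds at
    `monthly_growth` per month. Both are simplifying assumptions.
    """
    if layer.retention_months < 1:
        raise ValueError("retention_months must be at least 1")
    ingested: list[float] = []
    stored: list[float] = []
    for m in range(months):
        written = layer.monthly_ingest_tb * (1 + monthly_growth) ** m * layer.overhead
        ingested.append(written)
        # Only the most recent `retention_months` of writes are still on disk.
        stored.append(sum(ingested[-layer.retention_months:]))
    return stored


if __name__ == "__main__":
    # Hypothetical layers and volumes, for illustration only.
    layers = [
        Layer("raw", monthly_ingest_tb=40.0, retention_months=3),
        Layer("bronze", monthly_ingest_tb=12.0, retention_months=13, overhead=1.2),
        Layer("silver", monthly_ingest_tb=8.0, retention_months=24, overhead=1.3),
    ]
    for layer in layers:
        series = forecast(layer)
        print(f"{layer.name}: month 1 = {series[0]:.1f} TB, month 18 = {series[-1]:.1f} TB")
```

Multiplying each series by a per-TB storage price gives a first-order monthly cost line. Request, egress, and Snowflake compute costs would need separate terms.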