Context
FinSight, a B2B SaaS company, needs a reliable pipeline to ingest customer billing and subscription data from a third-party REST API into Snowflake for finance and product reporting. Today, analysts manually export JSON files once per day, which causes stale metrics, missing records, and no audit trail.
You are asked to design a production ETL pipeline that extracts data from the vendor API, lands raw data, transforms it into analytics-ready tables, and loads it into the warehouse with strong observability and recovery.
Scale Requirements
- Source API volume: 8 endpoints, ~12M records/day total
- API limits: 1,000 requests/minute, cursor-based pagination, occasional 429s
- Data size: ~150 GB/day raw JSON, 2-year historical backfill
- Latency target: New data available in Snowflake within 30 minutes of source update
- Warehouse size: 4 TB active analytics data, 7-year retention for finance tables
Requirements
- Build an incremental ETL process from the REST API into Snowflake using a raw landing layer and transformed warehouse tables.
- Support full historical backfills and daily incremental loads using
updated_at or cursor-based extraction.
- Handle API pagination, retries, rate limiting, and authentication token refresh automatically.
- Preserve raw payloads for replay and audit, while also producing normalized tables for subscriptions, invoices, customers, and payments.
- Ensure idempotent loads so reruns do not create duplicates.
- Add data quality checks for null primary keys, duplicate business keys, schema drift, and row-count anomalies.
- Orchestrate dependencies, retries, and alerting for scheduled runs.
Constraints
- Existing stack is AWS-based; prefer managed services over self-hosted infrastructure.
- Team size is 3 data engineers; operational simplicity matters more than perfect real-time design.
- Finance data is SOX-relevant, so lineage, auditability, and reproducibility are required.
- Vendor API occasionally changes optional fields without notice; the pipeline must tolerate additive schema changes.