MediCart, a mid-sized e-commerce marketplace, loads order, payment, inventory, and shipment data from PostgreSQL, Stripe webhooks, and CSV files from 40 logistics partners into Snowflake. The current Airflow-based nightly ETL frequently produces incomplete fact tables because upstream files arrive late, webhook events are duplicated, and some source fields are null or missing.
The analytics team needs a redesigned pipeline that can detect missing or incomplete data early, prevent bad loads from reaching reporting tables, and support safe backfills without double-counting revenue or orders.
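The three requirements above (detect incomplete data early, block bad loads, make backfills safe) can be illustrated with a small pre-load gate. This is a minimal sketch under stated assumptions, not MediCart's actual design: the field names (`event_id`, `order_id`, `amount_cents`), the partner identifiers, and the `check_batch` helper are all hypothetical. It rejects rows with null required fields, collapses duplicate webhook events by `event_id` so a replayed batch cannot double-count revenue, and refuses to mark a batch publishable while any expected partner file is missing.

```python
from dataclasses import dataclass

# Hypothetical required fields; the real schema is not given in the brief.
REQUIRED_FIELDS = ("event_id", "order_id", "amount_cents")


@dataclass
class GateResult:
    accepted: list          # unique, complete events that are safe to load
    rejected: list          # events with a null or missing required field
    missing_partners: set   # expected partner files that have not arrived
    publishable: bool       # True only if nothing is missing or rejected


def check_batch(events, expected_partners, arrived_partners):
    """Validate and deduplicate a batch before it reaches reporting tables."""
    accepted_by_id = {}
    rejected = []
    for event in events:
        # A null or absent required field sends the row to quarantine.
        if any(event.get(field) is None for field in REQUIRED_FIELDS):
            rejected.append(event)
            continue
        # Keep only the first occurrence of each event_id, so re-delivered
        # webhooks and re-run backfills do not add the same event twice.
        accepted_by_id.setdefault(event["event_id"], event)

    missing = set(expected_partners) - set(arrived_partners)
    publishable = not missing and not rejected
    return GateResult(list(accepted_by_id.values()), rejected, missing, publishable)
```

In a design like this, the orchestrator would load `accepted` rows into a staging table and promote them to the reporting tables only when `publishable` is true. Keying the load on `event_id` (for example with a merge/upsert in the warehouse) is what would make a backfill of the same date range idempotent.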