Design a low-latency fraud scoring platform on Databricks under a strict cloud budget

A fintech customer wants to serve real-time fraud scores for card transactions on Databricks. They expect 2,500 QPS sustained and 8,000 QPS at peak during business hours, require P95 end-to-end latency under 120 ms and a model AUC of at least 0.94, and cap the monthly platform budget at $180k across compute and storage. Ask the candidate to design the serving, feature, training, and monitoring architecture, using Databricks-native components where possible. The candidate should explain the tradeoffs between batch, streaming, and online feature access, and state what should run on SQL warehouses, Model Serving, and Delta/Unity Catalog versus external systems, if any are needed. The candidate should also discuss how they would tune performance, control cost, and handle model rollback, drift detection, and multi-region resiliency.
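A strong answer usually starts by turning the stated targets into rough capacity and cost envelopes. The sketch below is one way to do that arithmetic, not a sizing method prescribed by the question. It assumes a 30-day month and, as a deliberately conservative upper bound, that the sustained rate holds around the clock. The function names are hypothetical, and the in-flight estimate applies Little's law with the P95 target standing in for the average latency.

```python
# Back-of-envelope sizing from the numbers stated in the prompt.
# Assumptions (not from the prompt): a 30-day month, and the sustained
# rate holding 24 hours a day, which overstates volume if traffic dips
# outside business hours.

SUSTAINED_QPS = 2_500
PEAK_QPS = 8_000
P95_LATENCY_MS = 120
MONTHLY_BUDGET_USD = 180_000
DAYS_PER_MONTH = 30          # assumption
SECONDS_PER_DAY = 86_400


def monthly_requests(qps: int, days: int = DAYS_PER_MONTH) -> int:
    """Total requests in a month if `qps` were held constantly."""
    return qps * SECONDS_PER_DAY * days


def budget_per_million(budget_usd: float, requests: int) -> float:
    """All-in budget (compute plus storage) per one million requests."""
    return budget_usd / (requests / 1_000_000)


def in_flight(qps: int, latency_ms: int) -> float:
    """Little's law: concurrent requests = arrival rate * time in system.

    Using the P95 target as the latency gives a pessimistic bound,
    since most requests finish faster than P95.
    """
    return qps * latency_ms / 1_000


if __name__ == "__main__":
    total = monthly_requests(SUSTAINED_QPS)
    print(f"Requests per month at sustained load: {total:,}")
    print(f"Budget per million requests: ${budget_per_million(MONTHLY_BUDGET_USD, total):.2f}")
    print(f"In-flight requests at peak: {in_flight(PEAK_QPS, P95_LATENCY_MS):.0f}")
```

Under these assumptions the sustained load comes to about 6.48 billion requests per month, which leaves roughly $28 per million scored transactions for everything, including training, feature pipelines, storage, and monitoring. The peak figure of about 960 concurrent requests gives the candidate a starting point for discussing serving concurrency and autoscaling headroom.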
