You are the engineering manager responsible for launching a new request-routing and throttling layer for a public API platform built on Amazon API Gateway and AWS Lambda. The change is meant to reduce p99 latency and improve tenant-level isolation before the peak holiday traffic window, but it touches shared infrastructure used by hundreds of internal and external clients across three AWS Regions. The launch is tricky because one senior engineer who owns the current routing logic is leaving in three weeks, the security team requires updated AWS WAF rules and sign-off before production exposure, and a large enterprise customer has already been promised improved rate-limit controls by the end of the quarter. Leadership wants a visible launch date, but the operations team is concerned that observability, rollback automation, and on-call readiness are not yet at production standard.
| Detail | Value |
|---|---|
| Engineers on project | 6 backend, 2 SRE, 1 QA |
| AWS Regions | 3 |
| API traffic | 1.8B requests/day |
| Client tenants affected | 420 |
| Launch deadline | 10 weeks |
| Peak traffic event | Starts 2 weeks after target launch |
| Availability requirement | 99.95% |
| Error budget for launch week | <0.1% incremental 5xx |
| Security dependency | AWS WAF review and approval |
How would you make sure the team is operationally ready for this launch and execute it in a way that balances the deadline, customer commitments, and reliability risk? Walk through how you would structure the plan, prepare for failure modes, and decide whether to proceed, phase, or delay the launch.