You are the engineering manager responsible for shipping a major reliability upgrade to an AI-powered developer product before a public launch event in six weeks. The release is expected to reduce failed task runs and improve user trust, but the code path touches core orchestration, sandbox execution, and the review surface, so regressions could affect a large share of active users. Leadership is pushing for the full feature set because the launch has already been previewed to customers, while your senior staff engineer wants to delay until more soak time is complete after two recent Sev2 incidents. At the same time, one partner team owns a blocking API change on a different roadmap, and your team is carrying a backlog of quality issues that could either be fixed now or deferred behind guardrails and a phased rollout.

| Detail | Value |
|---|---|
| Deadline | 6 weeks to launch event |
| Team | 6 engineers, 1 EM, 1 product manager, shared QA support |
| Active users affected | ~35% of weekly active teams |
| Current failure rate | 4.8% of task runs on impacted workflows |
| Target failure rate | Below 2.0% within 30 days of launch |
| Recent incidents | 2 Sev2 incidents in past 5 weeks |
| Dependency | 1 external platform API change due in Week 3 |
| Rollout tolerance | Rollback window of no more than 1 hour |

How would you plan and execute this launch so you move quickly without taking on unacceptable quality risk? Explain how you would make scope trade-offs, manage stakeholders pushing in different directions, and decide what must be true before launch versus what can be deferred.