Project Background
OpenAI is preparing to launch a new internal evaluation workflow for GPT-4.1-powered customer support analysis in the OpenAI API platform. The workflow will let enterprise customers upload labeled conversation sets, run model evaluations, and compare prompt or model variants before deployment. You are the program manager coordinating the launch across product, engineering, safety, legal, and go-to-market.
The launch is scheduled for 8 weeks from now and has executive visibility because three strategic enterprise accounts are waiting on the feature before expanding annual spend. The core team includes 10 people: 4 engineers, 1 designer, 1 product manager, 1 applied researcher, 1 safety lead, 1 legal partner, and you. Last week, a major challenge emerged: the ingestion pipeline for uploaded conversation data is failing on large files, and the safety review flagged that some evaluation outputs may expose sensitive user content in logs.
Key Stakeholders
- Head of API Product wants the feature launched this quarter to support revenue commitments.
- Safety lead wants stricter redaction and auditability, even if it delays launch.
- Engineering manager is concerned the team cannot fix both issues without cutting scope.
- Sales lead has already told two customers to expect a beta in 6 weeks.
Constraints
- Timeline: 8 weeks to GA, with a promised beta in 6 weeks
- Budget: $120,000 remaining; no additional headcount
- Engineering capacity: 4 engineers, but 1 is allocated 40% to production support
- Dependencies: safety sign-off, legal review, logging changes, and customer success enablement
Complications
- One enterprise design partner insists on supporting files up to 5 million rows at beta, while the current system is only stable at 1 million.
- Legal will not approve launch unless log retention and redaction controls are finalized by week 5.
- The PM proposes cutting comparison reporting from beta, but Sales argues that reporting is the main customer value.
Your Task
- Build an 8-week execution plan with milestones, owners, and critical dependencies.
- Recommend how to handle the scope trade-offs between file scale, reporting, and safety controls.
- Define launch criteria for beta and GA, including rollback conditions.
- Identify the top risks and mitigation plan.
- Explain how you would communicate the plan and any date or scope changes to stakeholders.