Project Context
OpenAI wants to launch a GPT-powered support triage workflow for ChatGPT Enterprise customers. The feature will use OpenAI models to classify inbound support tickets by urgency, policy sensitivity, and likely routing team, reducing manual triage time for high-volume accounts. You are the AI/ML analyst supporting the launch across product, policy, operations, and engineering.
The pilot team includes 10 people: 3 ML engineers, 2 product managers, 2 policy specialists, 2 support operations leads, and you. Leadership wants a production pilot in 8 weeks because enterprise support costs rose 18% quarter over quarter. The Policy team, however, is concerned that the model may under-prioritize safety-related tickets or route sensitive content incorrectly.
Key Stakeholders
- Head of Enterprise Support wants faster triage and lower handling costs.
- Policy Lead wants strong protections against harmful or unfair routing decisions.
- Engineering Manager wants to avoid a brittle launch that creates manual rework.
- Sales Lead wants the pilot live for the top 25 ChatGPT Enterprise accounts before quarter-end.
Constraints
- Budget: $180,000 remaining for the pilot
- Timeline: 8 weeks to launch
- Scope: top 25 enterprise accounts, about 40,000 tickets/month
- Team capacity: no additional headcount
- Dependency: policy review must approve the risk framework before external rollout
Complications
- Historical ticket labels are inconsistent across support teams, with an estimated 14% disagreement rate between labelers on sensitive cases.
- Early testing shows the model improves average triage speed by 32%, but recall on safety-sensitive tickets is only 91% versus the Policy team’s 97% target.
- A large enterprise customer has asked for inclusion in the pilot, increasing commercial pressure to launch on time.
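To make the recall gap in the complications above concrete, the following is a minimal sketch that converts the 91% and 97% recall figures into expected missed tickets per month. The 40,000 tickets/month volume comes from the pilot scope; the share of tickets that are safety-sensitive is not given in this brief, so the 5% value below is a purely hypothetical placeholder, and the function name is illustrative.

```python
def missed_sensitive_tickets(monthly_tickets, sensitive_share, recall):
    """Expected number of safety-sensitive tickets per month that the model fails to flag."""
    sensitive = monthly_tickets * sensitive_share
    return round(sensitive * (1.0 - recall))


MONTHLY_TICKETS = 40_000  # from the pilot scope
SENSITIVE_SHARE = 0.05    # ASSUMPTION: hypothetical share, not stated in the brief

current = missed_sensitive_tickets(MONTHLY_TICKETS, SENSITIVE_SHARE, 0.91)
target = missed_sensitive_tickets(MONTHLY_TICKETS, SENSITIVE_SHARE, 0.97)
print(f"Missed per month at 91% recall: {current}")
print(f"Missed per month at 97% recall: {target}")
```

Under the assumed 5% share, this works out to about 180 missed sensitive tickets per month at 91% recall versus 60 at the Policy team's 97% target. The absolute numbers scale linearly with the real sensitive share, and the 14% labeling disagreement means the measured recall itself carries uncertainty.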
Deliverables
- Build an execution plan for the 8-week pilot, including milestones, owners, and dependencies.
- Explain how you would incorporate ethical considerations into the project’s risk evaluation and launch criteria.
- Recommend launch trade-offs if speed, model quality, and policy requirements conflict.
- Define success metrics and guardrails for the first 30 days post-launch.
- Propose a rollback or containment plan if the system mishandles sensitive tickets.
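One possible shape for the guardrail and containment deliverables is a simple threshold rule on observed safety-sensitive recall. This is only an illustrative sketch, not a recommended policy: the 97% target is the Policy team's figure from this brief, while the use of 91% (the early-testing recall) as a hard floor, the action names, and the function name are all assumptions.

```python
def containment_action(observed_recall, recall_target=0.97, hard_floor=0.91):
    """Map observed recall on safety-sensitive tickets to a hypothetical containment action."""
    if observed_recall >= recall_target:
        # Meets the Policy team's target: model routing stays enabled.
        return "auto_route"
    if observed_recall >= hard_floor:
        # Below target but above the assumed floor: keep the model,
        # but send sensitive-classified tickets through human review.
        return "human_review_sensitive"
    # Below the assumed floor: revert to fully manual triage.
    return "rollback_to_manual"


for recall in (0.98, 0.94, 0.88):
    print(recall, containment_action(recall))
```

A tiered rule like this separates containment (adding human review) from a full rollback, which may help when speed and policy requirements pull in different directions.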