Project Background
You are the program manager for Tasker, a two-sided marketplace in the gig economy (similar to TaskRabbit) with 8.5M monthly active users across the US, Canada, UK, and Australia. Tasker’s brand differentiator is its “Tasker voice”: clear, friendly, non-judgmental, and action-oriented language that reduces anxiety for customers booking help and makes taskers (service providers) feel respected. Historically, Tasker has invested heavily in human-written UX copy and a strict editorial style guide.
The company is now launching an LLM-powered in-app Support Assistant that can answer FAQs, guide users through cancellations/refunds, and help taskers resolve payout issues. The assistant will be embedded in the mobile app and web, and it will also draft responses for human agents in Zendesk. Leadership expects this to reduce support costs by 15% and improve first-contact resolution.
However, early prototypes from engineering are “technically correct but off-brand”: overly formal, occasionally blamey (“You failed to…”), and inconsistent in tone across flows. Engineering argues that tightening tone constraints will reduce answer quality and increase latency/cost. Brand and Support leadership argue that if the assistant doesn’t sound like Tasker, it will erode trust and increase escalations.
You have 8 weeks to launch a public beta to 10% of US traffic before the peak summer season. The cross-functional team includes 6 backend engineers, 2 mobile engineers, 1 web engineer, 1 ML engineer, 1 designer, 1 UX writer, 1 data analyst, and 1 support operations lead. You also depend on a centralized Trust & Safety team for policy review and a Legal team for disclaimers.
Stakeholder Landscape (Competing Priorities)
- Engineering (Platform + ML) wants to ship quickly using a standard prompt + retrieval approach, optimize for answer correctness and latency, and avoid building “custom tone infrastructure.” They are concerned about scope creep and on-call burden.
- Brand/UX Writing insists that “Tasker voice” must be treated as a product requirement, not a polish item. They want a consistent tone system, examples, and approvals before launch.
- Support Operations wants measurable deflection and fewer tickets, but worries that a wrong tone in sensitive flows (refunds, safety incidents) will spike escalations and CSAT drops.
- Trust & Safety cares about policy compliance (no advice that could cause harm, no victim-blaming language, correct escalation to human agents).
- Finance is tracking LLM inference costs; they have set a hard ceiling on per-conversation cost.
Your core execution challenge: How do you ensure the “Tasker voice” is considered in technical decisions (model choice, prompting strategy, evaluation, tooling, launch gates), not just in final copy review?
Constraints
- Timeline: 8 weeks to public beta; 2 additional weeks to expand to 50% if metrics are healthy.
- Budget: LLM + infra budget capped at $120K/month for beta. Target <$0.04 per conversation at p95.
- Latency: p95 end-to-end response time must be <1.8s in-app.
- Safety: Any conversation containing keywords related to harassment, injury, discrimination, or payment fraud must route to a human agent within 60 seconds.
- Localization: Beta is US-English only, but the architecture must not block future UK/AU English variants.
- Data: You have 18 months of support tickets and macros, but only ~30% are cleanly labeled by issue type.
Deliverables (What you must produce)
- A decision-making framework that makes “Tasker voice” a first-class input to technical choices (not subjective, not purely brand-driven).
- A roadmap and execution plan for 8 weeks, including cross-functional rituals, review gates, and ownership.
- A trade-off proposal for at least two technical approaches (e.g., prompt-only vs. prompt + tone classifier vs. fine-tuning), including cost/latency implications.
- A launch plan with phased rollout, monitoring, and rollback criteria—explicitly tying voice quality to go/no-go.
- A risk register covering brand risk, safety risk, and engineering risk, with mitigations and triggers.
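One of the candidate approaches named in the trade-off deliverable, prompt + tone classifier, can be sketched as a post-generation gate: the assistant's draft is scored against the voice bar, off-brand drafts get one rewrite attempt (which adds latency and cost), and drafts that still fail escalate to a human. All names and the threshold here are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable

VOICE_THRESHOLD = 0.8  # hypothetical pass bar, to be agreed with Brand/UX Writing


@dataclass
class Draft:
    text: str
    voice_score: float  # 0..1, produced by a small tone classifier (hypothetical)


def tone_gate(
    draft: Draft,
    rewrite: Callable[[str], str],
    score: Callable[[str], float],
) -> str:
    """Pass on-brand drafts through; rewrite once if off-brand; else escalate."""
    if draft.voice_score >= VOICE_THRESHOLD:
        return draft.text
    rewritten = rewrite(draft.text)  # one rewrite attempt: extra latency + cost
    if score(rewritten) >= VOICE_THRESHOLD:
        return rewritten
    return "ESCALATE_TO_HUMAN"
```

The design choice worth debating in the trade-off proposal is the single-rewrite cap: it bounds worst-case latency and per-conversation cost, at the price of a higher human-escalation rate when the model is persistently off-brand.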
Complications (Realistic Curveballs)
- Week 3: The UX writer is pulled into a CEO-level rebrand initiative for 2 weeks and has limited availability. Brand still expects “voice compliance.”
- Week 5: Trust & Safety flags that the assistant sometimes responds to payout disputes with language that implies fault (“You didn’t set up your bank correctly”), which is unacceptable and could create PR risk.
- Week 6: Finance announces a surprise cost spike from LLM usage in another product area and demands a 20% reduction in your monthly inference budget before beta.
Interview Prompt
Walk me through how you would execute this launch so that “Tasker voice” materially influences technical decisions (architecture, evaluation, and launch gates). Be specific about:
- How you translate “voice” into measurable requirements and acceptance criteria.
- How you structure decision-making so engineering can move fast without bypassing brand/safety.
- What you would do when voice quality conflicts with latency/cost/correctness.
- How you would handle the complications above without missing the beta date.
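Translating “voice” into measurable requirements typically means a rubric scored per response plus an aggregate launch gate. A minimal sketch, assuming the four style-guide dimensions from the brief (clear, friendly, non-judgmental, action-oriented) each rated 0-2 by human reviewers or an LLM judge; the pass bar and pass rate below are hypothetical numbers a team would calibrate during the beta:

```python
# Hypothetical voice rubric: four dimensions drawn from the Tasker style guide,
# each rated 0-2 per response.
RUBRIC_DIMENSIONS = ("clarity", "friendliness", "non_judgment", "actionability")


def voice_score(ratings: dict) -> float:
    """Normalize rubric ratings (each 0-2) to a single 0-1 voice score."""
    return sum(ratings[d] for d in RUBRIC_DIMENSIONS) / (2 * len(RUBRIC_DIMENSIONS))


def launch_gate(scores: list, pass_bar: float = 0.85, min_pass_rate: float = 0.95) -> bool:
    """Go/no-go check: at least 95% of sampled conversations must clear the bar."""
    passed = sum(1 for s in scores if s >= pass_bar)
    return passed / len(scores) >= min_pass_rate
```

Framed this way, voice stops being a subjective sign-off and becomes a number engineering can regression-test on every prompt or model change, alongside correctness, latency, and cost.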