Evaluate Asana AI Writing Help

Context

Asana is considering a broader rollout of its AI writing assistance inside task creation, project updates, and status summaries. The feature can rewrite text, draft summaries from task context, and suggest next steps, but leadership wants to know whether it genuinely helps users complete work better or merely feels novel.

Constraints

p95 end-to-end latency: < 1,500 ms for inline suggestions in Asana
Cost ceiling: < $0.015 per assisted interaction and < $120K/month at projected scale
Hallucination ceiling: < 2% of responses may introduce unsupported facts about task status, owners, deadlines, or dependencies
Safety: must resist prompt injection from task descriptions/comments, avoid leaking private project data across workspaces, and refuse when context is insufficient
UX constraint: no long multi-turn flow for common actions; most responses should be one-shot
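
The numeric constraints above can be turned into simple checks. The sketch below is illustrative only: the function names, the nearest-rank percentile method, and the choice of a 95% Wilson interval are assumptions, not part of the prompt. It shows two things worth noticing: the two cost ceilings together imply a volume of about 8 million assisted interactions per month at the per-interaction limit, and a small review sample cannot demonstrate a hallucination rate below 2% even with zero observed failures.

```python
import math
from decimal import Decimal

# Ceilings copied from the constraints; everything else is a hypothetical helper.
COST_PER_INTERACTION_CEILING = Decimal("0.015")   # USD
MONTHLY_COST_CEILING = Decimal("120000")          # USD
P95_LATENCY_CEILING_MS = 1500
HALLUCINATION_CEILING = 0.02


def implied_monthly_volume(cost_per_interaction: Decimal) -> int:
    """Interactions per month at which the monthly ceiling is reached."""
    return int(MONTHLY_COST_CEILING / cost_per_interaction)


def p95(latencies_ms):
    """Nearest-rank 95th percentile (one of several common definitions)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def wilson_upper_bound(failures: int, n: int, z: float = 1.96) -> float:
    """Upper end of the Wilson score interval for a failure rate."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = failures / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center + half


if __name__ == "__main__":
    print(implied_monthly_volume(COST_PER_INTERACTION_CEILING))
    # Zero hallucinations in 100 reviewed responses still leaves the
    # upper bound above the 2% ceiling; 1,000 clean responses does not.
    print(wilson_upper_bound(0, 100) < HALLUCINATION_CEILING)
    print(wilson_upper_bound(0, 1000) < HALLUCINATION_CEILING)
```

The practical consequence is that golden-set size should be chosen from the hallucination ceiling, not from reviewer convenience.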

Available Resources

Historical anonymized Asana interactions: accepted/rejected AI suggestions, user edits after acceptance, follow-up task changes, re-opened tasks, and manual rewrites
Workspace-scoped context: task title, description, comments, custom fields, project brief, recent status updates, and user permissions
Existing LLM providers approved by Asana, plus a smaller low-cost model for routing or judging
Human reviewers from product ops who can label a golden set for helpfulness, correctness, and safety
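
One way the historical interaction logs could separate durable value from novelty is to measure how much of an accepted suggestion survives the user's later edits. The following is a minimal sketch under stated assumptions: the record fields (`accepted`, `suggestion`, `final_text`), the character-level similarity from Python's `difflib`, and the 0.8 retention threshold are all hypothetical choices, not Asana's schema or metric.

```python
from difflib import SequenceMatcher


def retained_fraction(suggestion: str, final_text: str) -> float:
    """Share of the suggestion's characters that still appear, in order,
    in the text the user ultimately kept."""
    if not suggestion:
        return 0.0
    matcher = SequenceMatcher(None, suggestion, final_text, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(suggestion)


def durable_acceptance_rate(records, threshold: float = 0.8) -> float:
    """Fraction of shown suggestions that were accepted AND mostly kept.

    `records` is an iterable of dicts with hypothetical keys:
    'accepted' (bool), 'suggestion' (str), 'final_text' (str).
    """
    records = list(records)
    if not records:
        return 0.0
    durable = sum(
        1
        for r in records
        if r["accepted"]
        and retained_fraction(r["suggestion"], r["final_text"]) >= threshold
    )
    return durable / len(records)


if __name__ == "__main__":
    sample = [
        {"accepted": True, "suggestion": "Ship by Friday", "final_text": "Ship by Friday"},
        {"accepted": True, "suggestion": "Ship by Friday", "final_text": "Blocked on legal"},
        {"accepted": False, "suggestion": "Ship by Friday", "final_text": ""},
    ]
    print(durable_acceptance_rate(sample))
```

A raw acceptance rate would count the second record as a success; the retention-adjusted rate does not, which is the distinction the prompt asks about.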

Task

Define an evaluation framework that determines whether Asana AI is helping users beyond novelty, including primary success metrics, guardrails, and segment-level analysis.
Design the offline evaluation plan first: golden set creation, labeling rubric, LLM-as-judge calibration, and adversarial tests for hallucination and prompt injection.
Propose the online evaluation / experiment design to measure durable user value, not just short-term engagement with AI suggestions.
Specify the prompting and serving approach for generating grounded writing assistance from Asana context, including refusal behavior when context is incomplete.
Estimate cost and latency, and explain what you would change if the feature misses either budget or quality targets.
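
For the LLM-as-judge calibration step, a common approach is to compare the judge's labels with the human golden-set labels using a chance-corrected agreement statistic. The sketch below computes Cohen's kappa by hand; the label values and any acceptance threshold for kappa are assumptions to be set by the team, not something the prompt specifies.

```python
from collections import Counter


def cohen_kappa(human_labels, judge_labels) -> float:
    """Cohen's kappa between two raters over the same items."""
    if len(human_labels) != len(judge_labels) or not human_labels:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(human_labels)

    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(h == j for h, j in zip(human_labels, judge_labels)) / n

    # Expected agreement if the raters labeled independently
    # with their own marginal label frequencies.
    human_counts = Counter(human_labels)
    judge_counts = Counter(judge_labels)
    expected = sum(
        human_counts[label] * judge_counts[label] for label in human_counts
    ) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    human = ["pass", "pass", "fail", "fail", "pass", "fail"]
    judge = ["pass", "pass", "fail", "pass", "pass", "fail"]
    print(round(cohen_kappa(human, judge), 3))
```

Kappa is preferable to raw agreement here because most responses are likely to be labeled acceptable, and a judge that always says "pass" would otherwise look well calibrated. Computing it separately per rubric dimension (helpfulness, correctness, safety) would show where the judge can and cannot stand in for human review.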

