Design Async LLM Job Delivery

Hard
Generative AI & LLMs
RAG, LLM Agents, Structured Extraction

Problem

Context

BrightInbox is adding an AI writing assistant that generates long-form email replies, account summaries, and follow-up drafts. Some requests take several seconds and may require retrieval or tool calls, so the product team wants an asynchronous pipeline that reliably returns results to users without duplicate jobs, lost outputs, or unsafe content.

Constraints

  • API acknowledgment to client: p95 < 300ms
  • End-to-end job completion: p95 < 12s, p99 < 30s
  • Cost ceiling: <$9 per 1,000 completed jobs
  • Hallucination rate on grounded tasks: <2% on a labeled offline set
  • Prompt injection success rate from retrieved content or tool output: <0.5%
  • At-least-once queue delivery is acceptable, but user-visible duplicate outputs are not
  • Users must see accurate job state: queued, running, succeeded, failed, expired
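The last two constraints interact: with at-least-once delivery, a worker may see the same message twice, yet the user must see one job and one output. A minimal sketch of one way to reconcile them follows, assuming an idempotency key supplied by the client and a transition table between the five listed states. The names `JobStore`, `TRANSITIONS`, and `idempotency_key` are hypothetical, the transition table is an assumption (the problem only names the states), and the in-memory dictionaries stand in for a Postgres table with a unique constraint on the key.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumed legal transitions between the five user-visible states.
# Terminal states have no outgoing transitions.
TRANSITIONS = {
    "queued": {"running", "failed", "expired"},
    "running": {"succeeded", "failed", "expired"},
    "succeeded": set(),
    "failed": set(),
    "expired": set(),
}


@dataclass
class Job:
    job_id: str
    state: str = "queued"
    result: Optional[str] = None


@dataclass
class JobStore:
    """In-memory stand-in for a table with a unique idempotency key (hypothetical)."""
    jobs: Dict[str, Job] = field(default_factory=dict)
    by_key: Dict[str, str] = field(default_factory=dict)

    def create(self, idempotency_key: str) -> Job:
        # A repeated request with the same key returns the existing job
        # instead of creating a second one.
        if idempotency_key in self.by_key:
            return self.jobs[self.by_key[idempotency_key]]
        job = Job(job_id=f"job-{len(self.jobs) + 1}")
        self.jobs[job.job_id] = job
        self.by_key[idempotency_key] = job.job_id
        return job

    def transition(self, job_id: str, new_state: str,
                   result: Optional[str] = None) -> bool:
        """Apply a legal transition; return False for stale or duplicate messages."""
        job = self.jobs[job_id]
        if new_state not in TRANSITIONS[job.state]:
            return False
        job.state = new_state
        if result is not None:
            job.result = result
        return True
```

In this sketch a redelivered message finds the job already `running` or already terminal, so `transition` returns False and the duplicate worker can drop its output without overwriting what the user sees. A production version would perform the same check as a single conditional update in the database rather than in process memory.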

Available Resources

  • 2M historical support emails and CRM notes, with document-level ACLs
  • Existing Postgres, Redis, object storage, and a managed queue
  • Approved models: GPT-4.1-mini for orchestration, GPT-4.1 for high-risk generations
  • Optional retrieval over CRM notes and help-center articles
  • Web app, mobile app, and webhook callback support for result delivery
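Because the corpus carries document-level ACLs, any retrieval step has to drop documents the requesting user cannot read before they reach the prompt. The sketch below illustrates that filter under stated assumptions: the problem does not say how ACLs are stored, so the `allowed_groups` field, the `Document` type, and the `filter_by_acl` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: FrozenSet[str]  # hypothetical ACL representation


def filter_by_acl(candidates: List[Document],
                  user_groups: FrozenSet[str]) -> List[Document]:
    """Keep only documents that share at least one group with the user."""
    return [d for d in candidates if d.allowed_groups & user_groups]
```

Applying the filter after retrieval but before prompt assembly keeps restricted text out of the model context entirely. Pushing the same predicate into the retrieval query is a common alternative when the index supports metadata filters.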

Task

  1. Design the end-to-end asynchronous architecture: request intake, idempotent job creation, queueing, worker execution, persistence, and result delivery back to users.
  2. Specify how you would structure prompts and outputs so workers can safely produce typed results, citations, status, and refusal reasons when needed.
  3. Define an eval-first plan covering offline quality/safety evaluation and online reliability/product metrics before finalizing the architecture.
  4. Explain how you would handle retries, partial failures, duplicate messages, timeouts, cancellation, and replay/backfill without surfacing inconsistent states.
  5. Estimate cost and latency, and describe the main tradeoffs between model quality, throughput, and reliability safeguards.
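For item 2, one possible shape for a typed worker result is sketched below. It assumes the model is asked to return JSON with `status`, `text`, `citations`, and `refusal_reason` fields; that schema, the `WorkerResult` and `Citation` types, and the `parse_worker_output` helper are illustrative assumptions, not something the problem prescribes. The validator rejects citations to documents that were not actually retrieved for the job, which is one cheap guard against fabricated sources.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Citation:
    doc_id: str
    quote: str


@dataclass
class WorkerResult:
    status: str  # "succeeded" or "refused" (assumed values)
    text: Optional[str] = None
    citations: List[Citation] = field(default_factory=list)
    refusal_reason: Optional[str] = None


def parse_worker_output(raw: str, retrieved_ids: Set[str]) -> WorkerResult:
    """Parse and validate model output; raise ValueError on schema violations."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    status = data.get("status")
    if status not in ("succeeded", "refused"):
        raise ValueError(f"unknown status: {status!r}")
    if status == "refused":
        reason = data.get("refusal_reason")
        if not reason:
            raise ValueError("a refusal must carry a reason")
        return WorkerResult(status="refused", refusal_reason=reason)
    text = data.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("a succeeded result must carry non-empty text")
    citations = [Citation(c["doc_id"], c["quote"])
                 for c in data.get("citations", [])]
    # Reject citations that point at documents the worker never retrieved.
    unknown = [c.doc_id for c in citations if c.doc_id not in retrieved_ids]
    if unknown:
        raise ValueError(f"citations reference unretrieved documents: {unknown}")
    return WorkerResult(status="succeeded", text=text, citations=citations)
```

A worker could treat a `ValueError` here as a retryable partial failure and only move the job to a terminal state once a result parses cleanly, so malformed output never becomes user-visible.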
