Anthropic wants to scale a training pipeline that produces ranking and retrieval models used to improve Claude.ai response quality, prompt routing, and internal recommendation surfaces such as example prompts and tool suggestions. You are given a proposed pipeline that works today; the question is where it will break at 10x scale and how you would redesign it.

| Signal | Value |
|---|---|
| Claude.ai DAU | 25M |
| Peak inference QPS generating training events | 180K requests/sec |
| Training examples generated per day | 9B prompt/response/tool events |
| Historical training corpus retained | 2.5T events over 12 months |
| Candidate prompts/tools/docs for retrieval | 400M items |
| Peak feature store QPS | 1.2M lookups/sec |
| End-to-end online latency budget | 250ms p99 |
| Model refresh target | retrieval every 6 hours, ranker daily |
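A quick sanity check on these numbers helps frame where the pressure is. The sketch below derives average event QPS and a 10x raw-ingest estimate from the table; the 1 KB/event size is an assumption for illustration, not a figure given above.

```python
# Back-of-envelope sizing from the signal table. Only events_per_day and
# peak_event_qps come from the prompt; bytes_per_event is an assumption.
SECONDS_PER_DAY = 86_400

events_per_day = 9e9          # 9B training events/day (from table)
peak_event_qps = 180e3        # 180K requests/sec peak (from table)
avg_event_qps = events_per_day / SECONDS_PER_DAY   # ~104K/sec

# Peak-to-average ratio shows how bursty ingestion is, which matters
# for Kafka partition sizing and autoscaling headroom.
burst_ratio = peak_event_qps / avg_event_qps       # ~1.7x

# Raw event storage per day at 10x scale, assuming 1 KB/event.
bytes_per_event = 1_000
daily_raw_tb_10x = 10 * events_per_day * bytes_per_event / 1e12  # TB/day

print(f"avg event QPS today:  {avg_event_qps:,.0f}")
print(f"peak/avg burst ratio: {burst_ratio:.1f}x")
print(f"10x raw ingest:       {daily_raw_tb_10x:.0f} TB/day")
```

Even today the average-to-peak gap is under 2x, so the system runs near peak most of the day; at 10x, raw ingest alone is on the order of tens of TB/day under the assumed event size.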
Assume the current architecture is:

- application logs and feedback events land in Kafka;
- batch ETL builds features in a warehouse;
- daily training jobs produce a retrieval model and a ranker;
- models are registered and deployed to online serving;
- online predictions are logged back for future training.
Design the 10x-scale version and explain where the current design will fail first.