Simulate Capacity Planning for a Multi-Tenant Inference Service

Easy

Coding

A customer wants to onboard several business units onto a shared inference platform and asks whether the current deployment can handle forecasted traffic next quarter.

Write code that takes per-tenant traffic forecasts (requests per minute, average input/output tokens, peak multiplier, and target utilization) plus deployment characteristics (tokens/sec per replica, max replicas, warmup time, and cost per replica-hour), and outputs: required replicas at peak, a monthly cost estimate, and a flag indicating whether the plan is feasible under the current limits. Then generate a concise, executive-ready summary sentence that explains the result in plain language for a VP who is not deeply technical.

Expected solution outline: convert RPM and token sizes into peak tokens/sec demand per tenant; aggregate across tenants; divide by the per-replica throughput, derated by target utilization, and round up to get required replicas; compare against max replicas and include warmup time as a risk note; estimate monthly cost from replica-hours; produce both structured output and a short non-technical explanation emphasizing capacity headroom, cost, and any scaling bottleneck.
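The outline above can be sketched in Python as follows. This is a minimal illustration, not a reference solution, and it rests on several assumptions that the problem statement leaves open: the names `Tenant`, `Deployment`, `plan_capacity`, and the result keys are hypothetical; each tenant's peak demand is divided by that tenant's own target utilization before aggregation; a month is taken as 730 hours; and the cost estimate assumes peak-sized capacity runs around the clock, which makes it an upper bound.

```python
import math
from dataclasses import dataclass
from typing import List

HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


@dataclass
class Tenant:
    name: str
    rpm: float                 # forecast requests per minute (average)
    avg_input_tokens: float
    avg_output_tokens: float
    peak_multiplier: float     # peak traffic as a multiple of average
    target_utilization: float  # fraction of capacity to use at peak, in (0, 1]


@dataclass
class Deployment:
    tokens_per_sec_per_replica: float
    max_replicas: int
    warmup_seconds: float
    cost_per_replica_hour: float


def peak_tokens_per_sec(t: Tenant) -> float:
    """Convert RPM and token sizes into peak tokens/sec for one tenant."""
    tokens_per_request = t.avg_input_tokens + t.avg_output_tokens
    return (t.rpm / 60.0) * tokens_per_request * t.peak_multiplier


def plan_capacity(tenants: List[Tenant], dep: Deployment) -> dict:
    # Capacity to provision = peak demand / target utilization, summed over tenants.
    provisioned = 0.0
    for t in tenants:
        if not 0 < t.target_utilization <= 1:
            raise ValueError(f"{t.name}: target_utilization must be in (0, 1]")
        provisioned += peak_tokens_per_sec(t) / t.target_utilization

    required = math.ceil(provisioned / dep.tokens_per_sec_per_replica)
    feasible = required <= dep.max_replicas
    # Upper-bound cost: assumes peak-sized capacity runs all month.
    monthly_cost = required * dep.cost_per_replica_hour * HOURS_PER_MONTH
    risk_note = (
        f"New replicas take about {dep.warmup_seconds:.0f}s to warm up, "
        "so capacity should be scaled ahead of expected peaks."
    )

    if feasible:
        summary = (
            f"The shared platform can handle next quarter's forecast peak using "
            f"{required} of {dep.max_replicas} available replicas, at up to "
            f"${monthly_cost:,.0f} per month."
        )
    else:
        summary = (
            f"The current deployment cannot handle the forecast peak: it needs "
            f"{required} replicas but is limited to {dep.max_replicas}, so the "
            f"limit must be raised before onboarding."
        )

    return {
        "required_replicas": required,
        "headroom_replicas": dep.max_replicas - required,
        "monthly_cost": monthly_cost,
        "feasible": feasible,
        "risk_note": risk_note,
        "summary": summary,
    }


if __name__ == "__main__":
    tenants = [
        Tenant("search", 600, 500, 500, 2.0, 0.5),
        Tenant("support", 300, 200, 100, 1.5, 0.75),
    ]
    result = plan_capacity(tenants, Deployment(5000, 10, 120, 2.0))
    print(result["summary"])
    print(result["risk_note"])
```

Applying utilization per tenant lets a latency-sensitive business unit reserve more slack than a batch-oriented one; if the platform instead has a single utilization target, the same result follows from dividing aggregate demand by that one value. A fuller answer might also replace the all-month peak cost with a replica-hours estimate based on an autoscaling profile.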

