Context
BrightDesk sells an LLM-powered customer-support drafting API to enterprise teams. A new customer wants the model to produce highly consistent outputs across agents, channels, and repeated runs, without building a full RAG system yet.
Constraints
- p95 latency: at most 1,200 ms per request
- Cost ceiling: $8 per 1,000 requests
- Output consistency target: at least 90% schema-valid responses on a 300-prompt golden set
- Hallucination ceiling: fewer than 2% unsupported policy claims in offline evaluation
- Safety: must resist prompt injection attempts in user input and avoid leaking hidden instructions
- The customer may later localize prompts into three languages, so the prompt design should remain maintainable and easy to translate
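The 90% schema-validity target above can be measured mechanically against the golden set. A minimal sketch, assuming model responses are raw JSON strings and a hypothetical schema of a few required typed fields (field names are illustrative, not from the brief):

```python
import json

# Hypothetical required fields for a drafted support reply.
REQUIRED_FIELDS = {"reply_text": str, "tone": str, "needs_escalation": bool}

def is_schema_valid(raw: str) -> bool:
    """Return True if raw parses as a JSON object with all required typed fields."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(obj, dict):
        return False
    return all(
        key in obj and isinstance(obj[key], typ)
        for key, typ in REQUIRED_FIELDS.items()
    )

def schema_valid_rate(responses: list[str]) -> float:
    """Fraction of responses passing validation (target: >= 0.90 on 300 prompts)."""
    return sum(is_schema_valid(r) for r in responses) / len(responses)
```

Running this over the 300-prompt golden set before each prompt-version rollout gives a direct pass/fail gate on the consistency constraint.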
Available Resources
- Historical dataset of 20,000 support prompts and agent-written ideal responses
- A 300-example golden set labeled for format adherence, factuality, tone, and refusal correctness
- Two approved hosted models: a lower-cost fast model and a higher-quality mid-tier model
- Existing API gateway that can enforce JSON schema validation and retries
- No retrieval layer for this phase; answers must rely only on provided business rules and user input
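The gateway behavior listed above (enforce JSON schema validation, then retry) can be sketched as a small wrapper. A minimal illustration, assuming hypothetical `call_model` and `is_valid` callables; on exhaustion the caller would fall back, e.g. to the mid-tier model or a human agent:

```python
from typing import Callable, Optional

def draft_with_retries(
    call_model: Callable[[str], str],   # hypothetical model-call function
    is_valid: Callable[[str], bool],    # schema validator (e.g. JSON checks)
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call the model, retrying until the output validates or attempts run out."""
    for _ in range(max_attempts):
        raw = call_model(prompt)
        if raw is not None and is_valid(raw):
            return raw
    return None  # signal fallback: escalate to the stronger model or a human
```

Each retry adds latency, so the attempt budget has to fit inside the 1,200 ms p95 target.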
Task
- Design a prompt engineering approach that maximizes output consistency, including prompt structure, delimiters, examples, and structured output requirements.
- Define an evaluation plan before rollout, including offline tests for consistency, hallucination, and prompt injection resistance, plus online monitoring after launch.
- Propose the serving architecture, including model choice, fallback/retry behavior, schema validation, and versioning strategy for prompts.
- Estimate cost and latency at 200,000 requests per month, and explain what tradeoffs you would make if the team must reduce cost by 40%.
- Identify the main failure modes for inconsistent outputs and explain how you would detect and mitigate them in production.
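The cost arithmetic in the estimation task follows directly from the stated numbers; a quick check of the ceiling at 200,000 requests per month, before and after the 40% cut:

```python
requests_per_month = 200_000
cost_ceiling_per_1k = 8.00  # dollars per 1,000 requests, from the constraints

monthly_ceiling = requests_per_month / 1_000 * cost_ceiling_per_1k
reduced_ceiling = monthly_ceiling * 0.60  # after a 40% cost reduction
reduced_per_1k = reduced_ceiling / (requests_per_month / 1_000)

print(monthly_ceiling)  # 1600.0
print(reduced_ceiling)  # 960.0
print(reduced_per_1k)   # 4.8, i.e. an effective $4.80 per 1,000 requests
```

So a 40% cut turns the $8 per 1,000 ceiling into an effective $4.80, which frames the tradeoff discussion (cheaper model, shorter prompts, fewer retries).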