Northstar Systems uses an internal LLM assistant for HR, IT, and finance workflows such as ticket routing, policy Q&A, and email drafting. Prompt quality is inconsistent across teams, so you need to design and evaluate a prompt optimization pipeline for task-specific enterprise use cases.
You have 180,000 historical prompt-response pairs collected from internal usage logs across 12 tasks. Inputs range from 20 to 1,200 words, with a median of 180 words. Text is primarily English, but about 9% of examples include mixed-language content, copied email threads, bullet lists, tables, or internal acronyms. Human preference labels are available for 35,000 examples, with pairwise rankings and task-specific quality annotations such as factuality, format compliance, and actionability. Label quality is uneven across teams.
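Because label quality varies by team, it helps to pin down a record schema for the pairwise preference data and to quantify per-team annotator agreement before training on it. The sketch below is one way to do this, assuming duplicate annotations exist for a subset of pairs; the field names, the 1-5 rubric scale, and the `per_team_agreement` helper are all hypothetical, not part of the brief.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class PreferencePair:
    task: str          # one of the 12 internal tasks
    team: str          # annotating team, used for per-team quality checks
    prompt: str
    response_a: str
    response_b: str
    winner: str        # "a" or "b", the pairwise ranking
    factuality: int    # hypothetical 1-5 rubric score
    format_ok: bool    # format-compliance annotation

def per_team_agreement(duplicates):
    """Cohen's kappa per team over doubly-annotated pairs.

    `duplicates` maps team -> list of (label_1, label_2) tuples where
    the same pair was independently ranked by two annotators.
    """
    kappas = {}
    for team, pairs in duplicates.items():
        n = len(pairs)
        if n == 0:
            continue
        observed = sum(1 for a, b in pairs if a == b) / n
        # chance agreement from each annotator's marginal label rates
        c1 = Counter(a for a, _ in pairs)
        c2 = Counter(b for _, b in pairs)
        expected = sum((c1[k] / n) * (c2[k] / n) for k in set(c1) | set(c2))
        kappas[team] = 1.0 if expected == 1.0 else (observed - expected) / (1 - expected)
    return kappas
```

Teams whose kappa falls below a chosen floor (say 0.4) could have their labels down-weighted or re-annotated rather than discarded outright.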
A good solution improves the task success rate by at least 12% over the current baseline prompts, reduces the rate of invalid or off-format outputs to below 3%, and keeps median inference latency under 2 seconds. The system should generalize across tasks without requiring full model fine-tuning for every workflow.
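The three acceptance criteria above can be encoded as a single evaluation gate so that candidate prompt variants are compared on the same terms. This is a minimal sketch; the `meets_targets` name is hypothetical, and reading "12%" as a relative improvement over the baseline success rate (rather than 12 percentage points absolute) is an assumption that should be confirmed with stakeholders.

```python
from statistics import median

def meets_targets(baseline_success: float,
                  new_success: float,
                  off_format_rate: float,
                  latencies_ms: list[float]) -> bool:
    """Check the three acceptance criteria from the brief.

    Assumes "12%" means relative improvement over the baseline
    success rate; thresholds otherwise follow the brief directly.
    """
    improvement = (new_success - baseline_success) / baseline_success
    return (improvement >= 0.12                  # >= 12% relative lift
            and off_format_rate < 0.03           # < 3% invalid/off-format
            and median(latencies_ms) < 2000)     # median latency < 2 s
```

Keeping the gate as one function makes it easy to run per task, which matters here because the system must clear the bar across all 12 workflows, not just in aggregate.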