Explain GenAI vs ML Safely

Context

FinEdge is building a sales-assist copilot for account executives. One feature drafts short customer-facing explanations of technical concepts, including the difference between generative AI and traditional machine learning, tailored to non-technical buyers.

Constraints

p95 latency: 1,200 ms per response
Cost ceiling: $6K/month at 100K requests/month
Hallucination ceiling: <2% on a 200-prompt golden set
Tone must be business-friendly and accurate, and must avoid overclaiming capabilities
Must refuse or hedge when asked for unsupported ROI, legal, or compliance claims
Output must be structured so downstream UI can render: audience, answer, bullets, risks, cta
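
The numeric ceilings above imply a per-request spend budget and a maximum number of hallucinated responses on the golden set. The following is a minimal sketch of that arithmetic, using only the figures stated in the constraints; the variable names are illustrative and not part of the original brief.

```python
# Figures taken directly from the constraints; names are illustrative.
MONTHLY_COST_CEILING_USD = 6_000
MONTHLY_REQUESTS = 100_000
GOLDEN_SET_SIZE = 200
HALLUCINATION_CEILING_PCT = 2  # the rate must be strictly below 2%

# Average spend allowed per request across the month.
cost_per_request = MONTHLY_COST_CEILING_USD / MONTHLY_REQUESTS

# Largest hallucination count whose rate is still strictly below the ceiling.
# Integer arithmetic avoids floating-point rounding at the boundary.
max_hallucinated = (GOLDEN_SET_SIZE * HALLUCINATION_CEILING_PCT - 1) // 100

print(f"Budget per request: ${cost_per_request:.2f}")
print(f"Max hallucinated responses: {max_hallucinated} of {GOLDEN_SET_SIZE}")
```

Because the ceiling is a strict inequality, a fourth hallucinated response (4 of 200 is exactly 2%) would already fail the constraint.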

Available Resources

1,500 historical sales-engineering responses labeled as strong / weak
Product-approved messaging guide with definitions, approved claims, and banned phrases
A small taxonomy of customer personas: CIO, Head of Data, Operations Lead, SMB Owner
Access to a GPT-4-class or Claude-class model via API
200 evaluation prompts covering simple asks, adversarial asks, and requests containing false assumptions

Task

Design a prompt-based solution that explains the difference between generative AI and traditional machine learning to a customer, while adapting tone and depth by persona.
Define an evaluation plan before architecture: how you will measure factual accuracy, clarity, refusal quality, hallucination rate, and consistency with approved messaging.
Propose the runtime architecture, including prompt construction, structured output validation, fallback behavior, and monitoring.
Estimate cost and latency at target volume, and describe optimizations if the first design misses either budget.
Identify likely failure modes such as hallucinated business claims, prompt injection through user input, and invalid structured output, with mitigations.
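
One way to approach the structured output validation and fallback behavior named in the task is sketched below. It assumes the model is asked to return JSON, that `answer` and `cta` are strings, and that `bullets` and `risks` are lists of strings; the brief only names the five fields, so these types, the `FALLBACK` text, and the function names are hypothetical choices for illustration.

```python
import json
from typing import Optional

# Field names come from the brief; the types are assumptions.
REQUIRED_FIELDS = {
    "audience": str,
    "answer": str,
    "bullets": list,
    "risks": list,
    "cta": str,
}

# Personas listed under Available Resources.
PERSONAS = {"CIO", "Head of Data", "Operations Lead", "SMB Owner"}

# Hypothetical safe response used when the model output cannot be rendered.
FALLBACK = {
    "answer": "I can't provide a reliable explanation right now.",
    "bullets": [],
    "risks": [],
    "cta": "Ask your sales engineer for the approved overview.",
}


def validate_output(raw: str) -> Optional[dict]:
    """Return the parsed response if it matches the schema, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    # Require exactly the five expected keys, no more and no fewer.
    if not isinstance(data, dict) or set(data) != set(REQUIRED_FIELDS):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data[field], expected_type):
            return None
    if data["audience"] not in PERSONAS:
        return None
    if not all(isinstance(item, str) for item in data["bullets"] + data["risks"]):
        return None
    return data


def render_or_fallback(raw: str, persona: str) -> dict:
    """Validate model output and substitute the fallback on any failure."""
    parsed = validate_output(raw)
    if parsed is None:
        return {**FALLBACK, "audience": persona}
    return parsed
```

A short usage example: a well-formed response passes through unchanged, while malformed or incomplete output is replaced by the fallback for the requested persona.

```python
good = json.dumps({
    "audience": "CIO",
    "answer": "Generative AI creates new content; traditional ML predicts or classifies.",
    "bullets": ["Different outputs", "Different risks"],
    "risks": ["Generated text can be wrong"],
    "cta": "Book a technical review.",
})
print(render_or_fallback(good, "CIO")["answer"])
print(render_or_fallback("not json", "SMB Owner")["audience"])
```

Rejecting unknown keys and unknown personas is a deliberately strict choice here; a production design might instead retry the model once before falling back, at the cost of extra latency against the p95 budget.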
