Explain Legal AI Workflow Safely

Context

Hamilton & Reed, a mid-sized law firm, wants an internal assistant that explains a contract-review AI workflow to skeptical practice leaders. The feature should turn a technical workflow description into a concise executive briefing that is accurate, non-hyped, and explicit about human oversight.

Constraints

p95 latency: 1,500 ms for a single explanation request
Cost ceiling: $8 per 1,000 requests
Hallucination ceiling: <2% on a 150-prompt golden set
Must not imply the system gives legal advice or makes autonomous filing decisions
Must resist prompt injection if the workflow notes contain adversarial text such as "ignore prior instructions"
Output must be understandable to a non-technical law firm executive in under 250 words
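
The numeric constraints above translate into concrete per-request budgets. The sketch below works through that arithmetic in Python. The function names and the token counts and per-million-token prices in the usage note are hypothetical placeholders, not vendor pricing or figures from the firm; substitute current rates for whichever approved model is chosen.

```python
import math
from fractions import Fraction

# Figures taken directly from the constraints list.
COST_CEILING_PER_1K_USD = 8.00
GOLDEN_SET_SIZE = 150
HALLUCINATION_CEILING = Fraction(2, 100)  # strict: observed rate must be < 2%


def per_request_budget(ceiling_per_1k: float) -> float:
    """Dollar budget for one request, given a ceiling per 1,000 requests."""
    return ceiling_per_1k / 1000


def max_allowed_failures(n_prompts: int, ceiling: Fraction) -> int:
    """Largest failure count k such that k / n_prompts is strictly below ceiling."""
    return max(math.ceil(n_prompts * ceiling) - 1, 0)


def request_cost(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one call. The prices are caller-supplied assumptions."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


def within_budget(cost: float) -> bool:
    """True if a single request fits under the per-request ceiling."""
    return cost <= per_request_budget(COST_CEILING_PER_1K_USD)
```

Under these definitions the cost ceiling works out to $0.008 per request, and the hallucination ceiling allows at most 2 failing prompts out of 150, because 3 of 150 is exactly 2% and the constraint is strict. A call such as `within_budget(request_cost(2000, 350, 1.0, 4.0))` uses made-up prices purely to show the shape of the check.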

Available Data / Models

2,000 internal workflow documents describing intake, OCR, clause extraction, retrieval, human review, audit logging, and escalation paths
150 labeled examples of strong vs weak executive explanations
Approved model access: GPT-4.1-mini or Claude Sonnet class models
Optional retrieval layer over workflow docs and policy memos
Internal policy text defining prohibited claims, required disclaimers, and approved terminology

Deliverables

Design the prompt-based solution that converts a technical AI workflow into an executive-friendly explanation while preserving accuracy and skepticism-aware framing.
Define an evaluation plan first: offline metrics and online metrics for trust, clarity, and hallucination risk.
Propose the serving architecture, including whether you would use direct prompting or lightweight RAG over workflow and policy documents.
Show how you would enforce structured output, refusal behavior, and guardrails against overclaiming, prompt injection, and legally risky wording.
Estimate cost and latency, and explain the tradeoffs between a cheaper/faster model and a more reliable one.
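
As one possible starting point for the guardrail deliverable, the sketch below shows a deterministic post-generation check in Python. It is an illustration under stated assumptions, not the firm's method: the pattern lists are hypothetical stand-ins for the internal policy text on prohibited claims, and the names `check_briefing` and `looks_injected` are invented here. Regex screens of this kind are a cheap first layer and would normally sit alongside prompt-level defenses and human review.

```python
import re

MAX_WORDS = 250  # the constraint says "under 250 words", so 250 itself fails

# Hypothetical examples only; the real list would come from the policy text.
PROHIBITED_CLAIMS = [
    r"\b(gives|provides|offers)\s+legal\s+advice\b",
    r"\bautonomously\s+(files|decides)\b",
]

# Hypothetical markers of adversarial text inside workflow notes.
INJECTION_MARKERS = [
    r"ignore\s+(all\s+)?(prior|previous)\s+instructions",
]


def check_briefing(text: str) -> list[str]:
    """Return a list of issue codes for a generated briefing (empty if clean)."""
    issues = []
    if len(text.split()) >= MAX_WORDS:
        issues.append("too_long")
    # Note: a negated sentence such as "never provides legal advice" would
    # also match; a production check would need to handle negation.
    if any(re.search(p, text, re.IGNORECASE) for p in PROHIBITED_CLAIMS):
        issues.append("prohibited_claim")
    return issues


def looks_injected(workflow_notes: str) -> bool:
    """Flag workflow notes that contain a known injection phrase."""
    return any(re.search(p, workflow_notes, re.IGNORECASE)
               for p in INJECTION_MARKERS)
```

A serving layer could call `looks_injected` on the input notes before prompting and `check_briefing` on the model output before display, routing any flagged case to a refusal message or a human reviewer.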
