Business Context
You’re building FinAssist, a customer-support copilot for a top-10 US fintech that serves 18M monthly active users and processes ~220K support tickets/day across chat and email. The copilot helps agents resolve issues like chargebacks, card declines, account locks, and subscription cancellations. The product roadmap includes tool access (e.g., transaction lookup, KYC status, policy retrieval) and a requirement to reduce average handle time (AHT) by 20% without increasing compliance incidents.
A previous prototype used Chain-of-Thought (CoT) prompting to improve reasoning quality, but it occasionally leaked sensitive internal reasoning into logs and sometimes hallucinated "actions" (e.g., claiming it had checked a transaction when it had not). A newer prototype uses ReAct prompting (reasoning + acting) and actually calls tools, but it increases latency and introduces new failure modes: tool misuse, infinite loops, and inconsistent citation of retrieved policy.
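For intuition, the two prototypes differ in what the model generates versus what the system executes: CoT produces one free-form reasoning-plus-answer completion, while ReAct interleaves short thoughts with structured tool calls whose observations are fed back into the next step. A minimal sketch of that loop, with a scripted "model" and a mocked tool (all names and values here are illustrative, not FinAssist's actual API):

```python
# Minimal ReAct-style loop with a scripted model and a mocked tool.
# Tool names, the script, and the step budget are all illustrative.

def get_transaction(txn_id: str) -> str:
    # Mocked tool: a real version would hit the transactions service.
    return f"txn {txn_id}: $42.10, settled, merchant=ACME"

TOOLS = {"get_transaction": get_transaction}

# Scripted model turns: (thought, action or None, final answer or None).
SCRIPT = [
    ("Need the transaction details first.", ("get_transaction", "T-123"), None),
    ("Settled charge, so it is dispute-eligible.", None,
     "The $42.10 ACME charge settled and can be disputed."),
]

def react_loop(max_steps: int = 5) -> str:
    observations = []
    for step, (thought, action, final) in enumerate(SCRIPT):
        if step >= max_steps:                      # stop condition: bound the loop
            break
        if action is not None:
            name, arg = action
            observations.append(TOOLS[name](arg))  # execute tool, feed result back
        if final is not None:
            return final                           # model chose to answer
    return "escalate: no answer within step budget"

print(react_loop())
```

The explicit step budget and the escalation fallback are the production-relevant parts: they are what distinguish a safe ReAct loop from one that can spin on tool calls indefinitely.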
Your task is to explain the difference between ReAct and CoT prompting and propose when/how to use each in production for FinAssist.
Data Characteristics
- Inputs: Customer messages (English 94%, Spanish 5%, other 1%), plus structured metadata (account tier, region, product type).
- Text length: 5–900 tokens (median ~120). Often includes partial IDs, timestamps, merchant names, and screenshots transcribed by OCR.
- Domain vocabulary: chargeback codes, KYC/AML terms, card network rules, internal policy names.
- Ground truth: Historical agent resolutions, internal policy articles, and tool outputs (transaction status, dispute eligibility, risk flags).
Success Criteria
- Resolution quality: ≥ 4.5/5 average agent rating on helpfulness.
- Compliance: zero P0 incidents (no disclosure of internal policy text marked confidential, no exposure of hidden reasoning, no unsafe advice).
- Latency: p95 end-to-end < 1.2s for “no-tool” responses; < 2.5s when tools are used.
- Reliability: Tool calls must be truthful, logged, and reproducible for audit.
Constraints
- Regulatory: SOC 2 and PCI DSS considerations; prompts and outputs are retained for audit, so avoid storing sensitive chain-of-thought.
- Observability: You must support debugging and post-incident review without exposing hidden reasoning to agents or customers.
- Tooling: available tools are search_policy(query), get_transaction(txn_id), get_dispute_status(case_id), and create_case(summary).
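One way to satisfy the "truthful, logged, and reproducible" reliability requirement is to route every tool invocation through a thin audit wrapper. A hedged sketch with a stub tool (the tool name mirrors the list above, but the implementation and log schema are placeholders):

```python
import json
import time

def search_policy(query: str) -> dict:
    # Stub: a real version would query the policy retrieval index.
    return {"policy_id": "POL-7", "excerpt": "Disputes allowed within 60 days."}

def audited_call(fn, *args, **kwargs):
    """Run a tool and emit a structured log record for replay and audit."""
    started = time.time()
    result = fn(*args, **kwargs)
    record = {
        "tool": fn.__name__,
        "args": list(args),
        "kwargs": kwargs,
        "latency_ms": round((time.time() - started) * 1000, 1),
        "result": result,
    }
    print(json.dumps(record))  # real code would write to an append-only audit sink
    return result

hit = audited_call(search_policy, "chargeback window")
```

Because every call produces a structured record containing its inputs and outputs, post-incident review can replay exactly what the model saw without ever persisting hidden reasoning.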
Requirements (Deliverables)
- Conceptual: Clearly differentiate ReAct vs CoT prompting (what is generated, what is executed, and what is exposed).
- Decision framework: Provide a production rubric for choosing ReAct vs CoT per ticket type (e.g., “password reset” vs “chargeback eligibility”).
- Prompt design: Draft two prompt templates:
- A CoT-style template that improves reasoning but does not reveal chain-of-thought to end users.
- A ReAct-style template that uses tools safely with explicit action schemas and stop conditions.
- Safety & compliance: Propose mitigations for leakage (hidden reasoning), hallucinated tool usage, and policy mis-citation.
- Evaluation plan: Define offline and online metrics to compare approaches (quality, latency, tool accuracy, and compliance).
- Failure analysis: List at least 5 realistic failure modes and how you’d detect/triage them in logs.
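The decision-framework deliverable could bottom out in a simple per-intent router: deterministic, low-risk intents stay on the fast no-tool path (CoT kept internal), while anything requiring account-specific facts or policy citations goes through ReAct. A toy sketch (intent labels and the routing table are placeholders, not a proposed taxonomy):

```python
# Toy router: choose a prompting strategy per ticket intent.
# Intent labels and the routing sets below are illustrative placeholders.

NEEDS_TOOLS = {"chargeback_eligibility", "dispute_status", "transaction_inquiry"}
NO_TOOL_OK = {"password_reset", "subscription_cancel_howto"}

def choose_strategy(intent: str) -> str:
    if intent in NEEDS_TOOLS:
        return "react"              # account-specific facts: must call tools
    if intent in NO_TOOL_OK:
        return "cot"                # policy-stable answer: hidden-reasoning CoT
    return "cot_then_escalate"      # unknown intent: answer cautiously, flag for review
```

Routing at the intent level also makes the latency budget tractable: only the "react" branch pays the < 2.5s tool-use budget, while the rest stays under the 1.2s no-tool target.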
Use the example text below as a representative ticket and explain which approach you’d choose and why.