Business Context
ShopSwift is a global e-commerce marketplace (12M monthly active buyers, ~180K support tickets/day). The company is rolling out an LLM-powered customer support agent that can (a) retrieve policy and troubleshooting passages from a vector store (RAG over 8M internal docs) and (b) query a PostgreSQL warehouse to answer account/order questions (e.g., refunds, shipment status, subscription billing). A prior prototype accidentally generated UPDATE statements during a conversation and, in another case, attempted to query a table containing restricted PII. Because support agents operate under strict privacy and SOX-style audit controls, the system must prevent dangerous SQL while still being helpful.
The agent is designed as a tool-using LLM: it decides whether to call search_docs(query) (vector store) and/or run_sql(sql) (database). Your task is to design an NLP-centric safety layer that constrains the LLM’s SQL behavior and detects prompt-injection attempts coming from retrieved documents or user messages.
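The two-tool design above can be sketched as a dispatch layer that the safety components wrap; the dataclass and dispatch logic below are illustrative assumptions (only the tool names search_docs and run_sql come from the spec):

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ToolCall:
    name: str      # "search_docs" or "run_sql"
    argument: str  # natural-language query or generated SQL

def dispatch(call: ToolCall, tools: Dict[str, Callable[[str], str]]) -> str:
    # The safety layer wraps run_sql; any tool the LLM invents is refused.
    if call.name not in tools:
        return "error: unknown tool"
    return tools[call.name](call.argument)

# Stub tools standing in for the real vector-store and warehouse clients.
tools = {
    "search_docs": lambda q: f"[passages for: {q}]",
    "run_sql": lambda sql: "[gatekeeper validates before execution]",
}
```

Keeping dispatch outside the LLM is what makes the "enforceable even if the LLM is compromised" constraint achievable: the model can only request tool calls, never execute them.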
Data Characteristics
- User messages: 1–10 turns, 20–250 tokens per turn (median ~60). English (95%), Spanish (3%), other (2%). Includes order IDs, emails, addresses, and informal language.
- Database: ~60 tables; only ~12 are approved for support reads (orders, shipments, returns, payments). Several tables contain restricted PII (full card PAN, SSN fragments, internal risk flags).
- Vector store: internal wiki pages, runbooks, and macros. Some pages include example SQL snippets; a small fraction contains outdated or overly broad queries (e.g., SELECT * FROM users). Assume the vector store can be poisoned by accidental or malicious content.
Success Criteria
- Safety: 0 successful executions of non-read-only SQL (no INSERT/UPDATE/DELETE/DDL, no multi-statement queries, no COPY, no pg_sleep, no UNION-based exfiltration patterns against restricted tables).
- Utility: ≥90% of “order status / refund eligibility / shipment ETA” tickets resolved without human escalation.
- Latency: p95 end-to-end response < 1.5s; SQL tool call budget ≤ 1 call per user turn.
- Auditability: every SQL query must be logged with an allow/deny decision and reason.
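The auditability criterion implies one structured, append-only record per SQL attempt. A minimal sketch of such a record as a JSON line follows; the field names are assumptions, not part of the spec:

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class SqlAuditRecord:
    ticket_id: str
    sql: str
    decision: str   # "allow" or "deny"
    reason: str     # e.g. "non-SELECT statement", "table not allowlisted"
    timestamp: float

def log_decision(record: SqlAuditRecord) -> str:
    # One JSON line per decision; sorted keys keep the log diff-friendly.
    return json.dumps(asdict(record), sort_keys=True)

line = log_decision(SqlAuditRecord(
    ticket_id="T-1042",
    sql="DELETE FROM orders",
    decision="deny",
    reason="non-SELECT statement",
    timestamp=time.time(),
))
```

Logging both allowed and denied queries, with the reason, gives auditors the full decision trail the criterion asks for.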
Constraints
- Must run in a VPC; no external network calls during inference.
- Only parameterized queries are allowed at execution time.
- Access control must be enforceable even if the LLM is compromised (assume worst-case prompt injection).
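The parameterized-queries constraint means entity values are bound at execution time, never spliced into SQL text. Illustrated here with stdlib sqlite3 for portability (the production warehouse is PostgreSQL, where the placeholder style is %s); the table and values are fabricated for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, status TEXT)")
conn.execute("INSERT INTO orders VALUES ('A-100', 'shipped')")

def order_status(conn: sqlite3.Connection, order_id: str):
    # The SQL template is fixed; only the bound value varies, so an
    # injected string like "' OR 1=1 --" is treated as data, not SQL.
    row = conn.execute(
        "SELECT status FROM orders WHERE order_id = ? LIMIT 1",
        (order_id,),
    ).fetchone()
    return row[0] if row else None

print(order_status(conn, "A-100"))        # → shipped
print(order_status(conn, "' OR 1=1 --"))  # → None
```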
Requirements (Deliverables)
- Propose a defense-in-depth architecture for safe SQL tool use (prompting + programmatic controls).
- Design an NLP component that classifies user intent into a small set of approved query templates (e.g., order_status, refund_policy, shipment_tracking) and extracts required entities (order_id, email).
- Implement a SQL gatekeeper that validates generated SQL using an AST parser and enforces a strict allowlist (read-only, single statement, approved tables/columns, LIMIT required).
- Explain how you will mitigate prompt injection from both user messages and retrieved documents (vector store).
- Provide an evaluation plan: safety tests (red-team prompts), utility metrics, and monitoring for drift.
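For the intent-routing deliverable, a minimal rule-based sketch is shown below; the template names come from the spec, but the patterns and the order-ID format are illustrative assumptions (a deployed version would use a trained classifier with these rules as a fallback):

```python
import re

# Approved templates per the spec; patterns are illustrative assumptions.
INTENT_PATTERNS = {
    "order_status": re.compile(r"\b(where is|status of|track)\b.*\border\b", re.I),
    "refund_policy": re.compile(r"\brefund|return policy\b", re.I),
    "shipment_tracking": re.compile(r"\b(shipment|package|delivery)\b", re.I),
}
ORDER_ID = re.compile(r"\b[A-Z]{1,3}-?\d{6,12}\b")   # assumed order-ID format
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def route(message: str) -> dict:
    """Map a user turn to an approved template plus required entities."""
    intent = next((name for name, pat in INTENT_PATTERNS.items()
                   if pat.search(message)), None)
    return {
        "intent": intent,  # None => no approved template; escalate to a human
        "order_id": (m.group(0) if (m := ORDER_ID.search(message)) else None),
        "email": (m.group(0) if (m := EMAIL.search(message)) else None),
    }
```

Routing to a fixed template before any SQL is generated narrows the model's job to filling parameters, which is what makes the downstream allowlist enforceable.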
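For the gatekeeper deliverable, the core checks (read-only, single statement, approved tables, LIMIT required) can be sketched with stdlib token heuristics as below; a production gatekeeper must walk a real SQL AST (e.g., via a Postgres-dialect parser such as sqlglot) rather than regexes, and the approved-table set here is an assumed subset of the 12:

```python
import re

APPROVED_TABLES = {"orders", "shipments", "returns", "payments"}
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|copy|pg_sleep|union)\b",
    re.I,
)

def gatekeep(sql: str):
    """Return (allowed, reason) for a generated query."""
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:
        return False, "multi-statement"
    if not re.match(r"(?is)^select\b", stmt):
        return False, "non-SELECT statement"
    if FORBIDDEN.search(stmt):
        return False, "forbidden keyword"
    tables = {t.lower() for t in re.findall(r"(?i)\b(?:from|join)\s+([a-z_]\w*)", stmt)}
    if not tables <= APPROVED_TABLES:
        return False, "table not allowlisted"
    if not re.search(r"(?i)\blimit\s+\d+\s*$", stmt):
        return False, "missing LIMIT"
    return True, "ok"
```

Every (allowed, reason) pair feeds directly into the audit log required by the success criteria; the deny-by-default shape means a parser failure also blocks execution.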
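For the prompt-injection deliverable, one layer of the defense can be sketched as a quarantine filter over retrieved passages; the cue phrases below are illustrative assumptions, and a deployed system would pair this heuristic with a trained injection classifier:

```python
import re

# Heuristic cues signalling instruction-like content in retrieved passages.
INJECTION_CUES = [
    r"ignore (all|previous|the above) instructions",
    r"you are now",
    r"run the following sql",
    r"disregard (your|the) (rules|system prompt)",
]
CUE_RE = re.compile("|".join(INJECTION_CUES), re.I)

def quarantine(passages: list[str]) -> list[str]:
    """Drop passages containing instruction-like cues and wrap the rest in
    delimiters so the LLM is prompted to treat them as data, not directives."""
    kept = []
    for p in passages:
        if CUE_RE.search(p):
            continue  # flagged passages are excluded (audit logging omitted here)
        kept.append(f"<retrieved_doc>\n{p}\n</retrieved_doc>")
    return kept
```

This filter only reduces exposure; the hard guarantee still comes from the gatekeeper and database-level permissions, which hold even when an injected instruction reaches the model.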