Business Context
ShopSwift is a global e-commerce marketplace (12M monthly active buyers, ~180K support tickets/day). The company is rolling out an LLM-powered customer support agent that can (a) retrieve policy and troubleshooting passages from a vector store (RAG over 8M internal docs) and (b) query a PostgreSQL warehouse to answer account/order questions (e.g., refunds, shipment status, subscription billing). A prior prototype accidentally generated UPDATE statements during a conversation and, in another case, attempted to query a table containing restricted PII. Because support agents operate under strict privacy and SOX-style audit controls, the system must prevent dangerous SQL while still being helpful.
The agent is designed as a tool-using LLM: it decides whether to call search_docs(query) (vector store) and/or run_sql(sql) (database). Your task is to design an NLP-centric safety layer that constrains the LLM’s SQL behavior and detects prompt-injection attempts coming from retrieved documents or user messages.
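The two-tool design above can be sketched as a dispatch layer that the safety components wrap; the dataclass and dispatch logic below are illustrative assumptions (only the tool names search_docs and run_sql come from the spec):

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ToolCall:
    name: str      # "search_docs" or "run_sql"
    argument: str  # natural-language query or generated SQL

def dispatch(call: ToolCall, tools: Dict[str, Callable[[str], str]]) -> str:
    # The safety layer wraps run_sql; any tool the LLM invents is refused.
    if call.name not in tools:
        return "error: unknown tool"
    return tools[call.name](call.argument)

# Stub tools standing in for the real vector-store and warehouse clients.
tools = {
    "search_docs": lambda q: f"[passages for: {q}]",
    "run_sql": lambda sql: "[gatekeeper validates before execution]",
}
```

Keeping dispatch outside the LLM is what makes the "enforceable even if the LLM is compromised" constraint achievable: the model can only request tool calls, never execute them.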
Data Characteristics
- User messages: 1–10 turns, 20–250 tokens per turn (median ~60). English (95%), Spanish (3%), other (2%). Includes order IDs, emails, addresses, and informal language.
- Database: ~60 tables; only ~12 are approved for support reads (orders, shipments, returns, payments). Several tables contain restricted PII (full card PAN, SSN fragments, internal risk flags).
- Vector store: internal wiki pages, runbooks, and macros. Some pages include example SQL snippets; a small fraction contains outdated or overly broad queries (e.g., SELECT * FROM users). Assume the vector store can be poisoned by accidental or malicious content.
Success Criteria
- Safety: 0 successful executions of non-read-only SQL (no INSERT/UPDATE/DELETE/DDL, no multi-statement queries, no COPY, no pg_sleep, no UNION-based exfiltration patterns against restricted tables).
- Utility: ≥90% of “order status / refund eligibility / shipment ETA” tickets resolved without human escalation.
- Latency: p95 end-to-end response < 1.5s; SQL tool call budget ≤ 1 call per user turn.
- Auditability: every SQL query must be logged with an allow/deny decision and reason.
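The auditability criterion implies one structured, append-only record per SQL attempt. A minimal sketch of such a record as a JSON line follows; the field names are assumptions, not part of the spec:

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class SqlAuditRecord:
    ticket_id: str
    sql: str
    decision: str   # "allow" or "deny"
    reason: str     # e.g. "non-SELECT statement", "table not allowlisted"
    timestamp: float

def log_decision(record: SqlAuditRecord) -> str:
    # One JSON line per decision; sorted keys keep the log diff-friendly.
    return json.dumps(asdict(record), sort_keys=True)

line = log_decision(SqlAuditRecord(
    ticket_id="T-1042",
    sql="DELETE FROM orders",
    decision="deny",
    reason="non-SELECT statement",
    timestamp=time.time(),
))
```

Logging both allowed and denied queries, with the reason, gives auditors the full decision trail the criterion asks for.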
Constraints
- Must run in a VPC; no external network calls during inference.
- Only parameterized queries are allowed at execution time.
- Access control must be enforceable even if the LLM is compromised (assume worst-case prompt injection).
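The parameterized-queries constraint means entity values are bound at execution time, never spliced into SQL text. Illustrated here with stdlib sqlite3 for portability (the production warehouse is PostgreSQL, where the placeholder style is %s); the table and values are fabricated for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, status TEXT)")
conn.execute("INSERT INTO orders VALUES ('A-100', 'shipped')")

def order_status(conn: sqlite3.Connection, order_id: str):
    # The SQL template is fixed; only the bound value varies, so an
    # injected string like "' OR 1=1 --" is treated as data, not SQL.
    row = conn.execute(
        "SELECT status FROM orders WHERE order_id = ? LIMIT 1",
        (order_id,),
    ).fetchone()
    return row[0] if row else None

print(order_status(conn, "A-100"))        # → shipped
print(order_status(conn, "' OR 1=1 --"))  # → None
```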
Requirements (Deliverables)
- Propose a defense-in-depth architecture for safe SQL tool use (prompting + programmatic controls).
- Design an NLP component that classifies user intent into a small set of approved query templates (e.g., order_status, refund_policy, shipment_tracking) and extracts required entities (order_id, email).
- Implement a SQL gatekeeper that validates generated SQL using an AST parser and enforces a strict allowlist (read-only, single statement, approved tables/columns, LIMIT required).
- Explain how you will mitigate prompt injection from both user messages and retrieved documents (vector store).
- Provide an evaluation plan: safety tests (red-team prompts), utility metrics, and monitoring for drift.
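For the intent-routing deliverable, a minimal rule-based sketch is shown below; the template names come from the spec, but the patterns and the order-ID format are illustrative assumptions (a deployed version would use a trained classifier with these rules as a fallback):

```python
import re

# Approved templates per the spec; patterns are illustrative assumptions.
INTENT_PATTERNS = {
    "order_status": re.compile(r"\b(where is|status of|track)\b.*\border\b", re.I),
    "refund_policy": re.compile(r"\brefund|return policy\b", re.I),
    "shipment_tracking": re.compile(r"\b(shipment|package|delivery)\b", re.I),
}
ORDER_ID = re.compile(r"\b[A-Z]{1,3}-?\d{6,12}\b")   # assumed order-ID format
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

def route(message: str) -> dict:
    """Map a user turn to an approved template plus required entities."""
    intent = next((name for name, pat in INTENT_PATTERNS.items()
                   if pat.search(message)), None)
    return {
        "intent": intent,  # None => no approved template; escalate to a human
        "order_id": (m.group(0) if (m := ORDER_ID.search(message)) else None),
        "email": (m.group(0) if (m := EMAIL.search(message)) else None),
    }
```

Routing to a fixed template before any SQL is generated narrows the model's job to filling parameters, which is what makes the downstream allowlist enforceable.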
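For the gatekeeper deliverable, the core checks (read-only, single statement, approved tables, LIMIT required) can be sketched with stdlib token heuristics as below; a production gatekeeper must walk a real SQL AST (e.g., via a Postgres-dialect parser such as sqlglot) rather than regexes, and the approved-table set here is an assumed subset of the 12:

```python
import re

APPROVED_TABLES = {"orders", "shipments", "returns", "payments"}
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|copy|pg_sleep|union)\b",
    re.I,
)

def gatekeep(sql: str):
    """Return (allowed, reason) for a generated query."""
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:
        return False, "multi-statement"
    if not re.match(r"(?is)^select\b", stmt):
        return False, "non-SELECT statement"
    if FORBIDDEN.search(stmt):
        return False, "forbidden keyword"
    tables = {t.lower() for t in re.findall(r"(?i)\b(?:from|join)\s+([a-z_]\w*)", stmt)}
    if not tables <= APPROVED_TABLES:
        return False, "table not allowlisted"
    if not re.search(r"(?i)\blimit\s+\d+\s*$", stmt):
        return False, "missing LIMIT"
    return True, "ok"
```

Every (allowed, reason) pair feeds directly into the audit log required by the success criteria; the deny-by-default shape means a parser failure also blocks execution.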
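For the prompt-injection deliverable, one layer of the defense can be sketched as a quarantine filter over retrieved passages; the cue phrases below are illustrative assumptions, and a deployed system would pair this heuristic with a trained injection classifier:

```python
import re

# Heuristic cues signalling instruction-like content in retrieved passages.
INJECTION_CUES = [
    r"ignore (all|previous|the above) instructions",
    r"you are now",
    r"run the following sql",
    r"disregard (your|the) (rules|system prompt)",
]
CUE_RE = re.compile("|".join(INJECTION_CUES), re.I)

def quarantine(passages: list[str]) -> list[str]:
    """Drop passages containing instruction-like cues and wrap the rest in
    delimiters so the LLM is prompted to treat them as data, not directives."""
    kept = []
    for p in passages:
        if CUE_RE.search(p):
            continue  # flagged passages are excluded (audit logging omitted here)
        kept.append(f"<retrieved_doc>\n{p}\n</retrieved_doc>")
    return kept
```

This filter only reduces exposure; the hard guarantee still comes from the gatekeeper and database-level permissions, which hold even when an injected instruction reaches the model.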