You are building a protection layer for an enterprise assistant used by employees to summarize documents, draft emails, and query internal knowledge bases. The system processes prompt and response text from chat sessions, uploaded files, and retrieval snippets, and your team wants an NLP pipeline that can detect sensitive data exposure, prompt-injection attempts, and policy-violating requests before content reaches downstream language models. You have several million historical prompt-response pairs, weak labels from security rules, a small hand-labeled review set, and multilingual traffic with long pasted documents, code blocks, and business jargon. The solution must support near-real-time screening while preserving enough context for legitimate use cases such as document summarization and internal search.
How would you design this NLP system so it can classify risky interactions, extract sensitive entities from text, and handle long, noisy enterprise inputs in production? Explain the modeling approach, preprocessing pipeline, implementation choices, and how you would evaluate and harden it against evolving attack patterns.