Business Context
Squarespace wants to add NLP-powered assistance to its Customer Operations team across support chat, email, and Help Center workflows. You need to design model serving infrastructure that can power multiple language tasks in production, including ticket intent classification, response drafting, and retrieval-augmented answer generation for Squarespace product questions.
Data
- Volume: ~2M historical support conversations and Help Center articles; ~80K new support messages per day
- Text length: 5-2,000 tokens (median 180); multi-turn chat transcripts and long-form email threads
- Language: English-first, with smaller volumes of French, German, and Spanish
- Label distribution: Highly skewed across intents (billing, domains, scheduling, commerce, design editor, account access)
- Inputs: Raw user text, conversation history, article metadata, product surface (e.g. Squarespace Domains, Scheduling, Commerce)
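The inputs listed above could travel through the system as a single typed request object. The sketch below is one possible shape, not a prescribed schema: `SupportRequest`, `build_model_input`, and the whitespace-based token count are illustrative assumptions. A real deployment would count tokens with the serving model's own tokenizer.

```python
from dataclasses import dataclass
from typing import Optional

MAX_TOKENS = 2000  # upper end of the stated 5-2,000 token range


@dataclass(frozen=True)
class SupportRequest:
    """One inbound message plus the context fields listed under Inputs."""
    text: str
    history: tuple[str, ...] = ()           # earlier turns, oldest first
    product_surface: Optional[str] = None   # e.g. "domains", "scheduling", "commerce"
    language: str = "en"                    # English-first; fr, de, es also expected


def build_model_input(req: SupportRequest, max_tokens: int = MAX_TOKENS) -> str:
    """Join history and the current message, keeping the most recent turns
    that fit in the budget. Whitespace splitting stands in for a tokenizer.
    The newest turn is always kept, even if it alone exceeds the budget."""
    turns = [*req.history, req.text]
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):            # walk from newest to oldest
        n_tokens = len(turn.split())
        if kept and used + n_tokens > max_tokens:
            break
        kept.append(turn)
        used += n_tokens
    return "\n".join(reversed(kept))        # restore chronological order
```

Dropping the oldest turns first is only one policy; summarizing older context is an alternative for long email threads.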
Success Criteria
A strong solution should keep p95 latency under 300 ms for classification and retrieval and under 2.5 s for drafted responses, maintain high availability, and allow safe rollout of new models without disrupting support operations.
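The latency targets can be turned into an automated check. The following is a minimal sketch, assuming latencies are collected per task in milliseconds; the task names, `P95_BUDGET_MS`, and the nearest-rank percentile method are choices made for illustration, not part of the brief.

```python
# Budgets in milliseconds, copied from the success criteria.
P95_BUDGET_MS = {"classification": 300, "retrieval": 300, "drafting": 2500}


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sample."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # ceil(0.95 * n) in integer arithmetic, to avoid float rounding issues
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def within_budget(task: str, latencies_ms: list[float]) -> bool:
    """True when the observed p95 is strictly under the task's budget."""
    return p95(latencies_ms) < P95_BUDGET_MS[task]
```

A check like this could gate a canary rollout: a new model version is promoted only if `within_budget` holds over a sufficiently large sample window.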
Constraints
- PII may appear in messages and must be handled safely
- Some tasks require real-time inference; others can be async
- Infrastructure must support versioning, A/B testing, fallback behavior, and monitoring for drift and quality regressions
- Cost matters: GPU capacity should be reserved for tasks that need it
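For the PII constraint, one common first step is to mask obvious identifiers before text reaches a model, a log, or a cache. The sketch below uses simple regular expressions as a stand-in; the patterns and placeholder tokens are assumptions and will miss many real-world cases (names, addresses, non-US formats), so a production system would likely pair them with a dedicated PII detection component.

```python
import re

# Illustrative patterns only; order matters because card numbers would
# otherwise be swallowed by the looser phone pattern.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")            # 13-16 digits
PHONE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)")

PATTERNS = ((EMAIL, "[EMAIL]"), (CARD, "[CARD]"), (PHONE, "[PHONE]"))


def mask_pii(text: str) -> str:
    """Replace email addresses, card-like numbers, and phone-like numbers
    with fixed placeholder tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text
```

Masking with typed placeholders rather than deleting the span keeps the sentence structure intact, which tends to matter for intent classification.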
Requirements
- Architect an NLP model serving system for at least three tasks: intent classification, semantic retrieval, and response generation.
- Define request flow, model routing, batching/caching strategy, and online vs async inference.
- Explain preprocessing for support text, conversation context, and multilingual inputs.
- Describe model/version management, deployment strategy, and rollback plan.
- Specify monitoring, evaluation, and failure handling for latency, quality, and safety.
- Include modern Python implementation examples for preprocessing and serving orchestration.
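As a starting point for the serving orchestration requirement, the sketch below combines three of the ideas named above: versioned models, a deterministic A/B traffic split, and a timeout with fallback. Everything here (`ModelVersion`, `Router`, the hash-based bucketing, the percentage weights) is a hypothetical design for illustration; model handlers are plain async callables standing in for real inference clients.

```python
import asyncio
import hashlib
from dataclasses import dataclass
from typing import Awaitable, Callable

Handler = Callable[[str], Awaitable[str]]


@dataclass(frozen=True)
class ModelVersion:
    name: str
    handler: Handler
    traffic_pct: int  # share of traffic for this version, 0-100


class Router:
    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.versions: dict[str, list[ModelVersion]] = {}
        self.fallbacks: dict[str, Handler] = {}

    def register(self, task: str, versions: list[ModelVersion], fallback: Handler) -> None:
        if sum(v.traffic_pct for v in versions) != 100:
            raise ValueError("traffic percentages must sum to 100")
        self.versions[task] = versions
        self.fallbacks[task] = fallback

    def pick(self, task: str, conversation_id: str) -> ModelVersion:
        """Hash the conversation id into a 0-99 bucket so the same
        conversation always sees the same model version."""
        digest = hashlib.sha256(conversation_id.encode()).hexdigest()
        bucket = int(digest, 16) % 100
        upper = 0
        for version in self.versions[task]:
            upper += version.traffic_pct
            if bucket < upper:
                return version
        raise RuntimeError("unreachable when percentages sum to 100")

    async def infer(self, task: str, conversation_id: str, text: str) -> tuple[str, str]:
        """Return (served_by, output); use the fallback on timeout or error."""
        version = self.pick(task, conversation_id)
        try:
            output = await asyncio.wait_for(version.handler(text), self.timeout_s)
            return version.name, output
        except Exception:  # includes the timeout raised by wait_for
            return "fallback", await self.fallbacks[task](text)
```

Rolling back under this design amounts to setting the new version's `traffic_pct` to 0 and re-registering. A fallback for intent classification might be a keyword rule set, while a fallback for drafting might return no draft at all and leave the agent to write the reply.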