Business Context
At NovaStack, developer platform teams publish internal API release notes that are too technical for product managers to review quickly. You need an NLP system that rewrites technical endpoint descriptions into plain-language summaries suitable for non-technical stakeholders.
Data
You have 180,000 paired training examples of source text and approved rewrites, plus held-out validation and test sets, collected from API docs, changelogs, and internal product briefs.
- Task: rewrite technical API descriptions into plain English
- Volume: 180K train pairs, 20K validation pairs, 20K test pairs
- Text length: source 40-350 words; target 20-120 words
- Language: English only
- Content mix: REST endpoints, auth changes, rate limits, payload fields, error handling
- Quality issues: inconsistent formatting, code snippets, JSON fragments, version numbers, acronyms
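One possible way to handle the quality issues above is to mask code snippets, JSON fragments, and version numbers with placeholders before the text reaches the model, then restore them afterwards. The sketch below is a minimal illustration under that assumption; the patterns, the placeholder format, and the function names `mask_artifacts` and `unmask` are hypothetical and would need tuning against the real corpus.

```python
import re

# Hypothetical patterns, applied in order. Real data will likely need more cases
# (nested JSON, fenced code blocks, semantic-version suffixes, and so on).
PATTERNS = [
    ("CODE", re.compile(r"`[^`]+`")),                   # inline code in backticks
    ("JSON", re.compile(r"\{[^{}]*\}")),                # flat (non-nested) JSON fragments
    ("VER", re.compile(r"\bv?\d+\.\d+(?:\.\d+)?\b")),   # versions such as v2.1.0 or 1.4
]


def mask_artifacts(text):
    """Replace structured artifacts with placeholders.

    Returns the masked text and a mapping from placeholder to original span.
    """
    mapping = {}
    for label, pattern in PATTERNS:
        def _sub(match, label=label):
            key = f"<{label}_{len(mapping)}>"
            mapping[key] = match.group(0)
            return key
        text = pattern.sub(_sub, text)
    # Normalize inconsistent whitespace left over from mixed formatting.
    text = re.sub(r"\s+", " ", text).strip()
    return text, mapping


def unmask(text, mapping):
    """Restore placeholders, newest first so nested placeholders expand correctly."""
    for key, original in reversed(list(mapping.items())):
        text = text.replace(key, original)
    return text


if __name__ == "__main__":
    src = 'Call `GET /users` with {"limit": 10} as of v2.1.0.'
    masked, mapping = mask_artifacts(src)
    print(masked)
    print(unmask(masked, mapping))
```

Masking keeps identifiers and version strings out of the model's rewriting path, so they can be copied back verbatim instead of being paraphrased or altered.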
Success Criteria
A good solution preserves the business meaning of the endpoint while reducing jargon, removing unnecessary implementation detail, and producing fluent, concise output. Target ROUGE-L ≥ 0.42 and BERTScore F1 ≥ 0.88, with human review showing ≥ 90% factual accuracy on a 300-sample audit.
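To make the ROUGE-L target concrete, here is a simplified sketch of the metric as an F1 score over the longest common subsequence of whitespace tokens. It assumes lowercased whitespace tokenization with no stemming, so its values may differ from those of an established ROUGE package; the helper `meets_targets` is a hypothetical convenience that checks the three thresholds stated above. BERTScore needs a pretrained model and is not reproduced here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L F1 on lowercased whitespace tokens (assumption: no stemming)."""
    c = candidate.lower().split()
    r = reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def meets_targets(rouge_l, bertscore_f1, audit_correct, audit_size=300):
    """Check the success criteria: ROUGE-L >= 0.42, BERTScore F1 >= 0.88, audit accuracy >= 90%."""
    return (
        rouge_l >= 0.42
        and bertscore_f1 >= 0.88
        and audit_correct / audit_size >= 0.90
    )
```

On a 300-sample audit, the 90% accuracy bar corresponds to at least 270 rewrites judged factually accurate.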
Constraints
- Inference latency should be < 300 ms per paragraph in batch-serving mode
- The model must run on a single A10/T4-class GPU for fine-tuning and inference
- Output must not invent unsupported product behavior or API guarantees
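The constraint on unsupported behavior can be partly enforced with a cheap post-generation check. The sketch below is one heuristic, not a complete safeguard: it flags numbers, endpoint-like paths, and guarantee words that appear in the rewrite but not in the source. The regexes, the `GUARANTEE_TERMS` list, and the function name `unsupported_facts` are all assumptions made for illustration.

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)*")        # limits, status codes, versions
PATH = re.compile(r"/[A-Za-z0-9_{}/-]+")     # endpoint-like paths (also catches "and/or")
WORD = re.compile(r"[a-z]+")

# Hypothetical list of words that tend to signal an API guarantee.
GUARANTEE_TERMS = {"always", "never", "guarantee", "guarantees", "guaranteed"}


def unsupported_facts(source, rewrite):
    """Return (kind, value) pairs found in the rewrite but absent from the source."""
    issues = []

    src_numbers = set(NUMBER.findall(source))
    for number in NUMBER.findall(rewrite):
        if number not in src_numbers:
            issues.append(("number", number))

    src_paths = set(PATH.findall(source))
    for path in PATH.findall(rewrite):
        if path not in src_paths:
            issues.append(("path", path))

    src_words = set(WORD.findall(source.lower()))
    for word in WORD.findall(rewrite.lower()):
        if word in GUARANTEE_TERMS and word not in src_words:
            issues.append(("guarantee", word))

    return issues


if __name__ == "__main__":
    source = "POST /v2/orders returns 429 after 100 requests per minute."
    risky = "The orders API always allows 500 requests per minute."
    print(unsupported_facts(source, risky))
```

A rewrite with a non-empty result could be routed to human review or regenerated. A check like this catches only surface-level inventions, so it would complement the human factual-accuracy audit rather than replace it.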
Requirements
- Build an NLP pipeline for technical-to-plain-language rewriting.
- Define preprocessing for mixed prose, inline code, and structured API artifacts.
- Fine-tune a modern transformer model in Python.
- Evaluate both readability and semantic preservation.
- Describe how you would reduce hallucinations and detect unsafe rewrites.
- Provide example outputs and an error analysis plan.
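For the readability half of the evaluation requirement, one option is the Flesch Reading Ease score, computed as 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The sketch below assumes a rough vowel-group syllable counter, so scores are approximate and are best used to compare a source against its rewrite, not as absolute values. The function names are illustrative.

```python
import re


def count_syllables(word):
    """Approximate syllables as the number of vowel groups (heuristic, at least 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease; higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


if __name__ == "__main__":
    technical = (
        "The endpoint implements idempotent pagination semantics "
        "via opaque continuation tokens."
    )
    plain = "You can now get a list of users."
    print(round(flesch_reading_ease(technical), 1))
    print(round(flesch_reading_ease(plain), 1))
```

Tracking the score difference between each source and its rewrite gives a simple readability signal to report alongside the semantic-preservation metrics.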