Business Context
ShopSprint, an online marketplace for consumer brands, wants to automate short-form ad copy generation for paid social and search campaigns. The marketing team needs an NLP system that can generate on-brand headlines and descriptions from product metadata while avoiding repetitive, misleading, or policy-violating text.
Data
- Volume: 850,000 historical ads paired with product titles, bullet points, category, price, brand, and campaign objective
- Text length: product inputs are 20-250 words; output ad copy is 5-40 words per field
- Language: English only for the first release
- Label distribution (by campaign objective): 55% conversion-focused ads, 30% awareness-focused ads, 15% promotion-focused ads
- Quality issues: duplicated creatives, inconsistent capitalization, HTML fragments, emoji, and missing product attributes in ~12% of rows
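The quality issues listed above suggest a cleaning pass before any training data is built. The following is a minimal sketch using only the standard library. The field names (`ad_text`, `title`, `category`, `price`, `brand`), the list of required fields, and the emoji code point ranges are assumptions for illustration, not part of the ShopSprint schema, and the emoji pattern is a rough approximation rather than a complete one.

```python
import html
import re

# Matches HTML-like tags such as <b>, </p>, <br/>; deliberately ignores "< 20".
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")
# Rough emoji coverage (assumed ranges): pictographs, misc symbols, flags,
# plus the variation selector and zero-width joiner used in emoji sequences.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\u2600-\u27BF\U0001F1E6-\U0001F1FF\uFE0F\u200D]"
)
WS_RE = re.compile(r"\s+")

# Hypothetical set of attributes a row needs before it is usable for training.
REQUIRED_FIELDS = ("title", "category", "price", "brand")


def clean_text(text: str) -> str:
    """Strip HTML tags, decode entities, drop emoji, and collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    text = EMOJI_RE.sub("", text)
    return WS_RE.sub(" ", text).strip()


def dedupe_ads(rows: list[dict]) -> list[dict]:
    """Keep the first row for each case-insensitive cleaned ad text."""
    seen = set()
    kept = []
    for row in rows:
        key = clean_text(row["ad_text"]).casefold()
        if key and key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def missing_attributes(row: dict) -> list[str]:
    """Return the required fields that are absent or blank in a row."""
    return [
        field
        for field in REQUIRED_FIELDS
        if row.get(field) is None or str(row[field]).strip() == ""
    ]
```

Using a case-folded key for deduplication sidesteps the inconsistent capitalization without rewriting the original casing, which may carry brand styling worth keeping. Rows flagged by `missing_attributes` could be dropped or routed to a separate bucket, depending on how much of the ~12% the team wants to recover.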
Success Criteria
A good solution should generate ad copy that is grammatically correct, relevant to the product, aligned with campaign intent, and compliant with brand constraints. Targets: ROUGE-L of at least 0.75 against approved copy on holdout data, a policy-violation rate below 2% in offline review, and average generation latency under 300 ms per request for batch campaign creation.
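To make the offline targets concrete, here is a small sketch of a ROUGE-L score and a threshold check. It assumes the F1 variant of ROUGE-L over lowercased whitespace tokens; the brief does not say which variant or tokenization is intended, so a maintained ROUGE package may be preferable in practice. The function names are illustrative.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 (assumed variant) on lowercased whitespace tokens."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def meets_targets(mean_rouge_l: float, violation_rate: float,
                  mean_latency_ms: float) -> bool:
    """Check the three thresholds stated in the success criteria."""
    return (
        mean_rouge_l >= 0.75
        and violation_rate < 0.02
        and mean_latency_ms < 300
    )
```

For example, `rouge_l_f1("soft tee", "soft cotton tee")` has precision 1 and recall 2/3, giving an F1 of 0.8. ROUGE-L rewards overlap with approved copy but says nothing about factual accuracy, so it should be read alongside the policy-violation rate rather than in place of it.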
Constraints
- Must run in a secure VPC; no external API calls at inference time
- Generated copy must not invent unsupported claims or discounts
- Maximum model size: deployable on a single A10 GPU
- Marketing reviewers need controllable outputs by tone, channel, and objective
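One way to approach the "no invented claims or discounts" constraint is a post-generation check that compares the copy against the product input. The sketch below flags numeric claims (prices, percentages) and a few promotional phrases that appear in the generated copy but not in the source text. The phrase list and function names are hypothetical, and a literal match like this will miss paraphrased claims, so it complements human review rather than replacing it.

```python
import re

# Numbers with optional currency sign and percent sign, e.g. $24.99, 20%, 1,000.
NUM_RE = re.compile(r"\$?\d+(?:[.,]\d+)*%?")
# Hypothetical promotional phrases that need support in the product input.
PROMO_TERMS = ("free shipping", "discount", "% off")


def numeric_claims(text: str) -> set[str]:
    """Extract numeric tokens, normalizing away thousands separators."""
    return {m.group().replace(",", "") for m in NUM_RE.finditer(text)}


def unsupported_claims(copy: str, source: str) -> list[str]:
    """Return numbers and promo phrases in the copy that the source lacks."""
    issues = sorted(numeric_claims(copy) - numeric_claims(source))
    copy_lower = copy.lower()
    source_lower = source.lower()
    issues += [
        term for term in PROMO_TERMS
        if term in copy_lower and term not in source_lower
    ]
    return issues


if __name__ == "__main__":
    source = "Price: $24.99. 100% cotton crew-neck tee."
    print(unsupported_claims("Get 20% off this 100% cotton tee for $24.99", source))
    print(unsupported_claims("Soft 100% cotton tee, $24.99", source))
```

Copy that returns a non-empty list could be rejected, regenerated, or sent to a reviewer queue. Because the check is plain string processing, it runs inside the VPC with no external calls and adds negligible latency.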
Requirements
- Build an NLP pipeline to generate ad headlines and descriptions from structured product text.
- Define preprocessing for noisy catalog data and historical ad creatives.
- Implement a modern Python solution using a transformer-based sequence-to-sequence model.
- Include controllable generation inputs such as brand tone, channel, and campaign objective.
- Propose offline evaluation, human review, and failure analysis for hallucinations and repetition.
- Explain how you would enforce copy constraints and prepare the system for production deployment.
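The controllable-generation and repetition requirements above can be sketched without committing to a specific model. The example below serializes the control attributes and product fields into a single source string that a sequence-to-sequence tokenizer could consume, and computes a simple repeated n-gram ratio for failure analysis. The `key: value | ...` format, the product field names, and both function names are assumptions for illustration. Any consistent serialization should work as long as training and inference use the same one.

```python
def build_model_input(product: dict, tone: str, channel: str,
                      objective: str) -> str:
    """Serialize control attributes and product fields into one source string.

    Control fields come first so they survive truncation of long inputs.
    Empty fields are skipped rather than emitted as blank placeholders.
    """
    bullets = " ; ".join(product.get("bullets", []))
    fields = [
        ("tone", tone),
        ("channel", channel),
        ("objective", objective),
        ("brand", product.get("brand", "")),
        ("category", product.get("category", "")),
        ("price", product.get("price", "")),
        ("title", product.get("title", "")),
        ("bullets", bullets),
    ]
    return " | ".join(f"{key}: {value}" for key, value in fields
                      if str(value).strip())


def repeated_ngram_ratio(text: str, n: int = 2) -> float:
    """Fraction of n-grams that duplicate an earlier n-gram (0.0 = none)."""
    tokens = text.lower().split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1 - len(set(ngrams)) / len(ngrams)


if __name__ == "__main__":
    product = {
        "brand": "Acme",
        "title": "Cotton Tee",
        "bullets": ["Soft", "Machine washable"],
    }
    print(build_model_input(product, "playful", "search", "conversion"))
    print(repeated_ngram_ratio("buy now buy now buy now"))
```

Putting the control fields at the front of the string is a design choice: if the product text is long and the tokenizer truncates, the tone, channel, and objective are still present. The repetition ratio can be tracked per holdout batch and used to surface outputs for manual inspection.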