Business Context
Nova Keyboard wants to ship an on-device LLM feature for smart reply and short-form rewriting inside its Android app. The NLP team must identify the main constraints of running an LLM locally on consumer phones and design mitigations that preserve usability, privacy, and battery life.
Data
You are given 180,000 anonymized prompt/completion logs collected from mobile beta users, plus device telemetry covering 35 Android models.
- Task framing: classify each prompt-device session into the dominant deployment bottleneck and recommend a mitigation strategy
- Text length: 5-220 tokens per prompt, median 38
- Language: English only
- Labels: memory, latency, battery, thermal, storage, privacy/network
- Distribution: moderately imbalanced; latency and memory together account for ~52%
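To offset the moderate imbalance noted above, per-class loss weights can be derived from label frequencies. A minimal sketch using inverse-frequency weighting; the toy counts below are illustrative, not drawn from the dataset:

```python
from collections import Counter

LABELS = ["memory", "latency", "battery", "thermal", "storage", "privacy/network"]

def class_weights(labels):
    """Inverse-frequency weights: rarer classes get larger weights,
    so the classifier is not dominated by latency/memory sessions."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}

# Toy distribution mirroring the imbalance: latency + memory dominate.
toy = (["latency"] * 30 + ["memory"] * 22 + ["battery"] * 15 +
       ["thermal"] * 12 + ["storage"] * 11 + ["privacy/network"] * 10)
weights = class_weights(toy)
```

The resulting dictionary can be passed as per-class weights to a cross-entropy loss during fine-tuning.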
Success Criteria
A good solution should achieve macro-F1 >= 0.82, recall >= 0.88 on the memory and latency classes, and produce recommendations that are technically actionable for Android deployment.
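These quality gates can be checked directly with scikit-learn. A minimal sketch on a toy label set, assuming string labels; `meets_targets` is a hypothetical helper, not part of the brief:

```python
from sklearn.metrics import f1_score, recall_score

LABELS = ["memory", "latency", "battery", "thermal", "storage", "privacy/network"]

def meets_targets(y_true, y_pred):
    """Return (pass/fail, macro-F1) against the stated gates:
    macro-F1 >= 0.82 and recall >= 0.88 on memory and latency."""
    macro_f1 = f1_score(y_true, y_pred, labels=LABELS,
                        average="macro", zero_division=0)
    recalls = recall_score(y_true, y_pred, labels=["memory", "latency"],
                           average=None, zero_division=0)
    passed = macro_f1 >= 0.82 and all(r >= 0.88 for r in recalls)
    return passed, macro_f1

# Perfect toy predictions trivially pass both gates.
y = ["memory", "latency", "battery", "thermal", "storage", "privacy/network"]
ok, score = meets_targets(y, list(y))
```

Reporting macro-F1 alongside per-class recall keeps the dominant latency/memory classes from masking failures on rarer ones.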
Constraints
- Inference must complete in < 800 ms for 32-token generation on mid-tier devices
- Model package size must stay < 1.2 GB after compression
- No raw user text may leave the device
- The solution should account for heterogeneous CPU/GPU/NPU availability across Android devices
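The size and memory constraints can be sanity-checked with back-of-the-envelope arithmetic before any engineering work. A sketch; the 1.1B-parameter / 4-bit figures are illustrative assumptions, not a chosen model:

```python
def quantized_size_gb(n_params, bits_per_weight, overhead=1.05):
    """Rough package size: params * bits / 8, plus ~5% for
    quantization scales, zero-points, and tokenizer assets."""
    return n_params * bits_per_weight / 8 / 1e9 * overhead

def kv_cache_mb(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV-cache footprint: 2 (K and V) * layers * KV heads *
    head dim * sequence length * bytes per element (fp16 = 2)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e6

# e.g. a ~1.1B-parameter model at 4-bit fits the 1.2 GB package budget
size = quantized_size_gb(1.1e9, 4)   # ~0.58 GB
```

Estimates like these rule out model families early: anything above roughly 2B parameters cannot meet the 1.2 GB cap even at 4-bit.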
Requirements
- Build an NLP pipeline that classifies deployment constraints from device-session text logs.
- Propose preprocessing for mobile telemetry, prompts, and system diagnostics.
- Fine-tune a modern transformer baseline in Python.
- Explain mitigation strategies such as quantization, distillation, KV-cache limits, and token budget control.
- Define an evaluation plan covering model quality and practical on-device behavior.
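The token budget control listed above can be sketched as head-and-tail truncation, so that prompt plus generation always fits a fixed context window; whitespace tokens stand in for real tokenizer IDs, and `budget_prompt` is a hypothetical helper:

```python
def budget_prompt(tokens, max_context, gen_tokens=32, keep_head=8):
    """Trim the prompt so prompt + generation fits the context window.

    Keeps the first `keep_head` tokens (typically the instruction) and
    the most recent tail, dropping the middle. This caps both latency
    and KV-cache growth on memory-constrained devices.
    """
    budget = max_context - gen_tokens
    if len(tokens) <= budget:
        return tokens
    tail = budget - keep_head
    return tokens[:keep_head] + tokens[-tail:]

# A 300-token prompt trimmed for a 256-token window and 32-token reply.
prompt = [f"t{i}" for i in range(300)]
trimmed = budget_prompt(prompt, max_context=256)   # 224 tokens survive
```

Keeping the head and tail preserves the instruction and the most recent context, which usually matter most for smart-reply quality.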