Business Context
Nova Keyboard wants to ship an on-device LLM feature for smart reply and short-form rewriting inside its Android app. The NLP team must identify the main constraints of running an LLM locally on consumer phones and design mitigations that preserve usability, privacy, and battery life.
Data
You are given 180,000 anonymized prompt/completion logs collected from mobile beta users, plus device telemetry covering 35 Android models.
- Task framing: classify each prompt-device session into the dominant deployment bottleneck and recommend a mitigation strategy
- Text length: 5-220 tokens per prompt, median 38
- Language: English only
- Labels: memory, latency, battery, thermal, storage, privacy/network
- Distribution: moderately imbalanced; latency and memory together account for ~52%
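To offset the moderate imbalance noted above, per-class loss weights can be derived from label frequencies. A minimal sketch using inverse-frequency weighting; the toy counts below are illustrative, not drawn from the dataset:

```python
from collections import Counter

LABELS = ["memory", "latency", "battery", "thermal", "storage", "privacy/network"]

def class_weights(labels):
    """Inverse-frequency weights: rarer classes get larger weights,
    so the classifier is not dominated by latency/memory sessions."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {c: total / (n_classes * n) for c, n in counts.items()}

# Toy distribution mirroring the imbalance: latency + memory dominate.
toy = (["latency"] * 30 + ["memory"] * 22 + ["battery"] * 15 +
       ["thermal"] * 12 + ["storage"] * 11 + ["privacy/network"] * 10)
weights = class_weights(toy)
```

The resulting dictionary can be passed as per-class weights to a cross-entropy loss during fine-tuning.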
Success Criteria
A good solution should achieve macro-F1 >= 0.82, recall >= 0.88 on the memory and latency classes, and produce recommendations that are technically actionable for Android deployment.
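These quality gates can be checked directly with scikit-learn. A minimal sketch on a toy label set, assuming string labels; `meets_targets` is a hypothetical helper, not part of the brief:

```python
from sklearn.metrics import f1_score, recall_score

LABELS = ["memory", "latency", "battery", "thermal", "storage", "privacy/network"]

def meets_targets(y_true, y_pred):
    """Return (pass/fail, macro-F1) against the stated gates:
    macro-F1 >= 0.82 and recall >= 0.88 on memory and latency."""
    macro_f1 = f1_score(y_true, y_pred, labels=LABELS,
                        average="macro", zero_division=0)
    recalls = recall_score(y_true, y_pred, labels=["memory", "latency"],
                           average=None, zero_division=0)
    passed = macro_f1 >= 0.82 and all(r >= 0.88 for r in recalls)
    return passed, macro_f1

# Perfect toy predictions trivially pass both gates.
y = ["memory", "latency", "battery", "thermal", "storage", "privacy/network"]
ok, score = meets_targets(y, list(y))
```

Reporting macro-F1 alongside per-class recall keeps the dominant latency/memory classes from masking failures on rarer ones.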
Constraints
- Inference must complete in < 800 ms for 32-token generation on mid-tier devices
- Model package size must stay < 1.2 GB after compression
- No raw user text may leave the device
- The solution should account for heterogeneous CPU/GPU/NPU availability across Android devices
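The size and memory constraints can be sanity-checked with back-of-the-envelope arithmetic before any engineering work. A sketch; the 1.1B-parameter / 4-bit figures are illustrative assumptions, not a chosen model:

```python
def quantized_size_gb(n_params, bits_per_weight, overhead=1.05):
    """Rough package size: params * bits / 8, plus ~5% for
    quantization scales, zero-points, and tokenizer assets."""
    return n_params * bits_per_weight / 8 / 1e9 * overhead

def kv_cache_mb(layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV-cache footprint: 2 (K and V) * layers * KV heads *
    head dim * sequence length * bytes per element (fp16 = 2)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e6

# e.g. a ~1.1B-parameter model at 4-bit fits the 1.2 GB package budget
size = quantized_size_gb(1.1e9, 4)   # ~0.58 GB
```

Estimates like these rule out model families early: anything above roughly 2B parameters cannot meet the 1.2 GB cap even at 4-bit.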
Requirements
- Build an NLP pipeline that classifies deployment constraints from device-session text logs.
- Propose preprocessing for mobile telemetry, prompts, and system diagnostics.
- Fine-tune a modern transformer baseline in Python.
- Explain mitigation strategies such as quantization, distillation, KV-cache limits, and token budget control.
- Define an evaluation plan covering model quality and practical on-device behavior.
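The token budget control listed above can be sketched as head-and-tail truncation, so that prompt plus generation always fits a fixed context window; whitespace tokens stand in for real tokenizer IDs, and `budget_prompt` is a hypothetical helper:

```python
def budget_prompt(tokens, max_context, gen_tokens=32, keep_head=8):
    """Trim the prompt so prompt + generation fits the context window.

    Keeps the first `keep_head` tokens (typically the instruction) and
    the most recent tail, dropping the middle. This caps both latency
    and KV-cache growth on memory-constrained devices.
    """
    budget = max_context - gen_tokens
    if len(tokens) <= budget:
        return tokens
    tail = budget - keep_head
    return tokens[:keep_head] + tokens[-tail:]

# A 300-token prompt trimmed for a 256-token window and 32-token reply.
prompt = [f"t{i}" for i in range(300)]
trimmed = budget_prompt(prompt, max_context=256)   # 224 tokens survive
```

Keeping the head and tail preserves the instruction and the most recent context, which usually matter most for smart-reply quality.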