Business Context
Resideo wants to reduce manual triage for support cases coming from Honeywell Home thermostats, security devices, and water leak sensors. You need to show when supervised learning is appropriate for predicting known issue categories and when unsupervised learning is better for discovering new patterns in unlabeled support traffic.
Dataset
You are given historical support-case data exported from Resideo customer support systems.
| Feature Group | Count | Examples |
|---|---|---|
| Structured case metadata | 10 | product_line, device_model, firmware_version, app_platform, region |
| Device telemetry aggregates | 12 | reconnect_count_24h, battery_level, signal_strength, temp_delta, sensor_fault_count |
| Customer/account context | 6 | install_age_days, homeowner_vs_pro, warranty_status, prior_case_count |
| Text-derived features | 20 | TF-IDF or embedding features from case subject and notes |
| Labels | 1 | known_issue_type for a subset of cases |
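The four input groups sum to the 48 usable features (10 + 12 + 6 + 20). The Labels row is the supervised target, not an input: both models see the same features, but only the supervised model uses known_issue_type. A minimal sketch of that split follows. The group keys are shorthand for the table rows. The example record values, such as "thermostat" and "connectivity", are hypothetical.

```python
from typing import Any, Optional

# Input feature groups and their sizes, copied from the dataset table.
FEATURE_GROUPS: dict[str, int] = {
    "structured_case_metadata": 10,
    "device_telemetry_aggregates": 12,
    "customer_account_context": 6,
    "text_derived_features": 20,
}

# The label column is the supervised target, never a model input.
TARGET = "known_issue_type"


def total_feature_count(groups: dict[str, int]) -> int:
    """Count model inputs across all feature groups (the label is excluded)."""
    return sum(groups.values())


def split_record(record: dict[str, Any]) -> tuple[dict[str, Any], Optional[str]]:
    """Separate inputs from the label; unlabeled cases return None as the label.

    Both models consume the features; only the supervised model uses the label.
    """
    features = {k: v for k, v in record.items() if k != TARGET}
    return features, record.get(TARGET)
```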
- Size: 82K support cases over 18 months, 48 usable features after preprocessing
- Target: known_issue_type, 6 classes, available for labeled cases only
- Label coverage: 61K labeled cases, 21K unlabeled cases
- Missing data: ~9% missing telemetry fields, ~14% missing firmware_version, sparse text fields in ~7% of rows
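About 14% of cases lack firmware_version, so dropping incomplete rows would discard a large share of the data. Missingness may also carry signal. For example, a device might never have reported its firmware. The sketch below shows one way to keep every row, assuming three rules. Telemetry gets median imputation plus a 0/1 missing-indicator column. Categoricals get an explicit "missing" level. Absent text-derived vectors become zero vectors. The sentinel name and the zero-fill choice are illustrative, not prescribed by the dataset.

```python
import statistics
from typing import Optional

# Hypothetical sentinel used as its own category level for absent values.
MISSING_CATEGORY = "__missing__"


def fit_median(train_values: list[Optional[float]]) -> float:
    """Learn the fill value from training data only, to avoid test-set leakage."""
    observed = [v for v in train_values if v is not None]
    return statistics.median(observed) if observed else 0.0


def impute_numeric(values: list[Optional[float]], fill: float) -> tuple[list[float], list[int]]:
    """Fill gaps in a telemetry column and emit a 0/1 missing-indicator column."""
    imputed = [fill if v is None else v for v in values]
    indicator = [1 if v is None else 0 for v in values]
    return imputed, indicator


def impute_categorical(values: list[Optional[str]]) -> list[str]:
    """Keep missing or empty categories (e.g. firmware_version) as an explicit level."""
    return [MISSING_CATEGORY if not v else v for v in values]


def impute_text_vector(vec: Optional[list[float]], dim: int = 20) -> list[float]:
    """Replace an absent text-derived vector with zeros of the expected width."""
    return list(vec) if vec is not None else [0.0] * dim
```

In scikit-learn, `SimpleImputer(strategy="median", add_indicator=True)` covers the numeric case. `SimpleImputer(strategy="constant", fill_value="__missing__")` covers the categorical case.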
Success Criteria
A strong solution should:
- Train a supervised model on labeled data that achieves macro F1 >= 0.72 on a held-out test set
- Produce unsupervised clusters on all cases with silhouette score >= 0.20 and a clear interpretation of cluster themes
- Explain, in practical terms, the difference between supervised and unsupervised learning and when each should be used at Resideo
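Both numeric thresholds are standard metrics. The sketch below computes them from their definitions in plain Python so the targets are unambiguous. In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` and `silhouette_score(X, labels)` compute the same quantities. Macro F1 averages per-class F1 without weighting, so a rare issue type counts as much as a common one. The distribution of the six classes is not given, so this matters mainly if they are imbalanced. Silhouette ranges from -1 to 1, where higher means points sit closer to their own cluster than to the nearest other cluster.

```python
import math
from collections.abc import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # F1 = 2TP / (2TP + FP + FN); each class in the union has a nonzero denominator.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def silhouette(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette (b - a) / max(a, b) over all points, Euclidean distance."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        # Sum and count distances from p to every other point, grouped by cluster.
        sums: dict[int, float] = {}
        counts: dict[int, int] = {}
        for j, q in enumerate(points):
            if i == j:
                continue
            sums[labels[j]] = sums.get(labels[j], 0.0) + math.dist(p, q)
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if counts.get(own, 0) == 0:
            continue  # singleton cluster: scored as 0, following scikit-learn
        a = sums[own] / counts[own]  # mean intra-cluster distance
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)
```

This pairwise computation is O(n²). On 82K cases, it is usually estimated on a random sample, which scikit-learn's `silhouette_score` supports through its `sample_size` and `random_state` arguments.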
Constraints
- Predictions will be used in a support workflow, so results must be explainable enough for operations teams
- Batch scoring must finish in under 10 minutes for ~10K daily new cases
- The approach should handle mixed numeric, categorical, and text-derived features
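The scoring budget works out to 10,000 cases / 600 s ≈ 16.7 cases per second, or about 60 ms per case end to end. The sketch below is one way to check that budget. It assumes a hypothetical `score_batch` callable that wraps feature preparation plus both models. Timing a sample run and extrapolating linearly gives only a rough estimate, not a guarantee.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

BUDGET_SECONDS = 10 * 60  # constraint: batch must finish in under 10 minutes
DAILY_CASES = 10_000      # ~10K new cases per day


def required_throughput(n_cases: int = DAILY_CASES, budget_s: float = BUDGET_SECONDS) -> float:
    """Minimum cases per second needed to stay within the budget."""
    return n_cases / budget_s


def score_in_batches(
    cases: Sequence[T],
    score_batch: Callable[[Sequence[T]], list[R]],
    batch_size: int = 1_000,
) -> tuple[list[R], float]:
    """Score cases chunk by chunk; return predictions and elapsed wall-clock seconds."""
    start = time.perf_counter()
    preds: list[R] = []
    for i in range(0, len(cases), batch_size):
        preds.extend(score_batch(cases[i:i + batch_size]))
    return preds, time.perf_counter() - start


def projected_within_budget(elapsed_s: float, n_scored: int,
                            n_target: int = DAILY_CASES,
                            budget_s: float = BUDGET_SECONDS) -> bool:
    """Extrapolate a timed sample run linearly to the full daily volume."""
    return elapsed_s / n_scored * n_target < budget_s
```

For example, if 1,000 sample cases take 30 s, the projection is 300 s for 10K cases, which is inside the budget.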
Deliverables
- Build a supervised model, trained on the labeled cases, to predict known_issue_type
- Build an unsupervised model to group all support cases and identify emerging issue patterns
- Compare the two approaches, including required labels, objective, outputs, and evaluation methods
- Describe how you would deploy both outputs in a Resideo support triage pipeline
- Provide code, metrics, and a short explanation of tradeoffs
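One way to combine both outputs in triage is sketched below under three assumptions. First, the classifier returns per-class probabilities. Second, the unsupervised model assigns every case a cluster id. Third, operations maintains a set of cluster ids flagged as emerging, for example clusters with rising volume or many low-confidence predictions. Confident predictions outside flagged clusters are auto-labeled. Everything else goes to a person with its cluster id attached. The threshold, class names, and queue names are hypothetical. A single readable routing condition also helps with the explainability constraint.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; in practice tune it on held-out labeled cases.
CONFIDENCE_THRESHOLD = 0.70


@dataclass
class TriageResult:
    case_id: str
    route: str                 # "auto_label" or "human_review"
    issue_type: Optional[str]  # supervised prediction, only when it is trusted
    cluster_id: int            # unsupervised group, attached to every case


def route_case(
    case_id: str,
    class_probs: dict[str, float],
    cluster_id: int,
    emerging_clusters: set[int],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> TriageResult:
    """Combine classifier confidence with cluster membership to pick a queue."""
    best_label, best_prob = max(class_probs.items(), key=lambda kv: kv[1])
    # Low confidence or membership in a flagged cluster suggests a pattern the
    # six known classes may not cover, so a person should review the case.
    if best_prob < threshold or cluster_id in emerging_clusters:
        return TriageResult(case_id, "human_review", None, cluster_id)
    return TriageResult(case_id, "auto_label", best_label, cluster_id)
```

For example, `route_case("c1", {"connectivity": 0.9, "battery": 0.1}, 3, set())` auto-labels the case as "connectivity" and keeps cluster 3 for trend reporting.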