Classify Resideo Device Support Issues

Business Context

Resideo wants to reduce manual triage for support cases coming from Honeywell Home thermostats, security devices, and water leak sensors. You need to show when supervised learning is appropriate for predicting known issue categories and when unsupervised learning is better for discovering new patterns in unlabeled support traffic.

Dataset

You are given historical support-case data exported from Resideo customer support systems.

Feature Group	Count	Examples
Structured case metadata	10	product_line, device_model, firmware_version, app_platform, region
Device telemetry aggregates	12	reconnect_count_24h, battery_level, signal_strength, temp_delta, sensor_fault_count
Customer/account context	6	install_age_days, homeowner_vs_pro, warranty_status, prior_case_count
Text-derived features	20	TF-IDF or embedding features from case subject and notes
Labels	1	known_issue_type for a subset of cases

Size: 82K support cases over 18 months, 48 usable features after preprocessing
Target: known_issue_type with 6 classes for labeled cases only
Label coverage: 61K labeled cases, 21K unlabeled cases
Missing data: ~9% missing telemetry fields, ~14% missing firmware_version, sparse text fields in ~7% of rows
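One way to handle the mixed feature types and missing values described above is a per-column preprocessing step. The following is a minimal sketch, assuming scikit-learn and pandas are available. The four-row sample frame and its values are hypothetical; only the column names come from the feature table. Treating a missing firmware_version as its own category and adding missingness indicators for telemetry are design assumptions, not requirements of the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical sample rows; real data would come from the support-case export.
cases = pd.DataFrame({
    "firmware_version": pd.Series(["1.2.0", np.nan, "1.3.1", "1.2.0"], dtype=object),
    "product_line": pd.Series(
        ["thermostat", "security", "leak_sensor", "thermostat"], dtype=object
    ),
    "battery_level": [0.91, np.nan, 0.40, 0.75],
    "reconnect_count_24h": [0.0, 7.0, np.nan, 2.0],
})

categorical = ["firmware_version", "product_line"]
numeric = ["battery_level", "reconnect_count_24h"]

preprocess = ColumnTransformer(
    transformers=[
        # Missing categories become an explicit "missing" level, then one-hot.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical),
        # Median-impute telemetry and append a 0/1 "was missing" flag per column
        # (the flags are scaled too, which is harmless for this sketch).
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median", add_indicator=True)),
            ("scale", StandardScaler()),
        ]), numeric),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

X = preprocess.fit_transform(cases)
print(X.shape)
```

The same fitted transformer can feed both the supervised classifier and the clustering step, so labeled and unlabeled cases share one feature space.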

Success Criteria

A strong solution should:

Train a supervised model on labeled data that achieves macro F1 >= 0.72 on a held-out test set
Produce unsupervised clusters on all cases with silhouette score >= 0.20 and a clear interpretation of cluster themes
Explain, in practical terms, the difference between supervised and unsupervised learning and when each should be used at Resideo
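The macro F1 criterion averages per-class F1 with equal weight, so a model cannot hit the target by doing well only on the most common issue type. The sketch below computes it by hand in plain Python to make that visible. The class names "connectivity" and "battery" are hypothetical placeholders, since the six actual known_issue_type values are not listed here.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced example: the model always predicts the majority class.
y_true = ["connectivity"] * 8 + ["battery"] * 2
y_pred = ["connectivity"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f} macro_f1={score:.2f}")
```

Here accuracy is 0.80 while macro F1 is about 0.44, because the minority class contributes an F1 of zero. In practice scikit-learn's f1_score with average="macro" computes the same quantity.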

Constraints

Predictions will be used in a support workflow, so results must be explainable enough for operations teams
Batch scoring must finish in under 10 minutes for ~10K daily new cases
The approach should tolerate mixed structured, categorical, and text-derived features
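The batch constraint works out to 600 seconds for about 10,000 cases, or roughly 60 ms per case. A quick way to check a candidate model against that budget is to time it on a sample and extrapolate. This is a rough sketch: projected_batch_seconds and dummy_score are hypothetical names, and linear scaling with batch size is an assumption that should be confirmed on a full-size batch.

```python
import time

BUDGET_SECONDS = 10 * 60   # "under 10 minutes"
DAILY_CASES = 10_000       # "~10K daily new cases"


def projected_batch_seconds(score_fn, sample_rows, total_rows=DAILY_CASES):
    """Time score_fn on a sample and scale linearly to the full daily batch."""
    start = time.perf_counter()
    score_fn(sample_rows)
    elapsed = time.perf_counter() - start
    return elapsed * total_rows / len(sample_rows)


def dummy_score(rows):
    # Stand-in for a real model's predict call; returns one class id per row.
    return [i % 6 for i, _ in enumerate(rows)]


sample = [("case", i) for i in range(1_000)]
projected = projected_batch_seconds(dummy_score, sample)
within_budget = projected < BUDGET_SECONDS
print(f"per-case budget: {BUDGET_SECONDS / DAILY_CASES * 1000:.0f} ms")
print(f"projected batch time: {projected:.4f} s, within budget: {within_budget}")
```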

Deliverables

Build a supervised model to predict known_issue_type for labeled cases
Build an unsupervised model to group all support cases and identify emerging issue patterns
Compare the two approaches, including required labels, objective, outputs, and evaluation methods
Describe how you would deploy both outputs in a Resideo support triage pipeline
Provide code, metrics, and a short explanation of tradeoffs
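The contrast between the two deliverables can be sketched on synthetic data. The supervised path needs labels, learns a mapping to known classes, and is scored with macro F1 on held-out labeled cases. The unsupervised path uses every case, returns cluster ids with no predefined meaning, and is scored with the silhouette score. This is an illustrative sketch assuming scikit-learn; make_blobs stands in for the real preprocessed features, and logistic regression and k-means are example choices rather than the required models.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, silhouette_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 6 classes, with the last quarter of rows treated as unlabeled.
X, y = make_blobs(n_samples=600, centers=6, n_features=8, random_state=0)
labeled = np.arange(len(X)) < 450

# Supervised: fit on labeled cases only, evaluate on a held-out labeled split.
X_train, X_test, y_train, y_test = train_test_split(
    X[labeled], y[labeled], test_size=0.25, stratify=y[labeled], random_state=0
)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
macro_f1 = f1_score(y_test, clf.predict(X_test), average="macro")

# Unsupervised: cluster all cases, labeled or not; no target is used.
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)
silhouette = silhouette_score(X, km.labels_)

print(f"macro F1 (held-out labeled cases): {macro_f1:.3f}")
print(f"silhouette (all cases): {silhouette:.3f}")
```

A linear model keeps the classifier explainable through its per-class coefficients, and cluster themes can be read off by inspecting the top features or terms near each centroid. Scores on the synthetic blobs say nothing about what the real support data will achieve.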
