Evaluate Address Match Classifier

Context

Precisely is testing a binary classification model in the Precisely Data Integrity Suite that predicts whether an incoming customer address record matches a trusted master record. The model runs before downstream enrichment and deduplication workflows. During a pilot deployment, operations reported that too many true matches were being missed, which created manual review work and delayed record onboarding.

Current Performance

Metric	Validation Set	Pilot Production Week
Accuracy	0.91	0.89
Precision	0.88	0.90
Recall	0.64	0.58
F1 Score	0.74	0.70
AUC-ROC	0.86	0.84
Positive Rate	32%	29%

On the pilot production week, the confusion matrix on 10,000 labeled decisions was:

	Predicted Match	Predicted Non-Match
Actual Match	1,682	1,218
Actual Non-Match	186	6,914
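
As a cross-check on the metrics table, the headline metrics can be recomputed directly from the four pilot-week counts. The sketch below is a minimal illustration and not part of the original exercise; the function name `summarize` and the extra rows (specificity, prevalence, predicted positive rate) are additions for context.

```python
# Counts from the pilot-week confusion matrix above.
TP, FN = 1682, 1218   # actual matches: predicted match / predicted non-match
FP, TN = 186, 6914    # actual non-matches: predicted match / predicted non-match


def summarize(tp, fn, fp, tn):
    """Derive standard binary-classification metrics from raw counts."""
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
        "prevalence": (tp + fn) / total,            # share of actual matches
        "predicted_positive_rate": (tp + fp) / total,
        "all_negative_accuracy": (tn + fp) / total,  # baseline: always "non-match"
    }


metrics = summarize(TP, FN, FP, TN)
for name, value in metrics.items():
    print(f"{name:>24}: {value:.3f}")
```

Recall (0.58), precision (0.90), and the 29% share of actual matches agree with the pilot column of the table. Accuracy computed from these counts comes to about 0.86 rather than the reported 0.89, so it may be worth confirming which figure is authoritative before relying on it. The always-"non-match" baseline already reaches 0.71 accuracy on this class mix, which is one reason accuracy alone says little here.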

The Problem

The team wants to know whether this is actually a good classifier, despite high accuracy and precision. The concern is that missed matches (false negatives) are more expensive than extra reviews (false positives), because unmatched records bypass downstream entity resolution and reduce data quality for customers.

Requirements

Interpret the model’s performance using the metrics above, not accuracy alone.
Explain what the confusion matrix says about business impact.
Identify the main weakness of the current model.
Recommend how you would validate the model before broader rollout.
Propose 3-4 practical improvements, including any threshold or calibration changes.
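
One way to approach the threshold part of the last requirement is a cost-weighted sweep over candidate cutoffs. The sketch below is a hypothetical illustration: the function `pick_threshold`, the toy scores and labels, and the idea that every predicted match consumes one unit of a capacity budget are all assumptions, not details from the exercise. The unit costs are the ones listed under Constraints.

```python
def pick_threshold(scores, labels, fn_cost, fp_cost, max_flagged=None):
    """Return (threshold, cost, flagged) minimizing fn_cost*FN + fp_cost*FP.

    A record is predicted a match when score >= threshold. Candidate
    thresholds are the observed scores. Thresholds that flag more than
    max_flagged records are skipped (assumed capacity limit). Returns None
    if no candidate satisfies the limit.
    """
    best = None
    for threshold in sorted(set(scores)):
        fp = fn = flagged = 0
        for score, is_match in zip(scores, labels):
            predicted_match = score >= threshold
            if predicted_match:
                flagged += 1
                if not is_match:
                    fp += 1
            elif is_match:
                fn += 1
        if max_flagged is not None and flagged > max_flagged:
            continue
        cost = fn_cost * fn + fp_cost * fp
        if best is None or cost < best[1]:
            best = (threshold, cost, flagged)
    return best


# Toy, made-up scores and labels (1 = actual match), for illustration only.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.35, 0.30, 0.20, 0.10]
toy_labels = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]

unconstrained = pick_threshold(toy_scores, toy_labels, fn_cost=4.50, fp_cost=0.60)
capped = pick_threshold(toy_scores, toy_labels, fn_cost=4.50, fp_cost=0.60, max_flagged=6)
print("unconstrained:", unconstrained)
print("capped at 6 flagged:", capped)
```

On this toy data the unconstrained sweep settles on a low cutoff (0.20), because missing a match is far more expensive than an extra review, while the capacity cap pushes the cutoff up to 0.40. In practice the sweep would run on a held-out labeled sample, and the capacity rule would need to reflect how review work is actually generated.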

Constraints

Manual review capacity is limited to 2,200 records/day.
A false negative costs about $4.50 in downstream remediation.
A false positive costs about $0.60 in unnecessary review time.
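
Combining these unit costs with the pilot-week error counts gives a rough sense of where the money goes. The arithmetic below is a minimal sketch: the variable names are illustrative, and the cost-based cutoff at the end assumes calibrated match probabilities and zero cost for correct decisions, neither of which the exercise states.

```python
# Unit costs and pilot-week error counts, all taken from the text above.
FN_COST = 4.50    # dollars per missed match (downstream remediation)
FP_COST = 0.60    # dollars per false match (unnecessary review time)
FN_COUNT = 1218   # actual matches predicted as non-match
FP_COUNT = 186    # actual non-matches predicted as match

fn_total = FN_COUNT * FN_COST
fp_total = FP_COUNT * FP_COST
error_cost = fn_total + fp_total
fn_share = fn_total / error_cost
cost_ratio = FN_COST / FP_COST

# Cost-minimizing cutoff for a calibrated match probability p, assuming
# correct decisions cost nothing: predict "match" when p >= this value.
# It follows from comparing p * FN_COST against (1 - p) * FP_COST.
cost_based_threshold = FP_COST / (FP_COST + FN_COST)

print(f"False-negative cost: ${fn_total:,.2f}")
print(f"False-positive cost: ${fp_total:,.2f}")
print(f"Total error cost:    ${error_cost:,.2f}")
print(f"Share from misses:   {fn_share:.1%}")
print(f"One miss costs as much as {cost_ratio:.1f} false positives")
print(f"Cost-based cutoff:   {cost_based_threshold:.3f}")
```

Under these figures, missed matches account for nearly all of the error cost on the 10,000 pilot decisions, which supports evaluating the model on recall and expected cost. The cost-based cutoff ignores the 2,200 records/day review limit, so it is a starting point to be checked against capacity and not a final operating threshold.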
