Classify Public vs Private IPs

Business Context

NetSecure, a network monitoring company that processes millions of connection logs per day, wants a lightweight classifier that labels IPv4 addresses as private or public before downstream traffic analysis. The goal is not simply to hand-code range rules, but to evaluate whether a production ML pipeline can learn the distinction reliably from parsed IP features while remaining easy to deploy.

Dataset

You are given a labeled IPv4 dataset built from historical network logs and enrichment jobs.

Feature Group	Count	Examples
Parsed octets	4	octet1, octet2, octet3, octet4
Binary/range flags	6	is_10_range, is_172_range, is_192_range, first_octet_lt_128
Numeric transforms	5	ip_as_int, normalized_octet_mean, normalized_octet_std
Context metadata	3	source_region, device_type, log_source

Size: 240K IPv4 records, 18 features
Target: Binary label — private (1) vs public (0)
Class balance: 28% private, 72% public
Missing data: <2% missing in metadata fields; no missing octets after parsing validation
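
As a rough illustration of how the parsed-octet and numeric-transform features might be derived from a raw dotted-quad string, here is a minimal sketch. The function name parse_ipv4 is hypothetical, and the definitions of normalized_octet_mean and normalized_octet_std (octets scaled by 255, then population mean and standard deviation) are assumptions, since the dataset description does not define them.

```python
import statistics


def parse_ipv4(ip: str) -> dict:
    """Turn a dotted-quad string into a few model-ready features.

    Feature names follow the dataset table; the exact formulas for the
    normalized statistics are assumed, not taken from the source.
    """
    parts = ip.strip().split(".")
    if len(parts) != 4 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a dotted-quad IPv4 address: {ip!r}")

    octets = [int(p) for p in parts]
    if any(o > 255 for o in octets):
        raise ValueError(f"octet out of range in {ip!r}")

    o1, o2, o3, o4 = octets
    scaled = [o / 255 for o in octets]  # assumption: scale each octet to [0, 1]

    return {
        "octet1": o1,
        "octet2": o2,
        "octet3": o3,
        "octet4": o4,
        # Big-endian packing of the four octets into one 32-bit integer.
        "ip_as_int": (o1 << 24) | (o2 << 16) | (o3 << 8) | o4,
        "normalized_octet_mean": statistics.fmean(scaled),
        "normalized_octet_std": statistics.pstdev(scaled),
        "first_octet_lt_128": int(o1 < 128),
    }
```

Rejecting malformed strings up front matches the statement that no octets are missing after parsing validation.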

Success Criteria

A good solution should achieve F1 >= 0.98, precision >= 0.99 on the private class, and near-perfect recall on the standard RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16). The model should also expose feature importances or simple decision logic for auditability.
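
One way to measure the RFC 1918 recall criterion is to compare model predictions against a deterministic range check. The sketch below uses Python's standard ipaddress module; the helper names is_rfc1918 and rfc1918_recall are hypothetical, and predictions are assumed to use 1 for private, as in the target definition.

```python
import ipaddress

# The three private blocks defined by RFC 1918.
RFC1918_NETWORKS = (
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
)


def is_rfc1918(ip: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 blocks."""
    addr = ipaddress.IPv4Address(ip)
    return any(addr in net for net in RFC1918_NETWORKS)


def rfc1918_recall(ips, predictions) -> float:
    """Share of RFC 1918 addresses that the model labeled private (1)."""
    hits = 0
    total = 0
    for ip, pred in zip(ips, predictions):
        if is_rfc1918(ip):
            total += 1
            hits += int(pred == 1)
    return hits / total if total else float("nan")
```

The explicit network list is used instead of the is_private attribute because that attribute also covers other reserved ranges such as loopback and link-local, which go beyond RFC 1918.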

Constraints

Inference must score at least 50K IPs/sec in batch jobs
The solution must be interpretable enough for network analysts
Retraining should be simple and inexpensive
Avoid leakage from labels encoded directly in engineered features
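
To make the leakage constraint concrete, a quick check is to see how often the label is 1 among rows where a binary flag is set. A value of exactly 1.0 suggests the flag restates the label rather than helping the model learn it. This is only a sketch: the function name flag_label_agreement and the column name "label" are hypothetical, and rows are assumed to be dictionaries keyed by feature name.

```python
def flag_label_agreement(rows, flag, label="label"):
    """Fraction of rows with flag == 1 whose label is 1.

    Returns None when the flag is never set, since the ratio is undefined.
    """
    flagged_labels = [row[label] for row in rows if row[flag] == 1]
    if not flagged_labels:
        return None
    return sum(flagged_labels) / len(flagged_labels)


# Tiny made-up sample, for illustration only (not drawn from the dataset).
sample = [
    {"is_10_range": 1, "first_octet_lt_128": 1, "label": 1},
    {"is_10_range": 1, "first_octet_lt_128": 1, "label": 1},
    {"is_10_range": 0, "first_octet_lt_128": 1, "label": 0},
    {"is_10_range": 0, "first_octet_lt_128": 0, "label": 1},
]

leaky = flag_label_agreement(sample, "is_10_range")          # every flagged row is private
weaker = flag_label_agreement(sample, "first_octet_lt_128")  # mixed labels
```

Flags that score 1.0 are candidates for exclusion, or at least for reporting separately, so that model results are not just a restatement of the range rules.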

Deliverables

Build a binary classification pipeline to predict whether an IPv4 address is private or public.
Explain feature engineering from raw dotted-quad strings to model-ready inputs.
Compare at least one interpretable baseline with one tree-based model.
Evaluate using class-specific precision, recall, and F1, plus a confusion matrix.
Describe how you would deploy and monitor the model in a log-processing pipeline.
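
The class-specific metrics in the evaluation deliverable can all be read off the confusion matrix. The following is a minimal pure-Python sketch, assuming labels are encoded as 1 (private) and 0 (public); the function names are hypothetical, and in practice a library such as scikit-learn would normally be used instead.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def class_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class; 0.0 when a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Calling class_metrics once with positive=1 and once with positive=0 gives the per-class view for the private and public classes respectively.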
