NetSecure, a network monitoring company processing millions of connection logs per day, wants a lightweight classifier that labels IPv4 addresses as private or public before downstream traffic analysis. The goal is not merely to hand-code rules, but to evaluate whether a production ML pipeline can learn the distinction reliably from parsed IP features while remaining easy to deploy.
You are given a labeled IPv4 dataset built from historical network logs and enrichment jobs. Its features fall into four groups:

| Feature Group | Count | Examples |
|---|---|---|
| Parsed octets | 4 | octet1, octet2, octet3, octet4 |
| Binary/range flags | 6 | is_10_range, is_172_range, is_192_range, first_octet_lt_128 |
| Numeric transforms | 5 | ip_as_int, normalized_octet_mean, normalized_octet_std |
| Context metadata | 3 | source_region, device_type, log_source |
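
To make the parsed-octet and range-flag columns concrete, here is a minimal sketch of how they could be derived from a raw address using Python's standard `ipaddress` module. The function names `extract_features` and `is_rfc1918` are hypothetical, and the sketch assumes that `is_10_range`, `is_172_range`, and `is_192_range` correspond to the three RFC 1918 blocks (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16); the dataset's actual enrichment jobs may define them differently.

```python
import ipaddress

# Assumption: the three range flags map onto the RFC 1918 private blocks.
RFC1918_NETWORKS = {
    "is_10_range": ipaddress.ip_network("10.0.0.0/8"),
    "is_172_range": ipaddress.ip_network("172.16.0.0/12"),
    "is_192_range": ipaddress.ip_network("192.168.0.0/16"),
}


def extract_features(ip: str) -> dict:
    """Build a partial feature row (octets, range flags, ip_as_int) for one IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises AddressValueError on malformed input
    octets = list(addr.packed)        # four integers in the range 0..255

    features = {f"octet{i + 1}": value for i, value in enumerate(octets)}
    for name, network in RFC1918_NETWORKS.items():
        features[name] = int(addr in network)
    features["first_octet_lt_128"] = int(octets[0] < 128)
    features["ip_as_int"] = int(addr)  # 32-bit integer form of the address
    return features


def is_rfc1918(ip: str) -> bool:
    """Rule-based reference label: True if the address lies in any RFC 1918 block."""
    addr = ipaddress.IPv4Address(ip)
    return any(addr in network for network in RFC1918_NETWORKS.values())
```

A rule-based reference such as `is_rfc1918` is useful as a baseline for checking the learned model. Note that `172.32.0.1` falls outside 172.16.0.0/12, so a flag keyed only on the first octet would mislabel it.
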

A good solution should achieve F1 >= 0.98, precision >= 0.99 on the private class, and near-perfect recall on standard RFC1918 private ranges. The model should also expose feature importance or simple decision logic for auditability.
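
The acceptance criteria can be checked with a few lines of plain Python. The sketch below is an illustration under stated assumptions: the helper names `private_class_metrics` and `meets_targets` are hypothetical, labels are assumed to be the strings `"private"` and `"public"`, and F1 is computed on the private class. The recall target is left out of the check because the text gives no numeric threshold for it.

```python
def private_class_metrics(y_true, y_pred, positive="private"):
    """Compute precision, recall, and F1 treating `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def meets_targets(metrics, min_f1=0.98, min_precision=0.99):
    """Check the two numeric thresholds stated in the requirements."""
    return metrics["f1"] >= min_f1 and metrics["precision"] >= min_precision
```

To assess recall on the RFC 1918 ranges specifically, the same function can be applied to the subset of rows whose true address lies in one of those blocks.
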