Context
NetVista, a network observability company, collects BGP update messages from routers and public collectors to power near-real-time routing analytics for enterprise customers. Today, route data is ingested as hourly flat files and processed in batch, which makes it difficult to detect route leaks, prefix hijacks, and flapping events quickly.
You need to design a data pipeline that ingests BGP protocol telemetry, validates and enriches it, and serves both real-time operational dashboards and historical analytics.
Scale Requirements
- Sources: 2,500 routers and 40 external BGP collectors
- Peak throughput: 180K BGP UPDATE messages/sec, 25K withdrawals/sec
- Message size: 0.8-2.5 KB per event after normalization
- Latency target: < 30 seconds from receipt to queryable analytics tables
- Daily volume: ~12 TB raw JSON/Avro, 4 TB compressed Parquet
- Retention: 30 days raw, 13 months aggregated route metrics
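These figures imply a roughly 2x gap between peak and average ingest rates, which drives broker and consumer sizing. A quick back-of-envelope check, assuming the midpoint of the stated 0.8-2.5 KB event size:

```python
# Back-of-envelope capacity check from the stated scale numbers.
# The average event size (midpoint of 0.8-2.5 KB) is an assumption.
PEAK_EVENTS_PER_SEC = 180_000 + 25_000      # announcements + withdrawals
AVG_EVENT_KB = (0.8 + 2.5) / 2              # midpoint assumption

peak_mb_per_sec = PEAK_EVENTS_PER_SEC * AVG_EVENT_KB / 1024
avg_mb_per_sec = 12 * 1024 * 1024 / 86_400  # 12 TB/day raw, spread evenly
raw_retention_tb = 12 * 30                  # 12 TB/day for 30 days

print(f"peak ingest:   ~{peak_mb_per_sec:.0f} MB/s")   # ~330 MB/s
print(f"daily average: ~{avg_mb_per_sec:.0f} MB/s")    # ~146 MB/s
print(f"raw 30-day footprint: {raw_retention_tb} TB")  # 360 TB
```

The peak-to-average ratio (~330 vs ~146 MB/s) suggests provisioning MSK and stream consumers for bursts well above the daily mean.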
Requirements
- Ingest BGP announcements, withdrawals, and session-state events from multiple regions with ordered processing per peer.
- Normalize protocol fields such as peer_asn, prefix, next_hop, as_path, med, local_pref, and community into a canonical schema.
- Detect duplicates, malformed prefixes, invalid ASN values, and out-of-order events.
- Produce real-time derived datasets for route changes, prefix reachability, peer instability, and AS-path changes.
- Support replay/backfill for a missed collector window without duplicating downstream records.
- Expose curated tables for analysts in a warehouse and low-latency aggregates for operational dashboards.
- Define monitoring, alerting, and failure recovery for ingestion, processing, and warehouse loads.
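For the ordered-processing-per-peer requirement, one common approach is to key every event by its peer so that Kafka's per-partition ordering guarantee applies per peer. A minimal sketch of the idea (the key format and partition count are assumptions, and the hash shown is illustrative rather than Kafka's murmur2):

```python
# Sketch: per-peer ordering via partition keys. Kafka/MSK preserves
# order only within a partition, so keying each event by its peer
# keeps per-peer order while spreading peers across partitions.
import hashlib

NUM_PARTITIONS = 96  # assumed MSK topic sizing


def partition_key(collector_id: str, peer_ip: str) -> bytes:
    # Assumed key format; any stable, unique per-peer identifier works.
    return f"{collector_id}|{peer_ip}".encode()


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash: identical keys always map to the same partition,
    # so all events for one peer land in one ordered partition.
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

In practice the producer would simply set this key on each record and let the client library's default partitioner do the hashing; the point is that ordering holds per peer, not globally.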
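The normalization and validation requirements can be combined: a canonical record with the listed fields, plus checks for malformed prefixes and invalid ASNs. A sketch, where the field names come from the spec but the exact validation rules (4-byte ASN range, strict prefix parsing) are assumptions:

```python
# Illustrative canonical record and validation for normalized BGP
# UPDATE events. Field names follow the spec; validation thresholds
# are assumptions.
from __future__ import annotations
from dataclasses import dataclass, field
import ipaddress


@dataclass
class BgpUpdate:
    peer_asn: int
    prefix: str              # e.g. "203.0.113.0/24"
    next_hop: str
    as_path: list[int]
    med: int | None = None
    local_pref: int | None = None
    community: list[str] = field(default_factory=list)


def validation_errors(ev: BgpUpdate) -> list[str]:
    errors = []
    # 4-byte ASNs span 1..4294967295; 0 and larger values are invalid.
    if not (1 <= ev.peer_asn <= 4_294_967_295):
        errors.append("invalid peer_asn")
    try:
        # strict=True rejects prefixes with host bits set, e.g. 203.0.113.5/24.
        ipaddress.ip_network(ev.prefix, strict=True)
    except ValueError:
        errors.append("malformed prefix")
    if any(not (1 <= asn <= 4_294_967_295) for asn in ev.as_path):
        errors.append("invalid ASN in as_path")
    return errors
```

Records that fail validation would typically be routed to a dead-letter topic rather than dropped, so malformed input remains auditable.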
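Peer-instability detection from the derived-datasets requirement reduces to counting session-state changes per peer over a sliding window. A minimal sketch; the 300-second window and threshold of 5 are illustrative values, not figures from the spec:

```python
# Sketch: flag peer instability ("flapping") with a sliding-window
# count of session-state changes. Window and threshold are assumed.
from collections import defaultdict, deque

WINDOW_SEC = 300
FLAP_THRESHOLD = 5


class FlapDetector:
    def __init__(self):
        # peer -> timestamps of recent session-state changes
        self._events = defaultdict(deque)

    def observe(self, peer: str, ts: float) -> bool:
        """Record a session-state change; return True if the peer is flapping."""
        q = self._events[peer]
        q.append(ts)
        # Evict events that have aged out of the window.
        while q and q[0] < ts - WINDOW_SEC:
            q.popleft()
        return len(q) >= FLAP_THRESHOLD
```

In a streaming job this state would live in the framework's keyed state store rather than process memory, but the windowing logic is the same.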
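The replay/backfill requirement is usually met by assigning each event a deterministic ID so that re-ingesting a missed collector window produces identical keys and downstream loads can upsert instead of append. A sketch, where the set of identity fields is an assumption and must uniquely identify an event:

```python
# Sketch: deterministic event IDs make replays idempotent. Downstream
# loads (e.g. a Snowflake MERGE keyed on event_id) then dedupe
# replayed records automatically. The identity fields are assumptions.
import hashlib
import json


def event_id(ev: dict) -> str:
    # Hash only the fields that define event identity; ignore
    # enrichment fields that may differ between original and replay.
    identity = {k: ev[k] for k in
                ("collector_id", "peer_ip", "timestamp", "type", "prefix")}
    blob = json.dumps(identity, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()
```

With this in place, the warehouse load can use `MERGE ... ON target.event_id = source.event_id`, so replaying a window inserts only rows that were actually missed.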
Constraints
- Primary cloud is AWS; existing platform uses Amazon MSK, S3, Airflow, and Snowflake.
- Team size is 3 data engineers and 1 SRE; operational complexity should stay moderate.
- Budget cap is $35K/month incremental spend.
- Route telemetry may contain customer-identifiable IP allocations, so access controls and audit logging are required.