Build Testable Network Config Pipeline

Context

NetOpsCloud manages configuration backups and compliance checks for 8,000 enterprise network devices across Cisco, Juniper, and Arista environments. Today, engineers run ad hoc Python scripts from laptops to pull configs over SSH, parse command output, and upload files to S3; the scripts are hard to read, difficult to test, and frequently fail during retries.

You need to design a production-grade data pipeline that turns network automation scripts into a readable, testable, and observable batch ETL system. The pipeline should collect device state, normalize vendor-specific outputs, store raw and curated records, and support safe re-runs without duplicate data.

Scale Requirements

Devices: 8,000 active devices, growing to 20,000 in 12 months
Collection frequency: Every 15 minutes for critical devices, hourly for standard devices
Payload size: 200 KB to 2 MB raw text per device snapshot
Daily volume: ~1.5-3 TB raw command output
Latency target: Snapshot available in curated warehouse tables within 10 minutes of the scheduled run
Retention: 90 days raw snapshots, 2 years normalized inventory/compliance history
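The figures above can be sanity-checked with a rough back-of-envelope calculation. The sketch below is only an estimate under stated assumptions: the prompt does not give the share of critical devices, so critical_fraction is a hypothetical parameter, and one snapshot per device per collection is assumed.

```python
# Back-of-envelope sizing. The split between critical and standard devices is
# NOT given in the prompt; critical_fraction is an assumed input.
CRITICAL_RUNS_PER_DAY = 24 * 4   # every 15 minutes
STANDARD_RUNS_PER_DAY = 24       # hourly


def snapshots_per_day(devices: int, critical_fraction: float) -> int:
    """Number of device snapshots collected per day."""
    critical = round(devices * critical_fraction)
    standard = devices - critical
    return critical * CRITICAL_RUNS_PER_DAY + standard * STANDARD_RUNS_PER_DAY


def daily_volume_gb(devices: int, critical_fraction: float, payload_mb: float) -> float:
    """Raw daily volume in GB (decimal units), one snapshot per device per run."""
    return snapshots_per_day(devices, critical_fraction) * payload_mb / 1000


if __name__ == "__main__":
    for fraction in (0.0, 0.25, 1.0):
        low = daily_volume_gb(8_000, fraction, 0.2)
        high = daily_volume_gb(8_000, fraction, 2.0)
        print(f"critical={fraction:.0%}: {low:,.0f} to {high:,.0f} GB/day")
```

Running the numbers this way helps decide how much of the stated daily volume comes from the critical tier, which in turn drives worker concurrency and S3 request rates.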

Requirements

Design an orchestrated batch pipeline to collect configs and operational state from devices over SSH/API.
Make the extraction and parsing code modular, readable, and unit-testable across vendors.
Ensure idempotent re-runs for partial failures, duplicate scheduler triggers, and backfills.
Store both raw command output and normalized tables for interfaces, routes, software versions, and compliance findings.
Add automated data quality checks for missing devices, parse failures, schema drift, and stale snapshots.
Describe CI/CD, test strategy, and how you would separate pure parsing logic from side effects such as network calls and storage writes.
Provide monitoring, alerting, and recovery procedures for device timeouts, parser regressions, and warehouse load failures.
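One possible way to approach the separation of pure parsing from side effects, together with idempotent re-runs, is sketched below. Everything here is illustrative: the SoftwareVersion record, the regex, the sample text, and the raw/device_id=.../window=... key layout are assumptions, not a prescribed design. The idea is that parsers take text and return data, and storage keys are derived only from the device and the scheduled window, so a re-run overwrites rather than duplicates.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SoftwareVersion:
    """Normalized record (hypothetical schema)."""
    device_id: str
    version: str


# Illustrative pattern only; real vendor output needs per-vendor parsers.
_VERSION_RE = re.compile(r"Version\s+([\w.()]+)")


def parse_software_version(device_id: str, raw_text: str) -> Optional[SoftwareVersion]:
    """Pure function: no network or storage access, so it is trivial to unit test."""
    match = _VERSION_RE.search(raw_text)
    if match is None:
        return None  # caller records a parse failure for data quality checks
    return SoftwareVersion(device_id=device_id, version=match.group(1))


def window_start(ts: datetime, interval_minutes: int) -> datetime:
    """Floor a timestamp to the start of its collection window."""
    minutes = (ts.hour * 60 + ts.minute) // interval_minutes * interval_minutes
    return ts.replace(hour=minutes // 60, minute=minutes % 60, second=0, microsecond=0)


def raw_key(device_id: str, window: datetime) -> str:
    """Deterministic object key: the same device and window always map to one key."""
    return f"raw/device_id={device_id}/window={window.strftime('%Y%m%dT%H%MZ')}/snapshot.txt"


if __name__ == "__main__":
    sample = "Cisco IOS Software, Version 15.2(4)E7, RELEASE SOFTWARE"
    print(parse_software_version("r1", sample))
    now = datetime(2024, 1, 1, 10, 22, 37, tzinfo=timezone.utc)
    print(raw_key("r1", window_start(now, 15)))
```

Because the key depends on the scheduled window rather than the wall-clock time of the attempt, a duplicate scheduler trigger or a backfill for the same window writes to the same location, and the curated load can upsert on (device_id, window).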

Constraints

Infrastructure must run on AWS using managed services where practical.
Security requires encrypted secrets, audit logging, and no plaintext credentials in code.
Budget allows moderate batch compute but not a 24/7 large streaming cluster.
Some devices are rate-limited and can only be queried once per collection window.
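The once-per-window constraint on rate-limited devices could be enforced with a claim check before any device call. The sketch below is a minimal in-memory version under stated assumptions: InMemoryClaimStore, try_claim, and collect_once are hypothetical names, and a real deployment would need a shared store with an atomic conditional write instead of a Python set.

```python
from typing import Callable, Optional, Set, Tuple


class InMemoryClaimStore:
    """Stand-in for a shared store with atomic 'insert if absent' semantics."""

    def __init__(self) -> None:
        self._claims: Set[Tuple[str, str]] = set()

    def try_claim(self, device_id: str, window_id: str) -> bool:
        """Return True only for the first claim of a (device, window) pair."""
        key = (device_id, window_id)
        if key in self._claims:
            return False
        self._claims.add(key)
        return True


def collect_once(
    device_id: str,
    window_id: str,
    store: InMemoryClaimStore,
    fetch: Callable[[str], str],
) -> Optional[str]:
    """Query the device at most once per window; return None if already claimed."""
    if not store.try_claim(device_id, window_id):
        return None  # duplicate trigger or re-run: reuse the stored raw snapshot
    return fetch(device_id)


if __name__ == "__main__":
    store = InMemoryClaimStore()
    fake_fetch = lambda device_id: f"output from {device_id}"
    print(collect_once("r1", "20240101T1015Z", store, fake_fetch))
    print(collect_once("r1", "20240101T1015Z", store, fake_fetch))
```

In this sketch the claim is taken before the fetch, so a failed fetch is not retried within the same window. That trade-off suits rate-limited devices; for other devices a design might release the claim on failure. Re-parsing after a parser fix then works from the stored raw output rather than from a second device query.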
