Networking and System Test Fundamentals
System test at Arista centers on validating feature correctness, interoperability, and failover in end-to-end topologies. Interviewers will assess how you translate requirements into testable assertions, build realistic environments, and expose subtle issues at L2/L3 and in overlays.
Be ready to go over:
- L2/L3 Protocols: Spanning Tree variants, VLANs, OSPF, IS-IS, BGP, ECMP, and convergence behavior under churn.
- Overlays and Services: VXLAN/EVPN, MPLS, VRFs, route leaking, multicast (IGMP, PIM SM/SSM).
- High Availability and Resiliency: Graceful restart, BFD, control/data-plane failures, ToR/leaf-spine behavior under link/node loss.
- Advanced concepts (less common): EVPN multi-homing, QoS pipeline mapping, hashing collisions, MTU and fragmentation, asymmetric routing.
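To practice reasoning about ECMP hashing collisions before touching hardware, it can help to model how a set of flows spreads across equal-cost members. The sketch below is illustrative only: it uses CRC32 as a stand-in hash, and the names `pick_link`, `link_load`, and `imbalance` are hypothetical. Real switch ASICs use their own hash functions and field selections, so the output is a way to rehearse the analysis, not a prediction of platform behavior.

```python
import zlib
from collections import Counter


def pick_link(flow, n_links, seed=0):
    """Map a flow tuple to a member link using CRC32 as a stand-in hash."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key, seed) % n_links


def link_load(flows, n_links, seed=0):
    """Count how many flows land on each member link."""
    counts = Counter(pick_link(flow, n_links, seed) for flow in flows)
    return [counts.get(i, 0) for i in range(n_links)]


def imbalance(loads):
    """Ratio of the busiest link to the mean; 1.0 means a perfectly even spread."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Assumed traffic mix: one source/destination pair, varying source port only.
flows = [("10.0.0.1", "10.0.1.1", 6, 1024 + i, 443) for i in range(1000)]
loads = link_load(flows, n_links=4)
print(loads, round(imbalance(loads), 3))
```

Re-running with a low-entropy flow set (for example, a handful of elephant flows) shows why a hashing problem can look like random loss: a few members saturate while the rest stay idle.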
Example questions or scenarios:
- “Design a test to validate EVPN multi-homing stability during rapid link flaps at scale.”
- “You see intermittent packet loss with ECMP. How do you isolate hashing vs. microburst vs. buffer tuning?”
- “Walk through verifying BGP convergence targets under prefix churn and route reflector failures.”
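For the convergence question, one common data-plane approach is to send traffic at a constant rate through the event and derive the outage from the number of lost packets. The sketch below assumes a constant-rate stream per flow and that you already have transmit and receive counts from your traffic generator; the function names and the sample values are hypothetical.

```python
def convergence_ms(tx_packets, rx_packets, rate_pps):
    """Estimate outage duration in ms from loss on a constant-rate stream."""
    if rate_pps <= 0:
        raise ValueError("rate_pps must be positive")
    lost = tx_packets - rx_packets
    if lost < 0:
        raise ValueError("rx exceeds tx; check for duplicates or stale counters")
    return lost * 1000.0 / rate_pps


def check_convergence(samples, target_ms):
    """Return (worst_ms, passed) across per-flow (tx, rx, rate_pps) samples."""
    worst = max(convergence_ms(tx, rx, rate) for tx, rx, rate in samples)
    return worst, worst <= target_ms


# Assumed example: two flows at 10 kpps measured across a route reflector failure.
samples = [(1_000_000, 999_500, 10_000), (1_000_000, 998_000, 10_000)]
print(check_convergence(samples, target_ms=250.0))
```

Reporting the worst flow rather than the average matters here, because a convergence target is usually missed by a small set of prefixes, not by all of them at once. This method measures data-plane impact only, so pair it with control-plane timestamps when you need to explain where the time went.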
Test Design, Automation, and Tooling
Arista expects automation where it yields leverage and exploratory testing where creativity finds edge cases. You’ll discuss how you scale test execution, capture ground truth, and integrate with CI.
Be ready to go over:
- Automation: Python (pytest/unittest), Tcl, data parsing, traffic generator APIs, and log/counter harvesting.
- Traffic and Measurement Tools: IXIA/Spirent, tcpreplay, iperf, pcap tooling; gNMI/NETCONF/REST/gRPC for control-plane checks.
- Lab and Infra: Topology orchestration, containerized labs, Jenkins, archiving of result artifacts, reproducibility practices.
- Advanced concepts (less common): Ansible/Puppet/Chef for provisioning, test data design, property-based testing, fault injection.
Example questions or scenarios:
- “Sketch a Python-based harness to validate VXLAN routes across N leafs with idempotent checks.”
- “Choose between IXIA and iperf for a throughput test with precise latency SLAs—justify your call.”
- “How would you structure logs and metrics to triage a failure postmortem quickly?”
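For the VXLAN harness question, a reasonable starting point is a read-only check that compares expected routes against what each leaf reports. Because it never changes device state, it can be re-run any number of times with the same result. This is a minimal sketch under stated assumptions: `fetch_routes` is a hypothetical callable that you would back with whatever management API your lab uses, and the route string format is invented for illustration.

```python
from typing import Callable, Dict, Iterable, Set

Route = str  # assumed format, e.g. "10.1.1.0/24@vni10010"


def validate_routes(
    leafs: Iterable[str],
    expected: Set[Route],
    fetch_routes: Callable[[str], Iterable[Route]],
) -> Dict[str, Dict[str, Set[Route]]]:
    """Return per-leaf missing/unexpected routes; an empty dict means all leafs pass."""
    failures = {}
    for leaf in leafs:
        observed = set(fetch_routes(leaf))
        missing = expected - observed
        unexpected = observed - expected
        if missing or unexpected:
            failures[leaf] = {"missing": missing, "unexpected": unexpected}
    return failures


# Stand-in for real devices: a dict of leaf name -> reported routes.
fake_tables = {
    "leaf1": ["10.1.1.0/24@vni10010", "10.1.2.0/24@vni10020"],
    "leaf2": ["10.1.1.0/24@vni10010"],
}
expected = {"10.1.1.0/24@vni10010", "10.1.2.0/24@vni10020"}
print(validate_routes(fake_tables, expected, lambda leaf: fake_tables[leaf]))
```

Injecting the fetcher keeps the comparison logic testable without a lab, and returning a structured diff instead of a bare boolean gives you the evidence you need when a failure is triaged later.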
Scale, Performance, and Observability
Quality at Arista includes p99 latency, throughput ceilings, state scale, and failure amplification. You’ll be asked to design tests that surface bottlenecks and quantify risk.
Be ready to go over:
- Scale Planning: Route counts, MAC/ARP tables, VXLAN VNI proliferation, multicast groups, buffer sizing.
- Performance Tuning: Queueing/QoS profiles, microburst visibility, traffic mixes, hardware offload considerations.
- Observability: Streaming telemetry, time-series analysis, log correlation, SLOs/SLIs, anomaly detection.
- Advanced concepts (less common): Backpressure propagation, PPS vs. bandwidth trade-offs, GC pauses in control-plane agents, long-tail analysis.
Example questions or scenarios:
- “Design a performance test for CloudVision’s state streaming under 100k interfaces with changing counters.”
- “Where do you place taps or captures to prove a QoS drop policy is (or isn’t) enforced?”
- “You meet throughput but miss latency SLOs at p99.9. What do you test next?”
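When a discussion turns to p99 or p99.9 targets, it helps to be precise about how the percentile is computed. The sketch below uses the nearest-rank definition, which is one of several conventions; the function names are hypothetical and the sample data is synthetic.

```python
import math
from fractions import Fraction


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Fraction avoids float rounding pushing the rank up by one (e.g. 99.9% of 1000).
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[rank - 1]


def tail_report(samples):
    """Summarize median and tail latency for one measurement run."""
    return {f"p{p}": percentile(samples, p) for p in (50, 99, 99.9)}


# Synthetic latencies in microseconds: 1, 2, ..., 1000.
latencies_us = list(range(1, 1001))
print(tail_report(latencies_us))
```

Note that p99.9 over 1,000 samples rests on a single observation. If an SLO is stated at p99.9, the run needs enough samples for the tail to be meaningful, which is itself a useful point to raise in an interview.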
Troubleshooting, Debugging, and Customer Mindset
Arista values engineers who can reproduce, isolate, and drive root cause while communicating crisply. You’ll be evaluated on how you debug with incomplete data and how you write bugs that accelerate fixes.
Be ready to go over:
- Reproduction Strategy: From symptom to minimal repro, hypothesis-driven experiments, control/data-plane isolation.
- Bug Reporting: Impact articulation, evidence packaging (pcaps/logs/counters), prioritization, and risk callouts.
- Cross-Functional Collaboration: Partnering with dev, TAC, sales engineering; communicating status and next steps.
- Advanced concepts (less common): Kernel networking anomalies, race conditions in distributed agents, file-descriptor leaks, optics-induced packet integrity issues.
Example questions or scenarios:
- “A customer reports intermittent drops ‘only during backups.’ What is your triage plan and what data will you request?”
- “Show how you’d minimize a flaky test into a deterministic reproducer.”
- “Draft a one-paragraph bug report that would get a developer’s immediate attention.”
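For the reproducer question, one way to show your thinking is to treat the failing scenario as a list of steps and remove steps while the failure still reproduces. The sketch below is a simple greedy reduction, not full delta debugging, and it assumes you have already made the failure check deterministic (for example by fixing seeds and timing, or by repeating each trial enough times). The step names and the `fails` predicate are hypothetical.

```python
def minimize(steps, fails):
    """Greedily drop steps one at a time while `fails` still reports the failure."""
    current = list(steps)
    if not fails(current):
        raise ValueError("the full step list does not reproduce the failure")
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if fails(candidate):
            current = candidate  # step i was not needed; retry the same index
        else:
            i += 1  # step i is required; keep it and move on
    return current


# Toy failure: triggers only when a link flap precedes an agent restart.
def fails(steps):
    return (
        "flap_link" in steps
        and "restart_agent" in steps
        and steps.index("flap_link") < steps.index("restart_agent")
    )


scenario = ["cfg_vlan", "flap_link", "clear_arp", "restart_agent", "send_traffic"]
print(minimize(scenario, fails))  # ['flap_link', 'restart_agent']
```

The reduced list is what belongs in the bug report: it tells the developer which interactions matter and removes setup noise that would otherwise slow down root cause.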
Hardware/Optics and Manufacturing (Role-Track Specific)
If you’re pursuing Optics ODVT or Manufacturing Test, expect deeper focus on hardware interfaces, measurement, and yield.
Be ready to go over:
- Optics ODVT (50G–1.6T, PAM4): Eye diagrams, BER targets, BERTs, oscilloscopes, power meters, coherent optics basics, LPO/LRO, FEC interactions.
- Manufacturing Test (SDET): Python/Go test scripts, DFx (DFT/DFR/DFM/DFQ), web interfaces (Django), databases (MariaDB/SQL), yield analysis, failure triage.
- Advanced concepts (less common): Equalization tuning, dispersion/OSNR limits, thermal behavior under load, capacity planning for test assets.
Example questions or scenarios:
- “Outline an ODVT plan for a new 800G LPO module validating interoperability across platforms.”
- “Design a manufacturing final test that reduces false fails and improves throughput—what metrics will you track?”
- “Given a marginal BER under stress, how do you separate module vs. platform signal integrity?”