
You are responding to a severe production outage affecting a critical customer-facing system. Service is degraded, leadership wants frequent updates, and multiple teams are involved in diagnosis and recovery. Once the incident is stabilized, you need to explain how you approached root cause analysis (RCA) without jumping to conclusions.
Tell me about a time you had to troubleshoot a severe production outage. What was your RCA process?