Nimbus builds a B2B workflow platform with 120 engineers across 10 squads. Over the last two quarters, customer-reported incidents rose from 14 to 23 per quarter and roadmap delivery slipped from 86% to 71% of committed work, so engineering leadership wants a clear KPI framework for operational health.
The CTO says teams are tracking too many disconnected metrics: deployment frequency ranges from 3 to 18 deploys per week by squad, median PR cycle time increased from 18 to 31 hours, change failure rate rose from 9% to 15%, and mean time to restore (MTTR) increased from 52 to 95 minutes. At the same time, voluntary engineer attrition remains low at 6% annually and quarterly engagement survey scores are stable at 7.8/10, which creates confusion about whether engineering operations are actually healthy.
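The four delivery metrics the CTO cites are the standard DORA-style measures, and all of them are computable from event timestamps. A minimal sketch of those computations follows; the in-memory rows are hypothetical examples standing in for warehouse query results, and the numbers are chosen only for illustration:

```python
from datetime import datetime
from statistics import median, mean

# Hypothetical rows mirroring the deployments, pull_requests, and
# incidents tables; in practice these come from the data warehouse.
deployments = [
    {"deployed_at": datetime(2024, 4, 1), "rollback_flag": False},
    {"deployed_at": datetime(2024, 4, 3), "rollback_flag": True},
    {"deployed_at": datetime(2024, 4, 8), "rollback_flag": False},
]
pull_requests = [
    {"opened_at": datetime(2024, 4, 1, 9), "merged_at": datetime(2024, 4, 2, 16)},
    {"opened_at": datetime(2024, 4, 2, 10), "merged_at": datetime(2024, 4, 3, 12)},
]
incidents = [
    {"started_at": datetime(2024, 4, 5, 10, 0), "resolved_at": datetime(2024, 4, 5, 11, 35)},
]

def dora_metrics(deployments, pull_requests, incidents, weeks):
    """Compute the four DORA-style metrics over a reporting window."""
    deploy_freq = len(deployments) / weeks  # deploys per week
    cycle_hours = median(                   # open -> merge, in hours
        (pr["merged_at"] - pr["opened_at"]).total_seconds() / 3600
        for pr in pull_requests
    )
    cfr = sum(d["rollback_flag"] for d in deployments) / len(deployments)
    mttr_min = mean(                        # start -> resolve, in minutes
        (i["resolved_at"] - i["started_at"]).total_seconds() / 60
        for i in incidents
    )
    return {
        "deploy_frequency_per_week": deploy_freq,
        "median_pr_cycle_hours": cycle_hours,
        "change_failure_rate": cfr,
        "mttr_minutes": mttr_min,
    }

print(dora_metrics(deployments, pull_requests, incidents, weeks=1))
```

Tracking these four together matters because they trade off against each other: a squad can inflate deployment frequency while its change failure rate and MTTR quietly worsen, which is consistent with the pattern Nimbus is seeing.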
You are asked to define the small set of metrics you would rely on most heavily, explain how they fit together, and show how you would diagnose the recent deterioration.
Available tables:
deployments: deployment_id, service_id, squad_id, deployed_at, status, rollback_flag
pull_requests: pr_id, repo_id, squad_id, opened_at, first_review_at, merged_at, lines_changed
incidents: incident_id, service_id, severity, started_at, resolved_at, root_cause
sprint_commitments: squad_id, sprint_id, committed_story_points, completed_story_points
eng_survey: engineer_id, quarter, engagement_score, intent_to_stay