You’re the analytics lead embedded in the Security Engineering org at Stripe, whose cloud infrastructure spans tens of thousands of hosts and containers across AWS/GCP and supports millions of merchant transactions per day. The company is preparing for a PCI DSS audit and has a board-level mandate to reduce “material security risk” after a recent industry-wide wave of ransomware exploiting known vulnerabilities.
Security leadership believes the patching program is too SLA-driven (e.g., “patch all Critical CVEs in 7 days”) and not risk-driven. Engineering teams complain that the current approach generates noisy queues and causes downtime for low-value systems, while genuinely risky exposures (internet-facing, exploitable, high-value assets) sometimes wait.
You are asked to design a metrics framework that prioritizes patches based on asset criticality and threat context, and to propose how to use the metric to drive weekly patching decisions.
Over the last six weeks, two asks have crystallized. The CISO wants a single KPI that answers: “Are we reducing real risk with the patch capacity we have?” The Head of Infrastructure wants a prioritization score that can be turned into an ordered backlog and measured week-over-week.
You have the following data sources to work with:
| Data Source | Description | Granularity |
|---|---|---|
| vuln_findings | Scanner findings with CVE, CVSS, detection time, affected package/version | Per asset-CVE |
| asset_inventory | Service ownership, environment, tier, data sensitivity, exposure | Per asset |
| threat_intel | Exploit maturity, EPSS, KEV flag, in-the-wild signals | Per CVE-day |
| patch_events | Patch deployed timestamps, method (reboot/live), change window | Per asset-CVE |
| security_events | WAF/IDS alerts, exploit attempt signatures, incident tickets | Per asset-day |
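To make these grains concrete, here is a minimal join sketch in Python/pandas. Table paths, column names, and the snapshot date are illustrative assumptions, not the real warehouse schema: it materializes the daily per-asset-CVE view of open findings, enriched with asset context and same-day threat intel.

```python
import pandas as pd

# Illustrative loads -- paths and column names are assumptions.
vuln_findings   = pd.read_parquet("vuln_findings.parquet")    # asset_id, cve_id, cvss, detected_at
asset_inventory = pd.read_parquet("asset_inventory.parquet")  # asset_id, tier, data_sensitivity, exposure
threat_intel    = pd.read_parquet("threat_intel.parquet")     # cve_id, as_of_date, epss, kev
patch_events    = pd.read_parquet("patch_events.parquet")     # asset_id, cve_id, patched_at

as_of = pd.Timestamp("2024-01-15")  # hypothetical daily snapshot date

# A finding is "open" at the snapshot if it was detected on or before
# the snapshot and not yet patched (or patched only after it).
findings = vuln_findings.merge(
    patch_events[["asset_id", "cve_id", "patched_at"]],
    on=["asset_id", "cve_id"],
    how="left",
)
open_mask = (findings["detected_at"] <= as_of) & (
    findings["patched_at"].isna() | (findings["patched_at"] > as_of)
)
open_findings = findings[open_mask]

# Enrich to the scoring grain: asset context (per asset) plus
# threat context as of the same day (per CVE-day).
intel_today = threat_intel[threat_intel["as_of_date"] == as_of]
exposure_view = (
    open_findings
    .merge(asset_inventory, on="asset_id", how="left")
    .merge(intel_today.drop(columns=["as_of_date"]), on="cve_id", how="left")
)
exposure_view["age_days"] = (as_of - exposure_view["detected_at"]).dt.days
```

Because every downstream metric is computed from this one daily snapshot view, the KPI, the ordered backlog, and the driver breakdowns all stay mutually consistent.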
Deliverables:
1. Define a North-Star KPI for risk-based patching that combines vulnerability severity, exploit likelihood and threat context (e.g., EPSS, KEV), asset criticality and exposure, and how long findings stay open.
2. Provide a calculation approach that is implementable in a data warehouse and can be computed daily (a sketch follows this list).
3. Decompose the KPI into actionable drivers so engineering teams can see why they are being asked to patch something (the same sketch shows a driver breakdown).
4. Propose benchmarks/targets for the KPI (what “good” looks like) and how you would set them using baseline data.
5. Recommend an operating cadence: how the metric drives weekly patch prioritization, including guardrails (e.g., downtime, change failure rate).
6. Diagnose a counterintuitive outcome: how could the org improve “% Critical CVEs patched in 7 days” while the risk-based KPI worsens? List three plausible causes and the analyses you’d run.
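One plausible shape for deliverables 1–3, sketched under assumed weights, tier labels, and column names (none of these are prescribed by the scenario): score each open asset-CVE pair as the product of normalized severity, exploit likelihood (EPSS, floored for KEV-listed CVEs), asset criticality, network exposure, and an age multiplier. The North-Star KPI is the sum of those scores, “open risk-weighted exposure,” and because the score is multiplicative, each factor doubles as a driver a team can read straight off its backlog.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the per-asset-CVE view built in the earlier sketch;
# all values, tier labels, and weights below are illustrative assumptions.
view = pd.DataFrame({
    "service":  ["payments-api", "payments-api", "batch-etl", "dev-sandbox"],
    "asset_id": ["a1", "a1", "b2", "c3"],
    "cve_id":   ["CVE-A", "CVE-B", "CVE-A", "CVE-C"],
    "cvss":     [7.5, 9.8, 7.5, 9.1],
    "epss":     [0.92, 0.02, 0.92, 0.01],
    "kev":      [True, False, True, False],
    "tier":     ["tier0", "tier0", "tier2", "tier2"],
    "exposure": ["internet", "internal", "internal", "internal"],
    "age_days": [12, 40, 75, 5],
})

# Multiplicative risk score: each factor is itself a reportable driver.
severity   = view["cvss"] / 10.0                                              # CVSS normalized to 0-1
likelihood = view["epss"].where(~view["kev"], np.maximum(view["epss"], 0.9))  # KEV floor (assumed 0.9)
crit_w     = view["tier"].map({"tier0": 1.0, "tier1": 0.7, "tier2": 0.4})     # assumed tier weights
net_w      = np.where(view["exposure"] == "internet", 1.0, 0.5)               # internet-facing counts double
age_w      = np.minimum(1.0 + view["age_days"] / 90.0, 2.0)                   # risk grows with age, capped at 2x

view["risk"] = severity * likelihood * crit_w * net_w * age_w

# North-Star KPI: total open risk-weighted exposure. Trend it daily;
# "are we reducing real risk with the capacity we have" = is this falling?
kpi = view["risk"].sum()

# Driver view: an ordered backlog where every row carries the factors
# that produced its rank, so a team can see why it should patch now.
backlog = view.sort_values("risk", ascending=False)
print(f"open risk-weighted exposure: {kpi:.3f}")
print(backlog[["service", "cve_id", "risk", "epss", "kev", "tier", "exposure", "age_days"]]
      .to_string(index=False))
```

The multiplicative form is a deliberate design choice: in the toy data, a KEV-listed CVSS 7.5 on an internet-facing tier-0 service outranks a CVSS 9.8 with negligible EPSS on an internal host, which is exactly the reordering the scenario says the SLA-driven program misses. Weekly prioritization then means working this backlog top-down until patch capacity is exhausted, with the KPI trend answering the CISO’s question.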
Constraints: