Project Context
Northstar Health operates a scheduling and billing platform used by 180 outpatient clinics. Over the last two quarters, the engineering team has handled maintenance work reactively. Urgent bugs, aging infrastructure patches, and overdue dependency upgrades have interrupted roadmap delivery and caused repeated incidents. You are the program manager assigned to create a disciplined maintenance planning process within 12 weeks.
The core team includes 1 engineering manager, 6 engineers, 1 QA lead, 1 product manager, and shared support from SRE and security. The CTO wants fewer production incidents, while the Head of Product does not want feature delivery to slow down before the company’s annual customer conference in 3 months.
Key Stakeholders
- CTO wants predictable maintenance planning, lower incident volume, and visible risk tracking.
- Head of Product wants at least 85% of committed feature roadmap work delivered this quarter.
- Security Lead wants all critical vulnerabilities remediated within SLA.
- Customer Support Director wants fewer escalations from recurring defects.
- Engineering Manager wants a process the team can sustain without burnout.
Constraints
- Timeline: 12 weeks to design and operationalize the process
- Budget: $90,000 for tooling, contractor support, and training
- Team capacity: 6 engineers, but only 2 can spend more than 30% of their time on maintenance process setup
- Current backlog: 146 maintenance items, including 11 critical security patches and 23 Sev-2 defects
- Dependencies: the Security team is available only 4 hours per week; SRE can support one maintenance window per month
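
To make the constraints concrete, the arithmetic below totals the shared-team availability and the backlog mix over the 12-week window. It is a rough sketch, not part of the brief: it assumes a month is 4 weeks, that the critical security patches and Sev-2 defects do not overlap, and that Security time is spread evenly across the critical patches. All variable names are illustrative.

```python
# Figures taken directly from the constraints above.
PLAN_WEEKS = 12
SECURITY_HOURS_PER_WEEK = 4
BACKLOG_TOTAL = 146
CRITICAL_SECURITY_PATCHES = 11
SEV2_DEFECTS = 23

# Assumption (not in the brief): one "month" is treated as 4 weeks.
WEEKS_PER_MONTH = 4

# Total Security team hours available across the 12-week window.
security_hours = PLAN_WEEKS * SECURITY_HOURS_PER_WEEK

# SRE-supported maintenance windows available in the same period.
sre_windows = PLAN_WEEKS // WEEKS_PER_MONTH

# Assumption: critical patches and Sev-2 defects are disjoint categories,
# so the rest of the backlog is everything else.
other_items = BACKLOG_TOTAL - CRITICAL_SECURITY_PATCHES - SEV2_DEFECTS

# Assumption: Security time is split evenly across the critical patches.
security_hours_per_patch = security_hours / CRITICAL_SECURITY_PATCHES

print(f"Security hours available: {security_hours}")
print(f"SRE maintenance windows: {sre_windows}")
print(f"Backlog items outside the two named categories: {other_items}")
print(f"Security hours per critical patch: {security_hours_per_patch:.1f}")
```

Under these assumptions the plan has 48 Security hours and 3 SRE windows to work with, which suggests that Security review time and deployment windows, rather than engineer headcount alone, may be the binding constraints.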
Complications
- A major customer renewal is at risk if two recurring billing defects are not fixed within 5 weeks.
- Two senior engineers disagree on whether to reserve fixed maintenance capacity or prioritize dynamically each sprint.
- The company has no agreed scoring model for maintenance work, so teams debate severity and urgency case by case.
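
One way to replace the case-by-case debate is a simple weighted score. The sketch below is purely illustrative: the factors, the 1 to 5 scales, the weights, and the example items are all assumptions made for this example and are not an agreed Northstar model. It divides weighted benefit by estimated effort so that small, high-impact fixes rank first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaintenanceItem:
    """Hypothetical intake record; every field is rated on a 1 to 5 scale."""
    title: str
    severity: int         # technical or security severity
    customer_impact: int  # effect on clinics, renewals, escalations
    urgency: int          # how soon a deadline or SLA applies
    effort: int           # rough size of the fix


# Assumed weights (they sum to 100); a real model would be agreed by stakeholders.
WEIGHTS = {"severity": 40, "customer_impact": 35, "urgency": 25}


def score(item: MaintenanceItem) -> float:
    """Return weighted benefit divided by effort; higher means do sooner."""
    benefit = (
        WEIGHTS["severity"] * item.severity
        + WEIGHTS["customer_impact"] * item.customer_impact
        + WEIGHTS["urgency"] * item.urgency
    )
    return benefit / item.effort


# Made-up example items, for illustration only.
backlog = [
    MaintenanceItem("Routine library upgrade", 2, 1, 2, 3),
    MaintenanceItem("Recurring billing defect", 4, 5, 5, 2),
]

ranked = sorted(backlog, key=score, reverse=True)
for item in ranked:
    print(f"{score(item):6.1f}  {item.title}")
```

The value of a model like this is less in the exact weights than in having a shared, written basis for ranking, so that disagreements are about inputs rather than about the method.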
Deliverables
- Create a 12-week execution plan to move maintenance from reactive to planned.
- Define an intake, prioritization, and review process for maintenance work.
- Propose how much engineering capacity to reserve for maintenance and how to adjust it over time.
- Identify key risks, stakeholder trade-offs, and escalation paths.
- Define success metrics and a reporting cadence for the first quarter after rollout.
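
For the capacity deliverable, one possible middle ground between a fixed reserve and fully dynamic prioritization is a baseline percentage that moves in small steps within agreed bounds. The sketch below assumes hypothetical values throughout: the 10% floor, 40% ceiling, 5-point step, the use of incident counts as the trigger, and the 10-day sprint are not from the brief. Only the team size of 6 engineers comes from the constraints.

```python
def next_reserve(current_pct: int, incidents_prev: int, incidents_now: int,
                 floor_pct: int = 10, ceiling_pct: int = 40,
                 step_pct: int = 5) -> int:
    """Propose next sprint's maintenance reserve as a percentage of capacity.

    Raises the reserve one step when incidents rose, lowers it one step when
    they fell, and clamps the result to the assumed floor and ceiling.
    """
    if incidents_now > incidents_prev:
        proposed = current_pct + step_pct
    elif incidents_now < incidents_prev:
        proposed = current_pct - step_pct
    else:
        proposed = current_pct
    return max(floor_pct, min(ceiling_pct, proposed))


def reserved_engineer_days(reserve_pct: int, engineers: int = 6,
                           sprint_days: int = 10) -> float:
    """Convert a reserve percentage into engineer-days per sprint.

    engineers=6 matches the brief; sprint_days=10 is an assumption.
    """
    return engineers * sprint_days * reserve_pct / 100


reserve = 20  # assumed starting point
reserve = next_reserve(reserve, incidents_prev=5, incidents_now=8)
print(reserve, reserved_engineer_days(reserve))
```

A bounded rule like this gives the Head of Product a known worst case for feature capacity while still letting the reserve respond to incident trends.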