You’re the program manager for PayNow, a fast-growing fintech that provides a “pay-by-bank” checkout option for mid-market e-commerce merchants. PayNow processes $6.5B/year in payment volume and supports 3.2M monthly active shoppers. The company is preparing to launch Instant Refunds—a feature that credits customers within minutes after a return is approved—starting with the US and UK. The business case is strong: Instant Refunds is expected to increase merchant retention by 2–3 percentage points and reduce chargeback-related support costs by $1.2M/year.
The blocker: PayNow’s engineering org has accumulated fragmented build and release practices across ~40 microservices (Kubernetes on AWS), owned by 5 teams (Payments Core, Risk, Ledger, Merchant Experience, Data Platform). Some services deploy via GitHub Actions, others via Jenkins, and a few still rely on manual steps. Release frequency varies wildly (multiple times/day vs. once every two weeks). Recent incidents include a misconfigured environment variable that caused duplicate refunds in staging and a production hotfix that bypassed tests, leading to a 45-minute checkout outage.
Leadership has mandated that Instant Refunds must ship in 10 weeks to meet merchant contract commitments and a coordinated marketing push. However, Compliance and Security are refusing to sign off on the launch unless PayNow can demonstrate a consistent CI/CD posture: repeatable builds, required test gates, artifact provenance, and a reliable rollback strategy.
These stakeholders have competing priorities: Product wants speed, Security/Compliance want rigor, Engineering wants stability, and SRE wants operational safety.
| Constraint | Details |
|---|---|
| Timeline | 10 weeks to launch Instant Refunds (hard date tied to merchant contracts) |
| Team capacity | 5 teams, ~32 engineers total, but only 6 engineers can spend >30% time on CI/CD work due to feature commitments |
| Services | 40 microservices, with 12 in the critical path for Instant Refunds |
| Regulatory | Must meet PCI expectations and internal SOX-like change controls for financial reporting systems (Ledger) |
| Reliability | Must maintain 99.95% checkout availability during the migration period |
| Tooling | Existing mix: GitHub Actions, Jenkins, ArgoCD; no budget approved for a new paid CI platform this quarter |
Your answer should reflect real execution: sequencing, critical path, decision-making under constraints, and how you’d handle conflict when stakeholders have legitimate but incompatible goals.