You own a Java-based backend service running in containers behind an internal API gateway. Over the last hour, the service has shown rising memory usage, repeated restarts, and intermittent latency spikes, and the team is considering enabling heap dumps and deeper JVM diagnostics in production to understand what is happening. The service processes authenticated user requests and may hold session data, tokens, and customer records in memory during request handling.
How would you investigate and stabilize this incident while minimizing security risk from the diagnostic data you collect? Explain how Java garbage collection behavior affects your approach, what controls you would put around production debugging, and how you would verify those controls are working.
JVM garbage collection behavior under memory pressure
Secure handling of heap dumps, JFR, and logs
Kubernetes production debug access and auditability
Containment and recovery during an active availability incident