You own an embedded network security appliance that includes a boot ROM, secure boot chain, Linux-based control plane, hardware crypto, watchdog logic, and a management plane exposed through FortiManager and local console access. After a firmware rollout, a subset of devices begins rebooting intermittently, some fail remote management enrollment, and others show packet-processing degradation even though basic network reachability remains. Logs are incomplete because several affected units reboot before forwarding full telemetry, and there is concern that the issue could be either a regression in the update path or a security-relevant failure in the trust chain.
How would you approach debugging this system end to end so you can distinguish a software defect from a security compromise or hardware fault, contain risk while preserving forensic value, and drive the system back to a known-good state? Be explicit about the order of investigation, the trust boundaries you would test first, and the signals you would rely on to prove your hypothesis.