You own the security posture of a fleet telemetry ingestion platform that accepts high-volume events from vehicles, edge gateways, and internal services. Over the last quarter, event volume has grown quickly, and teams are seeing higher ingest latency and uneven backpressure. They are also concerned that emergency scaling changes could weaken isolation or expose sensitive logs. The platform handles both operational telemetry and restricted security-relevant events used for detection and incident response.
What considerations would you take into account for scalability and performance while keeping the system secure? Walk through how you would design the architecture, choose control points, and decide what to optimize first as load continues to grow.