CI/CD, Release Engineering, and Platform Automation
This area is foundational. Interviewers assess how you design end-to-end pipelines that are fast, secure, and predictable, and that can support hundreds of daily deploys without regressions. Expect to discuss artifact strategies, environment promotion models, policy-as-code, and rollback patterns.
- Be ready to go over:
  - Pipeline architecture: Trunk-based vs. GitFlow, canary/blue-green, progressive delivery, and automated rollbacks
  - Testing strategy: Unit/integration/e2e, test flakiness management, parallelization, and quality gates
  - Policy & security: Secrets management, SBOMs, artifact signing, and supply chain controls in CI/CD
  - Advanced concepts (less common): Self-service platforms (paved roads), ephemeral environments, monorepo build scaling, and remote caching
- Example questions or scenarios:
  - "Design a CI/CD pipeline for a microservice fleet with zero-downtime deploys and fast rollback."
  - "How would you detect and mitigate test flakiness that intermittently blocks releases?"
  - "Walk us through your approach to integrating SAST/DAST and artifact signing without slowing engineers down."
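For the flakiness question, it can help to show how you would detect flaky tests from CI history before discussing mitigation. The sketch below is one possible heuristic, not a standard tool: it flags a test as flaky when the same commit produced both a pass and a fail. The `RunRecord` type, its field names, and the `min_flaky_commits` parameter are hypothetical and would need to be mapped onto whatever your CI system actually exports.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One execution of one test in CI (hypothetical schema)."""
    test_name: str
    commit_sha: str
    passed: bool


def find_flaky_tests(runs, min_flaky_commits=1):
    """Return sorted names of tests that both passed and failed on the same commit.

    A test that fails on one commit and passes on another is not flagged,
    because the code itself may have changed between those runs.
    """
    # Map (test, commit) -> set of observed outcomes, e.g. {True, False}.
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.test_name, run.commit_sha)].add(run.passed)

    # Count, per test, the commits on which results disagreed.
    flaky_commits = defaultdict(int)
    for (test_name, _commit), results in outcomes.items():
        if len(results) == 2:
            flaky_commits[test_name] += 1

    return sorted(
        name for name, count in flaky_commits.items()
        if count >= min_flaky_commits
    )


if __name__ == "__main__":
    history = [
        RunRecord("test_login", "abc123", True),
        RunRecord("test_login", "abc123", False),   # same commit, different result
        RunRecord("test_payment", "abc123", False),
        RunRecord("test_payment", "def456", True),  # different commit, not flagged
    ]
    print(find_flaky_tests(history))
```

Once tests are flagged this way, mitigation options include quarantining them from the release-blocking gate, retrying with a cap, and assigning an owner to fix the root cause.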
Cloud Infrastructure and Systems Design (AWS-centric)
You will be asked to assemble robust, secure, and cost-aware architectures in AWS (or another major cloud). Interviewers probe your grasp of compute, networking, storage, and IAM, and of the patterns that turn them into reliable platforms.
- Be ready to go over:
  - Core primitives: VPC, subnets, security groups, load balancers, autoscaling, EKS/ECS
  - Resilience: Multi-AZ design, state management, backups/DR, regional failover
  - Cost & performance: Right-sizing, caching, scaling triggers, capacity planning
  - Advanced concepts (less common): Service mesh (mTLS), cross-account IAM, transit gateways, PrivateLink patterns
- Example questions or scenarios:
  - "Design a multi-AZ service on AWS with strict egress controls and auditability."
  - "How do you approach DR objectives (RTO/RPO) for a stateful service?"
  - "Explain trade-offs between EKS, ECS, and serverless for a latency-sensitive workload."
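For the RTO/RPO question, a small worked check can make the reasoning concrete. The sketch below assumes a deliberately simplified model: a backup-and-restore strategy where worst-case data loss equals the backup interval, and recovery time is detection time plus restore time. The `DrPlan` type and its fields are hypothetical; real plans also need to account for replication lag, validation, and DNS or traffic cutover.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DrPlan:
    """Simplified backup-and-restore plan (hypothetical model)."""
    backup_interval: timedelta   # time between successful backups
    detection_time: timedelta    # time to notice and declare the disaster
    restore_time: timedelta      # time to restore and bring the service up


def check_objectives(plan, rpo, rto):
    """Compare a plan against RPO/RTO targets under the simplified model."""
    # Worst case: failure happens just before the next backup would run.
    worst_case_loss = plan.backup_interval
    # Recovery clock starts at the failure, so detection counts toward RTO.
    recovery_time = plan.detection_time + plan.restore_time
    return {
        "rpo_met": worst_case_loss <= rpo,
        "rto_met": recovery_time <= rto,
    }


if __name__ == "__main__":
    plan = DrPlan(
        backup_interval=timedelta(hours=1),
        detection_time=timedelta(minutes=10),
        restore_time=timedelta(minutes=45),
    )
    print(check_objectives(plan, rpo=timedelta(minutes=15), rto=timedelta(hours=1)))
```

In this example the hourly backups cannot satisfy a 15-minute RPO, which is the kind of gap that motivates continuous replication or point-in-time recovery for a stateful service.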
Reliability Engineering, Observability, and Incident Response
Reliability is a first-class metric at Salesforce. You’ll be evaluated on how you instrument systems, define SLIs/SLOs, set actionable alerts, and lead during incidents. Expect to narrate real incidents, including root cause analysis and prevention at both the system and process level.
- Be ready to go over:
  - Observability stack: Metrics, logs, traces; tools like Splunk/ELK, Prometheus/Grafana, OpenTelemetry
  - SLOs and alerting: Error budgets, burn-rate alerts, signal-to-noise improvements
  - Incident command: Roles, communication cadence, decision logs, postmortem quality
  - Advanced concepts (less common): Adaptive alerting, chaos experiments, dark launches, traffic shadowing
- Example questions or scenarios:
  - "Define SLIs/SLOs for an API gateway and outline your alert strategy."
  - "You’re Incident Commander for a cross-region outage. Walk us through your first 15 minutes."
  - "How did you reduce alert fatigue while improving MTTD/MTTR?"
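For the SLO and alerting questions, it helps to be able to write down how a burn-rate alert is computed. The sketch below is a minimal illustration, assuming a ratio-based SLI (bad requests divided by total requests): burn rate is the observed error ratio divided by the error budget (1 minus the SLO target). The function names are hypothetical, and the default threshold of 14.4 is the commonly cited value at which a 30-day budget loses 2% in one hour; treat it as a starting point rather than a rule.

```python
def burn_rate(error_ratio, slo_target):
    """Return how many times faster than 'exactly on budget' errors are accruing.

    error_ratio: observed fraction of bad events in the window (0.0 to 1.0).
    slo_target:  target fraction of good events, e.g. 0.999.
    """
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0 to leave an error budget")
    return error_ratio / budget


def should_page(long_window_error_ratio, short_window_error_ratio,
                slo_target, threshold=14.4):
    """Multi-window check: page only if both windows exceed the threshold.

    The long window (for example 1 hour) shows the burn is significant;
    the short window (for example 5 minutes) shows it is still happening,
    which avoids paging for an incident that has already recovered.
    """
    return (
        burn_rate(long_window_error_ratio, slo_target) >= threshold
        and burn_rate(short_window_error_ratio, slo_target) >= threshold
    )


if __name__ == "__main__":
    # 99.9% SLO: 2% errors over the last hour and 3% over the last 5 minutes.
    print(should_page(0.02, 0.03, slo_target=0.999))
    # Same hour, but the last 5 minutes are back to normal.
    print(should_page(0.02, 0.001, slo_target=0.999))
```

Requiring both windows to breach is one way to cut alert noise without hurting detection time, which ties directly to the alert-fatigue question above.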