Data Architecture & Integration Design
This area assesses how you ingest, normalize, and expose data from complex operational systems. Interviewers will test your ability to design end-to-end flows covering authentication, incremental strategies, idempotency, backfills, data contracts, and access patterns for apps, BI, and agents.
Be ready to go over:
- Integration patterns (API, file-based, event-driven): Choosing the right pattern for each source's constraints and SLAs
- Incremental loads and change data capture (CDC): Designing robust state management and recovery
- Data contracts and schema management: Enforcing evolution policies and compatibility
- Advanced concepts (less common): Event sourcing, streaming upserts, multi-tenant isolation, cross-system referential integrity
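A data contract's evolution policy can be enforced mechanically, for example as a CI check that runs before an upstream team publishes a schema change. The sketch below assumes a hypothetical contract format (a dict mapping field names to a `Field` with a type name and a `required` flag), not any particular schema registry. It conservatively flags removed fields, any type change, and newly required fields. Real policies often also allow safe type widening, such as int to long.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    dtype: str
    required: bool = False


# Hypothetical contract format: field name -> Field.
Contract = dict[str, Field]


def breaking_changes(old: Contract, new: Contract) -> list[str]:
    """List violations of a backward-compatible evolution policy (this sketch's rules)."""
    problems = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed field: {name}")
        elif new[name].dtype != spec.dtype:
            # Conservative: flags widening too; relax this if your policy allows it.
            problems.append(f"type change on {name}: {spec.dtype} -> {new[name].dtype}")
        elif new[name].required and not spec.required:
            problems.append(f"field became required: {name}")
    for name, spec in new.items():
        if name not in old and spec.required:
            problems.append(f"new required field: {name}")
    return problems
```

Adding an optional field passes. Dropping `cost` and retyping `project_id` yields two violations, which a CI gate could use to block the merge.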
Example questions or scenarios:
- "Design an integration from Procore and CMiC into a centralized repository with reliable incremental updates and backfill strategy."
- "How would you enforce data contracts across multiple upstream teams to avoid breaking downstream apps?"
- "Walk through your approach to idempotency and failure recovery when an upstream API rate-limits or returns partial data."
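A common answer to the incremental-update and failure-recovery questions combines a high-watermark cursor and a small overlap window with an idempotent upsert keyed on the record id and its last-modified timestamp. The sketch below is in-memory and hypothetical. `fetch_changed_since` stands in for a source API call and is not a Procore or CMiC endpoint. The target is a dict rather than a lakehouse table. The watermark advances only after the batch is applied, so a crash mid-run simply replays the same window.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable

Record = dict  # assumed to carry "id" and "updated_at" (datetime) keys


def upsert(target: dict, rows: Iterable[Record]) -> int:
    """Idempotent merge: keep the newest version of each id; replays are no-ops."""
    applied = 0
    for row in rows:
        current = target.get(row["id"])
        if current is None or row["updated_at"] > current["updated_at"]:
            target[row["id"]] = row
            applied += 1
    return applied


def incremental_sync(
    target: dict,
    state: dict,
    fetch_changed_since: Callable[[datetime], list[Record]],  # hypothetical source call
    overlap: timedelta = timedelta(minutes=5),
) -> int:
    # Re-read a small overlap window to catch rows committed late upstream.
    since = state["watermark"] - overlap
    rows = fetch_changed_since(since)
    applied = upsert(target, rows)
    # Advance the watermark only after the write has succeeded.
    if rows:
        state["watermark"] = max(state["watermark"], max(r["updated_at"] for r in rows))
    return applied
```

Because the merge is idempotent, a backfill can reuse the same path. You reset `state["watermark"]` to an earlier date and rerun, and rows already present are skipped. One limitation is that two versions with identical `updated_at` values cannot be ordered. Sources like that need a sequence number or version column instead.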
Cloud & Lakehouse Engineering on AWS
You’ll be assessed on designing and operating a lakehouse on AWS, including S3 + Apache Iceberg (or Delta/Hudi), Spark (Glue/EMR), and orchestration (Airflow/Step Functions). The focus is on how table formats work under the hood—snapshots, metadata, partitioning, schema evolution—and how you optimize for performance and cost.
Be ready to go over:
- Bronze/Silver/Gold modeling: Raw to curated to serving layers for analytics and applications
- Table format internals: Manifests, snapshots, compaction, data skipping, partition evolution
- Performance engineering: File sizing, partition strategies, Z-ordering/clustering equivalents, join patterns
- Advanced concepts (less common): ACID guarantees at scale, multi-writer concurrency, cross-account sharing
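Data skipping in table formats such as Iceberg relies on per-file column statistics (lower and upper bounds) stored in manifests. These let the planner discard files before opening them. The toy planner below mimics that idea in plain Python. `DataFile` and its fields are simplified stand-ins, not Iceberg's actual manifest schema.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    record_count: int
    lower: dict  # column -> minimum value in this file (simplified stats)
    upper: dict  # column -> maximum value in this file


def plan_scan(files: list[DataFile], column: str, lo, hi) -> list[DataFile]:
    """Keep only files whose [min, max] range for `column` overlaps [lo, hi]."""
    selected = []
    for f in files:
        fmin, fmax = f.lower.get(column), f.upper.get(column)
        # Missing stats: we cannot prove the file is irrelevant, so it must be read.
        if fmin is None or fmax is None or (fmax >= lo and fmin <= hi):
            selected.append(f)
    return selected
```

Sorting or clustering data on a frequently filtered column (the intent behind Z-ordering) narrows each file's min/max range. That lets a predicate like `cost BETWEEN 55 AND 70` skip more files.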
Example questions or scenarios:
- "Re-engineer a legacy ETL into a Bronze/Silver/Gold lakehouse on S3 with Iceberg—explain partitioning and compaction strategy."
- "When would you choose Step Functions vs. Airflow for orchestration, and why?"
- "How do you handle schema evolution safely while preserving snapshot isolation and downstream SLAs?"
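Compaction questions often come down to file sizing, because many small files inflate metadata and per-file open costs. The sketch below shows first-fit-decreasing bin-packing of small files into rewrite groups near a target size, which is conceptually what a bin-pack rewrite does. The 512 MiB target and the 75% "small file" cutoff are assumptions for illustration. In practice, Iceberg's Spark `rewrite_data_files` procedure does this planning for you.

```python
MIB = 1024 * 1024


def plan_compaction(
    file_sizes: dict[str, int],
    target_bytes: int = 512 * MIB,  # assumed target file size
    small_fraction: float = 0.75,   # assumed: files under 75% of target count as "small"
) -> list[list[str]]:
    """Group small files into rewrite batches whose total stays <= target_bytes."""
    small = [(size, path) for path, size in file_sizes.items()
             if size < small_fraction * target_bytes]
    small.sort(reverse=True)  # largest first: first-fit decreasing
    groups: list[tuple[int, list[str]]] = []
    for size, path in small:
        for i, (total, paths) in enumerate(groups):
            if total + size <= target_bytes:
                groups[i] = (total + size, paths + [path])
                break
        else:
            groups.append((size, [path]))
    # A group holding a single file gains nothing from being rewritten.
    return [paths for _, paths in groups if len(paths) > 1]
```

With files of 600, 300, 200, 10, and 5 MiB, the 600 MiB file is left alone. The 300, 200, and 10 MiB files pack into one group of 510 MiB.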
Data Governance, Quality, and Lineage
AECOM invests in enterprise governance and cataloging to ensure accountability, transparency, and quality. You will be evaluated on metadata ingestion, automated profiling, data quality (DQ) controls, lineage tracing across ingestion/transformation/consumption, and role-based access control (RBAC) aligned to security and privacy requirements.
Be ready to go over:
- Metadata and catalog configuration: Asset models, relationships, role-based access
- Data profiling and quality automation: Validations, thresholding, incident routing
- Technical lineage: Mapping from sources to transformations to BI/app endpoints
- Advanced concepts (less common): PII classification at scale, policy-as-code, differential access per data domain
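Automated profiling and DQ can be framed as a metric, a threshold, and a severity that decides where each incident is routed. The sketch below assumes a load arrives as a list of dicts. Its thresholds and routing targets (`page-oncall`, `steward-queue`) are hypothetical. A governance platform or DQ framework would supply its own equivalents.

```python
from datetime import datetime, timedelta


def profile(rows: list[dict], key: str, column: str, ts_col: str) -> dict:
    """Compute a few profiling metrics for one load."""
    n = len(rows)
    return {
        "row_count": n,
        "null_rate": sum(r.get(column) is None for r in rows) / n if n else 1.0,
        "duplicate_keys": n - len({r[key] for r in rows}),
        "latest": max((r[ts_col] for r in rows), default=None),
    }


def evaluate(p: dict, now: datetime, max_null_rate: float = 0.05,
             max_staleness: timedelta = timedelta(hours=2)) -> list[tuple[str, str]]:
    """Apply hypothetical thresholds; return (check, severity) pairs that failed."""
    incidents = []
    if p["row_count"] == 0:
        incidents.append(("empty_load", "critical"))
    if p["duplicate_keys"] > 0:
        incidents.append(("duplicate_keys", "critical"))
    if p["null_rate"] > max_null_rate:
        incidents.append(("null_rate", "warning"))
    if p["latest"] is None or now - p["latest"] > max_staleness:
        incidents.append(("stale_data", "critical"))
    return incidents


ROUTES = {"critical": "page-oncall", "warning": "steward-queue"}  # hypothetical targets


def route(incidents: list[tuple[str, str]]) -> list[tuple[str, str]]:
    return [(check, ROUTES[severity]) for check, severity in incidents]
```

For a high-churn dataset, duplicate keys and staleness are treated as critical here because they usually indicate a broken merge or a stalled feed. A rising null rate goes to a steward for triage.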
Example questions or scenarios:
- "Enable lineage across Glue jobs and BI endpoints—how would you validate completeness and troubleshoot gaps?"
- "Design automated profiling and DQ checks for a high-churn project dataset—what metrics and alerts matter most?"
- "Explain how you would implement RBAC for stewards, analysts, and app services in the governance platform."
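One way to answer the lineage-completeness question is to treat lineage as a directed graph and ask which consumption endpoints cannot be traced back to a registered source. The sketch below assumes edges have already been harvested (for example, from job metadata) into `(upstream, downstream)` pairs. The node names are hypothetical.

```python
from collections import defaultdict, deque


def untraceable_endpoints(edges, sources, endpoints):
    """Return endpoints with no upstream path to any known source (lineage gaps)."""
    upstream = defaultdict(set)
    for up, down in edges:
        upstream[down].add(up)
    gaps = []
    for endpoint in endpoints:
        # Breadth-first walk from the endpoint toward its ancestors.
        seen, queue, found = {endpoint}, deque([endpoint]), False
        while queue:
            node = queue.popleft()
            if node in sources:
                found = True
                break
            for parent in upstream[node] - seen:
                seen.add(parent)
                queue.append(parent)
        if not found:
            gaps.append(endpoint)
    return gaps
```

When an endpoint is reported, the last node reached on its walk (here `silver.costs`, which has no recorded parent) usually points to the job that is not emitting lineage.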
Reliability, Observability, and Operational Excellence
Interviewers want evidence that your systems are operable, observable, and cost-aware. You will discuss monitoring/alerting, runbooks, SLAs/SLOs, retry/backoff/idempotency, cost/performance tradeoffs, and post-incident improvement loops.
Be ready to go over:
- Observability stack: Metrics, logs, traces, data validation signals
- Failure handling: Replay/backfill processes, dead-letter queues, partial retry strategies
- Cost controls: Storage lifecycle policies, compute optimization, workload scheduling
- Advanced concepts (less common): Operational analytics for pipelines, error budget policies, canary datasets
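Retry, backoff, and dead-letter handling fit together as in the sketch below. Transient errors (such as an HTTP 429 rate limit) are retried with capped exponential backoff and full jitter. Records that still fail are parked in a dead-letter list for later replay instead of blocking the batch. `TransientError`, `process`, and the default limits are hypothetical. The sleep function is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limits, timeouts)."""


def with_retries(fn: Callable[[], object], max_attempts: int = 5, base: float = 0.5,
                 cap: float = 30.0, sleep: Callable[[float], None] = time.sleep):
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)].
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))


def process_batch(records, process, dead_letters: list, **retry_kwargs) -> int:
    """Process each record; anything that still fails goes to the dead-letter list."""
    ok = 0
    for rec in records:
        try:
            with_retries(lambda: process(rec), **retry_kwargs)
            ok += 1
        except Exception as exc:  # non-transient or exhausted retries: park it for replay
            dead_letters.append((rec, repr(exc)))
    return ok
```

Non-transient errors (such as a schema mismatch) skip the retry loop entirely and go straight to the dead-letter list. Retrying them would only burn rate-limit budget.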
Example questions or scenarios:
- "Outline your monitoring, alerting, and runbook approach for an hourly ingestion that occasionally backlogs."
- "A pipeline’s costs spiked 3x—walk through your systematic diagnosis and remediation plan."
- "How do you balance latency requirements for app endpoints with batch economics?"
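Error budget policies become concrete with a little arithmetic. Take a hypothetical SLO that 99% of hourly loads land within 90 minutes over a 30-day window. That window holds 720 runs, so the budget is 7.2 late runs. The helper below tracks how much of that budget remains. Its `freeze_changes` flag is an assumed policy trigger, for example pausing risky deploys once the budget is spent.

```python
from datetime import timedelta


def error_budget_status(lateness: list[timedelta], slo: float = 0.99,
                        threshold: timedelta = timedelta(minutes=90)) -> dict:
    """Summarize SLO compliance for a window of runs.

    `lateness` holds each run's landing delay relative to its scheduled hour.
    """
    total = len(lateness)
    late = sum(d > threshold for d in lateness)
    budget = (1 - slo) * total  # allowed late runs in this window
    return {
        "runs": total,
        "late": late,
        "budget": budget,
        "remaining": budget - late,
        "freeze_changes": late > budget,  # hypothetical policy trigger
    }
```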
Applied AI & Data Products for Construction
AECOM is enabling AI/LLM workflows—semantic search, summarization, classification, and agent-driven retrieval. You’ll be asked how to shape datasets for AI readiness: structured outputs, evaluation/QA, human-in-the-loop safeguards, and efficient access patterns for low-latency apps and agents.
Be ready to go over:
- Retrieval-augmented generation (RAG) and semantic search: Indexing strategies, chunking/metadata, freshness signals
- Structured extraction pipelines: Templates, evaluation harnesses, drift detection
- Human-in-the-loop: Review queues, feedback loops, measurable quality criteria
- Advanced concepts (less common): Agent orchestration over governed data, provenance tracking, prompt/data leakage controls
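Chunk-level metadata is what makes retrieval traceable and freshness-aware. The sketch below splits a document into overlapping word windows and attaches provenance fields to every chunk. Word windows stand in for a real tokenizer. The field names (`doc_id`, `source_uri`, `revised_at`) are hypothetical. An embedding step and vector index would consume these chunks downstream.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    chunk_index: int
    text: str
    source_uri: str  # provenance: link back to the original RFI or drawing
    revised_at: str  # freshness signal for filtering or re-ranking results
    start_word: int  # offset for citing the exact passage


def chunk_document(doc_id: str, text: str, source_uri: str, revised_at: str,
                   size: int = 200, overlap: int = 40) -> list[Chunk]:
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    chunks = []
    # Stop once a window would contain only words already covered by the previous one.
    for i, start in enumerate(range(0, max(len(words) - overlap, 1), step)):
        chunks.append(Chunk(doc_id, i, " ".join(words[start:start + size]),
                            source_uri, revised_at, start))
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from either neighbor. Carrying `source_uri` and `start_word` on every chunk lets an answer cite the exact source passage.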
Example questions or scenarios:
- "Design a data pipeline that enables semantic search over drawings and RFIs with traceability back to source documents."
- "Implement a classification workflow with structured outputs and human-in-the-loop QA—how do you measure quality?"
- "How would you expose a low-latency query endpoint for an internal agent while maintaining governance and lineage?"
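For the classification question, quality becomes measurable once model outputs are validated against a schema and compared with human review labels. The sketch below assumes the model returns JSON with hypothetical `label` and `confidence` fields. It routes invalid or low-confidence outputs to a review queue and computes per-label precision and recall against reviewer decisions. The label set and the 0.8 threshold are assumptions.

```python
import json
from collections import Counter

LABELS = {"rfi", "submittal", "change_order"}  # hypothetical label set


def parse_output(raw: str):
    """Return (label, confidence) if raw is valid structured output, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    label, conf = data.get("label"), data.get("confidence")
    if not isinstance(label, str) or label not in LABELS:
        return None
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        return None
    return label, float(conf)


def needs_review(raw: str, threshold: float = 0.8) -> bool:
    """Send invalid or low-confidence outputs to the human review queue."""
    parsed = parse_output(raw)
    return parsed is None or parsed[1] < threshold


def precision_recall(predicted: list[str], reviewed: list[str]) -> dict:
    """Per-label (precision, recall), treating reviewer labels as ground truth."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, t in zip(predicted, reviewed):
        if p == t:
            tp[p] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    return {
        lbl: (
            tp[lbl] / (tp[lbl] + fp[lbl]) if tp[lbl] + fp[lbl] else 0.0,
            tp[lbl] / (tp[lbl] + fn[lbl]) if tp[lbl] + fn[lbl] else 0.0,
        )
        for lbl in LABELS
    }
```

Tracking these per-label numbers over time, alongside the share of outputs sent to review, gives a measurable quality criterion and an early signal of drift.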