What is a Data Engineer?
A Data Engineer at AECOM is a systems builder and integrator who turns fragmented, real-world operational data into reliable, reusable, and governed data products that power decisions, applications, and emerging AI use cases. You will design and operate production-grade integrations from sources such as ERP (e.g., CMiC, Textura), project management (e.g., Autodesk Construction Cloud, Procore), scheduling tools (e.g., Primavera P6), and document repositories (e.g., SharePoint), funneling them into a centralized, queryable repository. Your work directly enables construction operations, project delivery, and enterprise analytics—well beyond dashboards.
This role is critical because AECOM’s impact is tangible: keeping projects on schedule, managing field operations, and elevating client delivery with trustworthy, auditable data. By standardizing metadata, enforcing data contracts, and building resilient pipelines, you will support downstream applications, APIs, and AI/LLM-powered workflows such as semantic search, summarization, classification, and agent-driven retrieval. The problems you’ll solve are real and sometimes messy—built around budgets, schedules, field workflows, and legacy systems—and that’s what makes the work meaningful, technical, and business-critical.
Expect to own solutions end-to-end: authentication, incremental loads, normalization, lineage, observability, retries/idempotency, backfills, and documentation. Whether you’re modernizing an enterprise AWS lakehouse with S3 + Apache Iceberg, building operational integrations for a construction delivery team, or enabling an enterprise data governance platform, your engineering craft will directly shape how thousands of professionals discover, use, and trust AECOM’s data.
Getting Ready for Your Interviews
Prioritize fundamentals that translate to production impact: integration patterns, data modeling, lakehouse architecture, governance and lineage, observability, and domain fluency for project-centric data. Your interviewers will probe both how you design systems and how you operate them under real constraints—cost, latency, data quality, and change over time.
- Role-related Knowledge (Technical/Domain Skills): Interviewers assess your command of data integration (API-, file-, and event-driven), AWS lakehouse patterns (S3, Spark, Glue/EMR, Iceberg/Delta/Hudi), and metadata/lineage. Demonstrate fluency by naming concrete design choices (partitioning, schema evolution, access patterns) and tradeoffs you’ve managed in production.
- Problem-Solving Ability (How You Approach Challenges): Expect scenario-based prompts with imperfect inputs and evolving requirements. Show how you break down ambiguity, define data contracts, validate assumptions, and design for retries, idempotency, and backfills. Quantify outcomes and explain why your design is robust.
- Leadership (Influence Without Authority): Even as an IC, you will lead through standards, code reviews, runbooks, and cross-functional alignment. Interviewers look for how you mentor others, codify best practices, and guide stakeholders toward pragmatic, maintainable solutions.
- Culture Fit (Collaboration and Ownership): AECOM values hands-on builders who partner closely with construction professionals and analysts. Demonstrate operational ownership, clear communication, and a bias for practical, iterative delivery—especially when dealing with messy, project-driven data.
- Operational Excellence (Reliability, Observability, Cost Awareness): You’ll be evaluated on how you design for monitoring/alerting, data validation, lineage, and SLAs. Share examples of runbooks, post-incident learnings, and how you balance performance with cost and maintainability.
Interview Process Overview
AECOM’s interview experience is structured to validate production-readiness and collaborative problem-solving. You will navigate a blend of technical deep dives, architecture conversations, and scenario-based cases grounded in real project operations. The tone is professional, practical, and outcome-focused; interviewers want to understand how you ship reliable systems that business teams actually use.
Expect steady rigor and a realistic pace. Instead of puzzle-heavy rounds, you’ll see hands-on evaluation of engineering fundamentals—SQL/Python/PySpark proficiency, AWS lakehouse patterns, integration reliability, observability, and data governance. You should also anticipate domain-centric prompts about construction and enterprise systems, how you handle fragmented metadata, and how you enable AI/LLM workflows safely and repeatably.
This timeline illustrates the typical stages from recruiter alignment through technical assessments, architecture discussions, and cross-functional conversations. Use it to plan your prep sprints and stakeholder questions. Keep notes after each stage on open requirements (e.g., on-site expectations, governance tooling stack) to tailor your follow-ups and design proposals.
Deep Dive into Evaluation Areas
Data Architecture & Integration Design
This area validates how you ingest, normalize, and expose data from complex operational systems. Interviewers will test your ability to design end-to-end flows—covering authentication, incremental strategies, idempotency, backfills, data contracts, and access patterns for apps, BI, and agents.
Be ready to go over:
- Integration patterns (API, file-based, event-driven): Choosing the right pattern per source constraints and SLAs
- Incremental loads and change data capture (CDC): Designing robust state management and recovery
- Data contracts and schema management: Enforcing evolution policies and compatibility
- Advanced concepts (less common): Event sourcing, streaming upserts, multi-tenant isolation, cross-system referential integrity
Example questions or scenarios:
- "Design an integration from Procore and CMiC into a centralized repository with reliable incremental updates and backfill strategy."
- "How would you enforce data contracts across multiple upstream teams to avoid breaking downstream apps?"
- "Walk through your approach to idempotency and failure recovery when an upstream API rate-limits or returns partial data."
Cloud & Lakehouse Engineering on AWS
You’ll be assessed on designing and operating a lakehouse on AWS, including S3 + Apache Iceberg (or Delta/Hudi), Spark (Glue/EMR), and orchestration (Airflow/Step Functions). The focus is on how table formats work under the hood—snapshots, metadata, partitioning, schema evolution—and how you optimize for performance and cost.
Be ready to go over:
- Bronze/Silver/Gold modeling: Raw to curated to serving layers for analytics and applications
- Table format internals: Manifests, snapshots, compaction, data skipping, partition evolution
- Performance engineering: File sizing, partition strategies, Z-ordering/clustering equivalents, join patterns
- Advanced concepts (less common): ACID guarantees at scale, multi-writer concurrency, cross-account sharing
Example questions or scenarios:
- "Re-engineer a legacy ETL into a Bronze/Silver/Gold lakehouse on S3 with Iceberg—explain partitioning and compaction strategy."
- "When would you choose Step Functions vs. Airflow for orchestration, and why?"
- "How do you handle schema evolution safely while preserving snapshot isolation and downstream SLAs?"
Data Governance, Quality, and Lineage
AECOM invests in enterprise governance and cataloging to ensure accountability, transparency, and quality. You will be evaluated on metadata ingestion, automated profiling, DQ controls, lineage tracing across ingestion/transformation/consumption, and RBAC aligned to security and privacy requirements.
Be ready to go over:
- Metadata and catalog configuration: Asset models, relationships, role-based access
- Data profiling and quality automation: Validations, thresholding, incident routing
- Technical lineage: Mapping from sources to transformations to BI/app endpoints
- Advanced concepts (less common): PII classification at scale, policy-as-code, differential access per data domain
Example questions or scenarios:
- "Enable lineage across Glue jobs and BI endpoints—how would you validate completeness and troubleshoot gaps?"
- "Design automated profiling and DQ checks for a high-churn project dataset—what metrics and alerts matter most?"
- "Explain how you would implement RBAC for stewards, analysts, and app services in the governance platform."
Reliability, Observability, and Operational Excellence
Interviewers want evidence that your systems are operable, observable, and cost-aware. You will discuss monitoring/alerting, runbooks, SLA/SLOs, retry/backoff/idempotency, cost/performance tradeoffs, and post-incident improvement loops.
Be ready to go over:
- Observability stack: Metrics, logs, traces, data validation signals
- Failure handling: Replay/backfill processes, dead-letter queues, partial retry strategies
- Cost controls: Storage lifecycle policies, compute optimization, workload scheduling
- Advanced concepts (less common): Operational analytics for pipelines, error budget policies, canary datasets
Example questions or scenarios:
- "Outline your monitoring, alerting, and runbook approach for an hourly ingestion that occasionally backlogs."
- "A pipeline’s costs spiked 3x—walk through your systematic diagnosis and remediation plan."
- "How do you balance latency requirements for app endpoints with batch economics?"
Applied AI & Data Products for Construction
AECOM is enabling AI/LLM workflows—semantic search, summarization, classification, and agent-driven retrieval. You’ll be asked how to shape datasets for AI readiness: structured outputs, evaluation/QA, human-in-the-loop safeguards, and efficient access patterns for low-latency apps and agents.
Be ready to go over:
- RAG and semantic search: Indexing strategies, chunking/metadata, freshness signals
- Structured extraction pipelines: Templates, evaluation harnesses, drift detection
- Human-in-the-loop: Review queues, feedback loops, measurable quality criteria
- Advanced concepts (less common): Agent orchestration over governed data, provenance tracking, prompt/data leakage controls
Example questions or scenarios:
- "Design a data pipeline that enables semantic search over drawings and RFIs with traceability back to source documents."
- "Implement a classification workflow with structured outputs and human-in-the-loop QA—how do you measure quality?"
- "How would you expose a low-latency query endpoint for an internal agent while maintaining governance and lineage?"
This visualization highlights the interview’s high-signal themes—expect emphasis on AWS, Spark/PySpark, Iceberg/Delta/Hudi, data contracts, lineage, observability, and AI-readiness. Use it to prioritize your study plan and to structure “walk-through” stories that connect architecture choices to business outcomes.
Key Responsibilities
You will design, build, and operate end-to-end data integrations and platforms that serve analytics, internal applications, and AI/LLM-enabled workflows. Day to day, you will partner with construction professionals, analysts, and platform teams to translate messy, project-based data into governed, reusable assets and performant access patterns.
- Own integrations, end-to-end: source ingestion, normalization, storage, and access (BI, APIs, internal apps, agents), with authentication, incremental loads, retries/idempotency, and runbooks.
- Model and curate data in a centralized repository/lakehouse, enforcing metadata standards, lineage, and schema evolution policies.
- Enable AI use cases by building repeatable pipelines for semantic search, summarization, classification, structured extraction, and human-in-the-loop QA.
- Instrument reliability with monitoring, alerting, data validation, and failure recovery; drive improvements through post-incident learning.
- Build internal tools and services (web apps, APIs, utilities) to make data discoverable and easy to operationalize across teams.
- Contribute to standards for integration patterns, modeling, and governance-ready data access, ensuring clean interfaces, traceability, and auditability.
Role Requirements & Qualifications
This is a hands-on engineering role focused on outcomes and maintainability. You’re expected to balance deep technical delivery with clear documentation, pragmatic decisions, and cross-functional collaboration.
Must-have technical skills
- AWS data stack: S3, Glue/EMR (Spark), Step Functions or Airflow; CI/CD basics for data
- Open table formats: Apache Iceberg (or Delta/Hudi) with understanding of metadata, snapshots, partitioning, and schema evolution
- Data integration: API/file/event-driven patterns; incremental loads; idempotency; backfills; data contracts
- SQL/Python/PySpark for production data pipelines and performance tuning
- Observability and reliability: monitoring/alerting, validation, lineage, runbooks
Strong plus / differentiators
- AI/LLM data readiness: RAG, semantic search, structured extraction, evaluation harnesses
- Enterprise governance platforms: metadata ingestion, profiling, RBAC, lineage enablement
- Construction/engineering/manufacturing domain exposure and tool familiarity (ACC, Procore, CMiC, P6)
- Internal tools/APIs for data discovery and operationalization
Experience level
- Roles range from mid-level Data & Integration Engineer to Staff Data Engineer (lead IC). Staff roles emphasize architecture guidance, code reviews, mentoring, and enterprise-scale delivery.
Soft skills that stand out
- Operational ownership, crisp communication with non-technical partners, and a bias for iterative, testable delivery
- Ability to translate business needs into maintainable data products with clear contracts and SLAs
This snapshot provides indicative compensation ranges observed across AECOM postings and locations. Use it to calibrate expectations by role seniority and geography; confirm specifics with your recruiter, as packages may include benefits and hybrid/on-site considerations.
Common Interview Questions
Expect a blend of technical depth and domain-grounded scenarios. Use the prompts below to rehearse structured, outcome-oriented answers that highlight design choices, tradeoffs, and operational excellence.
Technical / Domain (Integrations, Lakehouse, Governance)
These questions validate your core engineering fluency and production instincts.
- How would you design an incremental ingestion from Procore and CMiC into an S3 + Iceberg lakehouse while ensuring idempotency and lineage?
- Walk through your approach to partitioning, compaction, and schema evolution for a high-churn project dataset.
- Explain how you’d implement data contracts across multiple teams and enforce compatibility over time.
- Describe your observability strategy for Spark-based pipelines in AWS (metrics, logs, traces, data validation).
- How do you control costs in Glue/EMR while meeting latency and throughput requirements?
System Design / Architecture
You’ll outline end-to-end solutions with clear tradeoffs and operational considerations.
- Design a centralized repository that serves BI, internal APIs, and AI agents with governed access patterns.
- When would you choose event-driven ingestion over scheduled batch for construction systems, and why?
- Propose a backfill and replay design for a pipeline with partial historical gaps and evolving schemas.
- Architect lineage capture across ingestion, transformation, and consumption, and describe validation methods.
- How would you expose a low-latency query endpoint for an internal agent hitting curated views?
Data Governance & Security
Interviewers probe how you operationalize accountability and compliance.
- How do you configure a data catalog (assets, relationships, RBAC) and integrate with data warehouses and BI tools?
- Describe automated profiling/DQ checks you’d implement for a newly onboarded source and how alerts route to owners.
- Explain technical lineage enablement and troubleshooting when transformations are mixed across Spark SQL and Python.
- How do you implement sensitive data classification and masking policies across curated layers?
- What KPIs signal that governance is improving data trust and adoption?
Coding / SQL / PySpark
Expect hands-on tasks that verify correctness and performance.
- Write SQL to produce a slowly changing dimension (SCD Type 2) from incremental change logs.
- Implement an idempotent upsert in PySpark against an Iceberg table with partition evolution.
- Optimize a skewed join in PySpark with practical techniques and explain tradeoffs.
- Parse semi-structured files (JSON/CSV variants) with schema drift and validate against a contract.
- Implement a backfill job with checkpointing and safe retries.
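The SCD Type 2 prompt above has a compact reference shape. A pure-Python sketch of the merge logic (in production this would be a SQL `MERGE` or an Iceberg/Delta upsert; the keys and attributes are hypothetical):

```python
def scd2_apply(dim, changes):
    """Apply ordered change records to a Type 2 dimension: close the open
    version (set valid_to) and append a new one, preserving full history.
    Duplicate changes are skipped, so replays are idempotent."""
    for ch in sorted(changes, key=lambda c: c["ts"]):
        current = next((r for r in dim
                        if r["key"] == ch["key"] and r["valid_to"] is None), None)
        if current and current["attrs"] == ch["attrs"]:
            continue                        # replayed change: no-op
        if current:
            current["valid_to"] = ch["ts"]  # close the current version
        dim.append({"key": ch["key"], "attrs": ch["attrs"],
                    "valid_from": ch["ts"], "valid_to": None})
    return dim

# Hypothetical change log for one project, including a duplicate replay.
dim = []
changes = [
    {"key": "P-100", "ts": 1, "attrs": {"status": "planned"}},
    {"key": "P-100", "ts": 5, "attrs": {"status": "active"}},
    {"key": "P-100", "ts": 5, "attrs": {"status": "active"}},  # duplicate
]
scd2_apply(dim, changes)
```

In an interview, the same three branches map directly onto the `WHEN MATCHED` / `WHEN NOT MATCHED` arms of a SQL `MERGE`, plus a dedup step upstream.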
Problem-Solving / Case Studies
Scenario-based prompts align with real operational constraints.
- A source API starts returning partial data and 429s—walk through diagnosis, retries, and compensating actions.
- A curated table’s freshness SLA is missed due to upstream changes—how do you triage, communicate, and prevent recurrence?
- Your costs doubled overnight—what telemetry do you check first, and how do you remediate?
- An AI summarization pipeline drifts in quality—how do you detect, evaluate, and correct it?
- A stakeholder requests an urgent data product bypassing standards—how do you negotiate scope while protecting maintainability?
Use this interactive module on Dataford to practice by topic, difficulty, and format. Simulate timed responses and compare model answers to tighten structure, depth, and clarity before your live interviews.
Frequently Asked Questions
Q: How difficult are the interviews, and how much time should I prepare?
Plan for moderate-to-high rigor with a practical focus. Most candidates benefit from 2–3 weeks of dedicated prep emphasizing AWS lakehouse patterns, integration reliability, governance/lineage, and PySpark/SQL fluency.
Q: What differentiates successful candidates at AECOM?
They present concrete production stories with metrics, articulate tradeoffs, and show strong operational instincts—retries, idempotency, lineage, observability, and cost control—while collaborating smoothly with non-technical partners.
Q: What is the culture like for data teams?
Professional, mission-driven, and outcome-oriented. Teams value hands-on builders who can translate real operational needs (often messy and time-bound) into reliable, governed data products that scale.
Q: What timeline should I expect after interviews?
Timelines vary by role and location. Stay proactive: send concise summaries after each round, clarify open questions (on-site/hybrid expectations, tool stacks), and be responsive to any take-home or follow-up requests.
Q: Is remote work available?
Some roles are hybrid (e.g., Dallas/Houston), while others are on-site (e.g., Phoenix Data & Integration Engineer). Confirm your role’s location model during recruiter alignment.
Other General Tips
- Lead with outcomes: Tie architecture choices to measurable impacts (freshness SLAs, cost reductions, query latency, incident MTTR).
- Speak in contracts: Use the language of data contracts, compatibility, and evolution policies—it signals production maturity.
- Demonstrate operability: Show your monitoring/alerting, runbooks, and incident retrospectives. Reliability thinking is a key differentiator.
- Show domain empathy: Reference construction/engineering realities—schedules, budgets, change orders, field constraints—and how your designs respect them.
- Design for AI-readiness: Discuss metadata, provenance, and structured outputs for RAG/summarization; mention evaluation harnesses and human-in-the-loop controls.
- Bring artifacts: If permitted, reference sanitized diagrams, sample runbooks, or pseudo-PRs to make your approach tangible and memorable.
Summary & Next Steps
The Data Engineer role at AECOM is a high-impact opportunity to build production-grade data systems that power construction operations, enterprise analytics, and next-generation AI workflows. You’ll own integrations end-to-end, enforce governance and lineage, and enable performant access patterns for apps, BI, and agents—work that directly advances AECOM’s mission to deliver a better world.
Focus your preparation on five pillars: integration design and data contracts, AWS lakehouse internals (S3 + Iceberg/Delta/Hudi), governance/lineage and DQ automation, observability and operational excellence, and AI-readiness for real use cases. Anchor your answers in real production stories with clear metrics and tradeoffs, and be explicit about failure handling, cost control, and maintainability.
Use Dataford’s modules to practice targeted questions and refine your system design narratives. Enter your interviews with confidence: you’ve built systems that last, and you can show how. Translate that experience into clear, structured answers—and demonstrate the leadership, judgment, and ownership AECOM expects from its engineers.
