Scale Regional Reporting Pipelines

Context

Meta is launching a new reporting process for cross-region business performance used by Finance, Operations, and regional strategy teams. Today, APAC, EMEA, and NAMER each run separate batch jobs and spreadsheet-based adjustments, creating inconsistent metric definitions, slow backfills, and frequent reconciliation issues.

You need to design a unified pipeline using Meta-style internal data platforms that can ingest regional source data, standardize business logic, and publish trusted reporting tables that scale across teams without duplicating logic.

Scale Requirements

Regions: 12 global regions, 150+ country/business-unit combinations
Source systems: 40+ upstream datasets (Ads delivery, billing, CRM, workforce, regional finance adjustments)
Volume: ~8 TB/day raw input, 25B rows/day appended across all sources
Freshness: hourly for operational reporting; daily certified close tables by 6 AM local time in each region
Concurrency: 500+ internal dashboard users, 80+ downstream scheduled reports
Retention: 3 years detailed data, 7 years monthly rollups for audit support
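
The scale figures above imply some rough per-row and retention numbers that are useful when sizing partitions and backfills. The sketch below is only back-of-envelope arithmetic. It assumes decimal units (1 TB = 10**12 bytes), 365-day years, uniform arrival across hours, and no compression or replication, none of which the problem statement specifies. The function and constant names are illustrative.

```python
# Back-of-envelope sizing from the stated scale figures.
# Assumptions (not in the source): decimal units (1 TB = 10**12 bytes),
# 365-day years, uniform arrival across hours, no compression or replication.

RAW_TB_PER_DAY = 8                 # ~8 TB/day raw input
ROWS_PER_DAY = 25_000_000_000      # 25B rows/day appended
DETAIL_RETENTION_YEARS = 3         # detailed data retention


def sizing_estimates() -> dict:
    """Derive rough per-row, per-hour, and retention figures."""
    avg_bytes_per_row = RAW_TB_PER_DAY * 10**12 / ROWS_PER_DAY
    rows_per_hour = ROWS_PER_DAY / 24
    detail_raw_tb = RAW_TB_PER_DAY * 365 * DETAIL_RETENTION_YEARS
    return {
        "avg_bytes_per_row": avg_bytes_per_row,
        "rows_per_hour": rows_per_hour,
        "detail_raw_tb": detail_raw_tb,
    }


if __name__ == "__main__":
    for name, value in sizing_estimates().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the averages work out to about 320 bytes per row, roughly 1.04 billion rows per hour, and about 8,760 TB (8.76 PB) of raw detailed data over three years before any compression.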

Requirements

Build a reusable ingestion and transformation framework so new regions can onboard with configuration, not custom code.
Standardize metric definitions, calendar alignment, currency conversion, and regional hierarchy mapping.
Support both hourly incremental processing and large historical backfills without double counting.
Publish curated reporting tables for internal Meta dashboards and analyst self-serve queries.
Implement strong data quality controls for schema drift, late-arriving files, duplicate loads, and reconciliation to source totals.
Design orchestration with dependency management across regional cutoffs and shared reference data.
Define monitoring, alerting, and recovery procedures for failed regional runs.
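
One possible way to meet the onboarding, backfill, and reconciliation requirements together is to drive each region from a small config record and to overwrite whole partitions rather than append to them, so that rerunning an hour or a backfill cannot double count. The sketch below models this in plain Python with an in-memory dict standing in for a partitioned table. `RegionConfig`, `load_partition`, `reconcile`, the `event_id` key, and the FX rate are all hypothetical names and values chosen for illustration, not part of any Meta platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionConfig:
    """Per-region settings supplied as configuration (hypothetical fields)."""
    region: str
    currency: str


def dedupe(rows, key="event_id"):
    """Keep the last row seen per key so duplicate loads collapse to one row."""
    latest = {}
    for row in rows:
        latest[row[key]] = row
    return list(latest.values())


def load_partition(table, cfg, ds, hour, rows, fx_to_usd):
    """Overwrite (never append to) one partition, making reruns idempotent."""
    rate = fx_to_usd[cfg.currency]
    table[(cfg.region, ds, hour)] = [
        {**row, "region": cfg.region, "amount_usd": round(row["amount"] * rate, 2)}
        for row in dedupe(rows)
    ]


def reconcile(table, partition, source_total_usd, tolerance=0.01):
    """Check the loaded total against the total reported by the source."""
    loaded = sum(row["amount_usd"] for row in table[partition])
    return abs(loaded - source_total_usd) <= tolerance


if __name__ == "__main__":
    table = {}
    emea = RegionConfig(region="EMEA", currency="GBP")
    fx = {"GBP": 1.25}  # illustrative rate only
    rows = [
        {"event_id": "a", "amount": 100.0},
        {"event_id": "b", "amount": 50.0},
        {"event_id": "a", "amount": 100.0},  # duplicate delivery
    ]
    # Loading the same hour twice simulates a rerun or backfill.
    load_partition(table, emea, "2024-01-01", 0, rows, fx)
    load_partition(table, emea, "2024-01-01", 0, rows, fx)
    print(reconcile(table, ("EMEA", "2024-01-01", 0), 187.5))
```

In a Hive/Presto-compatible lake the same idea would typically be expressed as a partition-level overwrite keyed by region, date, and hour. The point of the design is that a partition's contents depend only on its inputs, not on how many times the job ran.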

Constraints

Existing storage is in a Hive/Presto-compatible Meta data lake with scheduled workflows already managed centrally.
Regional teams can supply mapping files and business rules, but central data engineering owns pipeline code.
Financial reporting outputs must be auditable, reproducible, and support row-level lineage to source loads.
Budget favors shared batch infrastructure over standing real-time systems; avoid region-specific bespoke pipelines.
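
The auditability constraint can be illustrated with a deterministic load identifier: if every output row carries an ID derived from the source system, file path, and file contents, a reported figure can be traced back to the exact load that produced it, and reprocessing the same file yields the same ID. This is a minimal sketch of that idea, assuming a content hash is acceptable as the identifier. `make_load_id`, `tag_rows`, and the underscore-prefixed column names are hypothetical.

```python
import hashlib


def make_load_id(source_system: str, file_path: str, content: bytes) -> str:
    """Derive a stable ID from what was loaded, not when it was loaded."""
    digest = hashlib.sha256()
    for part in (source_system.encode(), file_path.encode(), content):
        digest.update(part)
        digest.update(b"\x00")  # separator so field boundaries stay unambiguous
    return digest.hexdigest()[:16]


def tag_rows(rows, load_id: str, code_version: str):
    """Attach lineage columns to every row without mutating the input."""
    return [
        {**row, "_load_id": load_id, "_code_version": code_version, "_source_row": index}
        for index, row in enumerate(rows)
    ]


if __name__ == "__main__":
    content = b"event_id,amount\na,100\nb,50\n"
    load_id = make_load_id("billing", "emea/2024-01-01/part-0.csv", content)
    tagged = tag_rows([{"event_id": "a"}, {"event_id": "b"}], load_id, "v1")
    for row in tagged:
        print(row)
```

Recording the code version alongside the load ID is one way to support reproducibility: the same inputs and the same pipeline version should regenerate the same output rows.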

