You are redesigning a financial reporting process for a B2B software business as revenue lines, legal entities, and stakeholder groups expand. The current workflow relies on spreadsheet-based consolidations and manually maintained business logic, which has led to recurring mismatches between finance, sales, and executive dashboards during month-end close. Leadership wants a scalable reporting pipeline that supports consistent KPI definitions, faster refreshes, and controlled access for a growing set of consumers. The new design must support both scheduled reporting and ad hoc analysis without creating multiple versions of the truth.
| Component | Status / Technology |
|---|---|
| Source systems | ERP, CRM, billing platform, and planning exports |
| Ingestion | Nightly CSV extracts to object storage |
| Processing | Python scripts on a VM with manual reruns |
| Warehouse | Snowflake with a few unmanaged reporting tables |
| Transformation | Ad hoc SQL and spreadsheet formulas |
| Orchestration | Cron jobs and manual dependency tracking |
| BI / consumption | VMware Tanzu Data Services-backed apps and static finance dashboards |
Scale: ~25 source feeds, 150M fact rows/year, 3 years of retained history, 300+ internal stakeholders, month-end peak refresh window under 2 hours, standard daily freshness target under 30 minutes after source delivery.
How would you design this reporting pipeline so it scales with business growth and stakeholder count while keeping metric definitions consistent, refreshes reliable, and downstream reporting easy to extend? Describe the architecture, data model, orchestration, and controls you would put in place to support both ongoing operations and future growth.