You are replacing a manual reporting workflow for a B2B software business where finance analysts download CSVs from ERP, CRM, billing, and expense systems, then reconcile them in spreadsheets to produce weekly and month-end reporting packs. The current process takes 2-3 business days, produces frequent version mismatches, and recently triggered an executive escalation after revenue and bookings totals differed across reports. You need a scalable pipeline that standardizes definitions, reduces spreadsheet work, and delivers trusted reporting tables and dashboards with auditability.
| Component | Status / Technology |
|---|---|
| Source systems | Oracle NetSuite, Salesforce, Workday, Coupa |
| Ingestion | Manual CSV exports and ad hoc SFTP drops |
| Processing | Analyst-owned Excel workbooks and SQL scripts |
| Storage | VMware Tanzu Greenplum data warehouse |
| Orchestration | Cron jobs and manual refresh steps |
| Reporting | VMware Tanzu Grafana and spreadsheet packs |
Scale: ~25 source extracts/day, 15-40M rows/month across all sources, largest fact table ~400M rows retained for 36 months, weekday refresh target by 7:00 AM local time, month-end reruns/backfills required within 4 hours.
How would you design this reporting pipeline so finance can produce recurring revenue, bookings, expense, and variance reporting with minimal spreadsheet manipulation while keeping the workflow reliable, auditable, and easy to backfill as source data changes?