Build Trustworthy Live Dashboard Pipeline

Hard
Pipelines
Asked at 8 companies
Idempotency, Backfilling, Quality
Also asked at: Deutsche Börse Group, DPR Construction, Purdue University, Airbus Group, Current (NY)

Problem

Scenario

You are responsible for the pipeline feeding executive and client-facing Power BI dashboards for a subscription product. The dashboards refresh every few minutes, but source data changes frequently because orders, refunds, CRM updates, and product usage events arrive out of order and are occasionally corrected after initial ingestion. Stakeholders have escalated repeated mismatches between dashboard totals, finance extracts, and operational reports, and an audit found no consistent way to explain which data version a visual was built from. You need to redesign the pipeline so the visual layer remains trustworthy even while upstream data is volatile.
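To make the failure mode in the scenario concrete, here is a minimal sketch of how out-of-order arrivals and later corrections can produce mismatched totals. It is an illustration of the problem, not a proposed design. The `Mutation` record, its `version` field (assumed to be a monotonically increasing version supplied by the source system), and the sample order IDs and amounts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    order_id: str
    version: int    # hypothetical: increasing version assigned by the source system
    amount: float


def apply_in_arrival_order(mutations):
    """Last write wins by arrival time; a late, older version overwrites a newer one."""
    state = {}
    for m in mutations:
        state[m.order_id] = m
    return state


def apply_latest_version(mutations):
    """Keep the highest source version per key; insensitive to arrival order and replays."""
    state = {}
    for m in mutations:
        current = state.get(m.order_id)
        if current is None or m.version > current.version:
            state[m.order_id] = m
    return state


def total(state):
    return sum(m.amount for m in state.values())


# Hypothetical arrival sequence: version 3 (a correction) arrives before version 2.
arrivals = [
    Mutation("o-1", 1, 100.0),
    Mutation("o-2", 1, 50.0),
    Mutation("o-1", 3, 80.0),    # correction issued by the source
    Mutation("o-1", 2, 120.0),   # older version, delivered late
]

naive_total = total(apply_in_arrival_order(arrivals))
versioned_total = total(apply_latest_version(arrivals))
replayed_total = total(apply_latest_version(arrivals + arrivals))      # full replay
reordered_total = total(apply_latest_version(list(reversed(arrivals))))

print(naive_total, versioned_total, replayed_total, reordered_total)
```

In this sketch the arrival-order total and the version-aware total disagree for the same input, which is the kind of mismatch the scenario describes between dashboards and finance extracts. The version-aware result is unchanged under replay and reordering.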

Current State

Component            Status / Technology
Operational sources  Azure SQL Database, Dynamics 365, application event APIs
Ingestion            Azure Data Factory copy jobs every 15 min + Event Hubs for usage events
Processing           Mixed PySpark notebooks in Azure Databricks, limited replay support
Storage              Azure Data Lake Storage Gen2 + Azure Synapse Analytics serving tables
Semantic layer       Power BI datasets with scheduled refresh
Orchestration        Azure Data Factory triggers and ad hoc notebook runs

Scale: ~180M usage events/day, 12M CRM/order mutations/day, peak 25K events/sec, 15-minute freshness target for operational visuals, daily finance reconciliation, 2 years of retained history.
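The scale figures can be turned into rough per-second and per-window numbers, which helps when sizing the design. The sketch below is back-of-the-envelope arithmetic only. It assumes the 25K events/sec peak refers to usage events, that a year is 365 days, and that the peak could be sustained for a full 15-minute window (an upper bound, not a stated fact).

```python
SECONDS_PER_DAY = 24 * 60 * 60

usage_events_per_day = 180_000_000
mutations_per_day = 12_000_000
peak_events_per_sec = 25_000       # assumed to refer to usage events
freshness_window_sec = 15 * 60
retention_days = 2 * 365           # assumes 365-day years

# Average rates implied by the daily volumes.
avg_usage_per_sec = usage_events_per_day / SECONDS_PER_DAY
avg_mutations_per_sec = mutations_per_day / SECONDS_PER_DAY

# How much burstier the peak is than the daily average.
peak_to_avg_ratio = peak_events_per_sec / avg_usage_per_sec

# Usage events landing in one freshness window: average case and an
# upper bound that assumes the peak rate holds for the entire window.
avg_events_per_window = avg_usage_per_sec * freshness_window_sec
max_events_per_window = peak_events_per_sec * freshness_window_sec

# Usage events held over the retention period at the stated daily volume.
retained_usage_events = usage_events_per_day * retention_days

print(f"avg usage events/sec:     {avg_usage_per_sec:,.0f}")
print(f"avg mutations/sec:        {avg_mutations_per_sec:,.0f}")
print(f"peak / average:           {peak_to_avg_ratio:.1f}x")
print(f"events per 15 min (avg):  {avg_events_per_window:,.0f}")
print(f"events per 15 min (peak): {max_events_per_window:,.0f}")
print(f"retained usage events:    {retained_usage_events:,}")
```

Under these assumptions the peak is about 12 times the daily average, so a design sized for the average rate would fall well short during bursts.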

Question

How would you redesign this pipeline so downstream visuals in Power BI remain explainable, consistent, and auditable as source records are inserted, updated, deleted, and replayed throughout the day? Describe the end-to-end pipeline approach you would use to preserve trust while still meeting freshness requirements.
