Design a Real-Time Video Analytics Platform

Product Context

VisionGrid provides video analytics for retail chains, warehouses, and campuses. Customers upload or stream camera feeds and expect near-real-time detection, search, and alerting for events such as person entry, vehicle counting, safety violations, and suspicious activity.

Scale

Signal	Value
Customers	18,000 businesses
Daily active operators	220,000
Active cameras	2.4M
Peak concurrent video streams	850,000
Ingest rate at peak	~3.2M frames/sec after adaptive sampling
Events stored per day	9B detections / tracks
Searchable video archive	14 PB hot + warm storage
Alert query QPS	45,000
Investigative search QPS	6,000
End-to-end alert latency budget	p99 < 2 seconds
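
A few per-unit rates follow directly from the table and are useful when sizing each stage. The sketch below only divides the stated figures; the function and key names are illustrative and not part of the problem statement, and the per-second event rate is a daily average rather than a peak.

```python
# Figures copied from the Scale table.
ACTIVE_CAMERAS = 2_400_000
PEAK_STREAMS = 850_000
PEAK_FRAMES_PER_SEC = 3_200_000      # after adaptive sampling
EVENTS_PER_DAY = 9_000_000_000       # detections / tracks
SECONDS_PER_DAY = 86_400


def derived_scale() -> dict[str, float]:
    """Return per-unit rates implied by the Scale table (hypothetical helper)."""
    return {
        # Average sampled frame rate per concurrent stream at peak.
        "frames_per_stream_per_sec": PEAK_FRAMES_PER_SEC / PEAK_STREAMS,
        # Average event write rate, assuming events are spread evenly over the day.
        "avg_events_per_sec": EVENTS_PER_DAY / SECONDS_PER_DAY,
        # Average number of stored events per active camera per day.
        "events_per_camera_per_day": EVENTS_PER_DAY / ACTIVE_CAMERAS,
        # Share of active cameras streaming concurrently at peak.
        "peak_stream_fraction": PEAK_STREAMS / ACTIVE_CAMERAS,
    }


if __name__ == "__main__":
    for name, value in derived_scale().items():
        print(f"{name}: {value:,.3f}")
```

Under these figures each concurrent stream contributes roughly 3.8 sampled frames per second, and the event store absorbs on the order of 100,000 writes per second on average.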

Task

Design the end-to-end ML system for this platform. Address the following:

Clarify the product requirements and define the primary ML tasks, outputs, and users.
Estimate system scale and propose a multi-stage architecture for ingest, candidate retrieval, ranking, and alert generation.
Choose models for each stage and explain online vs batch inference decisions.
Design the training, feature, and feedback pipelines, including how labels are created from delayed human review.
Define offline and online evaluation, monitoring, and rollout strategy.
Identify major failure modes, especially around feature drift, training-serving skew, camera heterogeneity, and operational outages.

Constraints

Cameras vary widely in resolution, frame rate, lighting, and placement; many are low quality.
Raw video retention is limited: 7 days hot, 30 days warm; after that, only derived features are kept, for compliance and cost reasons.
Some customers require on-prem or edge inference for privacy; others allow cloud processing.
False negatives on safety alerts are costly, but excessive false positives cause alert fatigue.
Serving cost must stay below $0.015 per camera-hour on average.
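
The cost constraint can be turned into a rough budget to check candidate architectures against. This is a minimal sketch, assuming every active camera is billed around the clock and that a streaming camera is sampled at the peak per-stream rate implied by the Scale table; both are simplifying assumptions, and the function names are hypothetical.

```python
# Figures copied from the Scale table and the Constraints list.
COST_CAP_PER_CAMERA_HOUR = 0.015     # USD, average cap
ACTIVE_CAMERAS = 2_400_000
PEAK_STREAMS = 850_000
PEAK_FRAMES_PER_SEC = 3_200_000      # after adaptive sampling


def fleet_budget_per_day(cameras: int = ACTIVE_CAMERAS,
                         cap: float = COST_CAP_PER_CAMERA_HOUR,
                         hours: float = 24.0) -> float:
    """Upper bound on daily serving spend if every camera runs all day (assumption)."""
    return cameras * cap * hours


def budget_per_frame(cap: float = COST_CAP_PER_CAMERA_HOUR,
                     frames_per_sec: float = PEAK_FRAMES_PER_SEC / PEAK_STREAMS) -> float:
    """Serving budget per sampled frame for one camera streaming for a full hour."""
    frames_per_hour = frames_per_sec * 3600
    return cap / frames_per_hour


if __name__ == "__main__":
    print(f"fleet budget per day: ${fleet_budget_per_day():,.0f}")
    print(f"budget per million frames: ${budget_per_frame() * 1e6:.2f}")
```

With these assumptions the fleet-wide ceiling is about $864,000 per day, and the budget works out to a little over one dollar per million sampled frames, which suggests that heavy models cannot run on every frame and that cheap filtering ahead of them matters.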
