Product Context
StyleSnap is a shopping app where users search using text, images, or both (for example, uploading a photo of shoes and adding “similar but cheaper”). Design the end-to-end multimodal retrieval and ranking system that returns relevant products from a large catalog.
Scale
| Signal | Value |
|---|---|
| DAU | 35M |
| Peak search QPS | 180K |
| Product catalog | 120M active SKUs |
| New/updated items per day | 4M |
| Queries with image input | 22% |
| Per-request latency budget (p99) | 250ms end-to-end |
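A quick back-of-envelope pass over these numbers is useful when sizing the image-encoding tier and the index-refresh pipeline (variable names are mine, not part of the prompt):

```python
# Back-of-envelope sizing from the Scale table above.
peak_qps = 180_000
image_query_frac = 0.22

# Queries per second that need an image embedding at peak.
image_query_qps = peak_qps * image_query_frac        # 39,600 qps

# Average catalog update rate needed to stay within the
# 15-minute freshness constraint (bursts will be higher).
daily_updates = 4_000_000
avg_update_rate = daily_updates / 86_400             # ~46 items/s

print(image_query_qps, round(avg_update_rate, 1))
```

Even the average update rate implies a streaming (not nightly batch) path into the index, and ~40K image-bearing queries per second at peak rules out running a large encoder synchronously per request.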
Task
- Clarify the product requirements and success metrics for multimodal search.
- Propose a multi-stage architecture for retrieval, ranking, and re-ranking at this scale.
- Choose model families for each stage and explain how text, image, and metadata signals are combined.
- Design the offline and online data pipelines, including feature storage, training cadence, and index refresh.
- Define offline evaluation, online experimentation, and monitoring for drift, skew, and quality regressions.
- Identify key failure modes, fallback behavior, and cost/latency tradeoffs.
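One way to sketch the first retrieval stage the tasks above ask for is late fusion of text and image query embeddings followed by nearest-neighbor search. This is a minimal illustration with brute-force cosine similarity; all function names and the fusion weight `alpha` are assumptions, and a production system would use an ANN index rather than a full scan:

```python
import numpy as np

def fuse_query_embedding(text_emb, image_emb, alpha=0.5):
    """Late-fuse text and image query embeddings (hypothetical scheme).

    One code path covers text-only, image-only, and text+image queries:
    missing modalities simply contribute nothing to the sum.
    """
    parts = []
    if text_emb is not None:
        parts.append(alpha * np.asarray(text_emb, dtype=float))
    if image_emb is not None:
        parts.append((1.0 - alpha) * np.asarray(image_emb, dtype=float))
    if not parts:
        raise ValueError("query must include text and/or image")
    fused = np.sum(parts, axis=0)
    return fused / np.linalg.norm(fused)

def retrieve_top_k(query_emb, catalog_embs, k=3):
    """Brute-force cosine retrieval; stands in for an ANN index lookup."""
    norms = np.linalg.norm(catalog_embs, axis=1, keepdims=True)
    scores = (catalog_embs / norms) @ query_emb
    top = np.argsort(-scores)[:k]
    return list(top), scores[top]
```

The retrieved candidates would then flow into the heavier ranking and policy-aware re-ranking stages, which can afford per-item features the retrieval stage cannot.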
Constraints
- The system must support text-only, image-only, and text+image queries in one API.
- Image encoder models larger than 1 GB cannot be run synchronously per request, for cost reasons.
- Newly added products should become searchable within 15 minutes.
- The marketplace has strict policy filters: blocked brands, unsafe content, and region-specific compliance rules must be enforced before final ranking.
- Mobile clients are sensitive to tail latency; p99 above 250ms causes measurable search abandonment.
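The single-API constraint above might be captured by a request schema along these lines. This is a sketch only; the field names (`image_ref`, `region`) are assumptions, not a real StyleSnap API, and the image is passed as a handle to a pre-uploaded asset rather than raw bytes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SearchRequest:
    text: Optional[str] = None       # e.g. "similar but cheaper"
    image_ref: Optional[str] = None  # handle to a pre-uploaded image
    region: str = "US"               # drives region-specific compliance filters

    def __post_init__(self):
        # Enforce the one-API contract: at least one modality is required.
        if not self.text and not self.image_ref:
            raise ValueError("request must include text, an image, or both")

def query_mode(req: SearchRequest) -> str:
    """Classify the request for routing/metrics purposes."""
    if req.text and req.image_ref:
        return "text+image"
    return "text" if req.text else "image"
```

Validating the modality mix at the API boundary keeps the downstream retrieval stages free of per-modality special cases, and carrying `region` on every request lets policy filters run before final ranking as required.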