Context
PixelForge, a collaborative design platform, stores uploaded design files in Amazon S3 and runs downstream processing to generate thumbnails, extract metadata, and build search indexes. Today, these jobs are triggered manually or by ad hoc cron scripts on EC2, causing missed runs, duplicate processing, and poor visibility into failures.
You need to design a recurring batch pipeline that schedules and orchestrates processing jobs for newly uploaded and updated design files while supporting retries, backfills, and operational monitoring.
Scale Requirements
- Input volume: 8M design files total, 250K new or updated files/day (back-of-envelope throughput math follows this list)
- File size: 5 MB average, 200 MB max
- Job frequency: Every 15 minutes for incremental processing; nightly full reconciliation
- Latency target: New files processed and queryable within 30 minutes of upload
- Storage: 40 TB raw files in S3, 2 TB metadata in warehouse
- Availability: 99.9% of scheduled runs complete successfully each month
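As a quick sanity check, the figures above imply roughly the following per-run and per-day volumes. This is a back-of-envelope sketch using only the numbers stated; the derived values are estimates, not additional targets.

```python
# Back-of-envelope estimates derived from the stated scale numbers.
TOTAL_FILES = 8_000_000          # total design files
CHANGED_PER_DAY = 250_000        # new or updated files per day
AVG_FILE_MB = 5                  # average file size in MB
RUNS_PER_DAY = 24 * 60 // 15     # one incremental run every 15 minutes -> 96 runs/day

files_per_run = CHANGED_PER_DAY / RUNS_PER_DAY             # ~2,600 files per 15-minute window
changed_gb_per_day = CHANGED_PER_DAY * AVG_FILE_MB / 1024  # ~1.2 TB of changed bytes per day
raw_storage_tb = TOTAL_FILES * AVG_FILE_MB / 1024 ** 2     # ~38 TB, consistent with the 40 TB figure

print(f"files per incremental run: {files_per_run:,.0f}")
print(f"changed data per day:      {changed_gb_per_day:,.0f} GB")
print(f"raw storage estimate:      {raw_storage_tb:,.1f} TB")
```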
Requirements
- Design a scheduler for recurring incremental jobs that discovers new or changed files since the last successful run (see the watermark and idempotency sketch after this list).
- Orchestrate dependent steps: file discovery, metadata extraction, thumbnail rendering, quality validation, and warehouse load.
- Ensure idempotent re-runs so retries or backfills do not create duplicate metadata or duplicate thumbnails.
- Support backfilling a date range when a downstream system has been unavailable for several hours.
- Track job state, run history, and per-step success/failure for operators.
- Load analytics-ready metadata into Snowflake for downstream reporting.
- Include monitoring, alerting, and failure recovery for delayed or failed schedules (a CloudWatch alarm sketch follows the Constraints list).
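One way to meet the discovery, idempotency, backfill, and warehouse-load requirements together is a watermark-driven incremental run whose outputs are keyed deterministically by file and version, so a retry or backfill overwrites prior results instead of duplicating them. The sketch below is illustrative only: the bucket names, the DESIGN_FILE_METADATA table, and the step helpers referenced in comments are assumptions, and the choice of orchestrator (Step Functions, MWAA, or a plain ECS scheduled task) is deliberately left open.

```python
"""Illustrative sketch: watermark-based discovery with idempotent outputs.

Assumed names (not from the brief): the S3 buckets, the designs/ prefix, and the
Snowflake table DESIGN_FILE_METADATA. Persisting the watermark (e.g. in DynamoDB
or a small S3 object) and the actual step implementations are out of scope here.
"""
from datetime import datetime, timezone
import hashlib

import boto3

s3 = boto3.client("s3")

UPLOAD_BUCKET = "pixelforge-uploads"   # assumed source bucket
DERIVED_BUCKET = "pixelforge-derived"  # assumed bucket for thumbnails and other artifacts


def discover_changed_files(watermark: datetime, until: datetime):
    """Yield (key, etag, last_modified) for objects changed in (watermark, until].

    S3 listing is not indexed by time, so this filters a full prefix listing;
    at ~8M objects an S3 Inventory manifest may be a better source, but the
    windowing logic stays the same.
    """
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=UPLOAD_BUCKET, Prefix="designs/"):
        for obj in page.get("Contents", []):
            if watermark < obj["LastModified"] <= until:
                yield obj["Key"], obj["ETag"].strip('"'), obj["LastModified"]


def thumbnail_key(source_key: str, etag: str) -> str:
    """Deterministic output key: re-processing the same (file, version) overwrites
    the same thumbnail object rather than creating a duplicate."""
    digest = hashlib.sha256(f"{source_key}:{etag}".encode()).hexdigest()[:16]
    return f"thumbnails/{digest}.png"


# Warehouse load keyed on (file_key, etag): a re-run updates the existing row for
# that file version instead of inserting it again.
MERGE_METADATA_SQL = """
MERGE INTO DESIGN_FILE_METADATA t
USING (SELECT %(file_key)s AS file_key, %(etag)s AS etag,
              %(size_bytes)s AS size_bytes, %(processed_at)s AS processed_at) s
ON t.file_key = s.file_key AND t.etag = s.etag
WHEN MATCHED THEN UPDATE SET size_bytes = s.size_bytes, processed_at = s.processed_at
WHEN NOT MATCHED THEN INSERT (file_key, etag, size_bytes, processed_at)
    VALUES (s.file_key, s.etag, s.size_bytes, s.processed_at)
"""


def run_incremental(watermark: datetime) -> datetime:
    """One incremental run; backfill is the same function replayed over older
    (watermark, until) windows."""
    until = datetime.now(timezone.utc)
    for key, etag, _modified in discover_changed_files(watermark, until):
        # Each step must be safe to repeat for the same (key, etag), e.g.:
        #   metadata = extract_metadata(key)                          # hypothetical helper
        #   render_thumbnail(key, DERIVED_BUCKET, thumbnail_key(key, etag))
        #   validate(metadata)
        #   cursor.execute(MERGE_METADATA_SQL, {...})                 # via Snowflake connector
        pass
    # Advance the watermark only after the whole window succeeded, so a failed
    # run is retried over the same window on the next schedule.
    return until
```

Advancing the watermark only after a fully successful run means a failed window is simply retried on the next schedule, and backfill reduces to replaying older windows with explicit bounds.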
Constraints
- Existing stack is AWS-first: S3, ECS, Snowflake, and CloudWatch are already approved.
- Team size is 3 data engineers; avoid overly complex self-managed infrastructure.
- Budget for new orchestration infrastructure is limited to $15K/month.
- Some files contain customer IP and must remain in AWS with audit logs retained for 1 year.
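For the monitoring and alerting requirement, one low-overhead pattern that stays inside the approved CloudWatch stack is to have each run emit a custom success metric and to alarm when runs fail or stop arriving. The namespace, metric, alarm name, and SNS topic below are placeholders, not names from the brief.

```python
"""Minimal CloudWatch monitoring sketch: each run emits a success/failure metric,
and an alarm fires if no successful run is seen within two schedule intervals.
Namespace, metric, alarm name, and SNS topic ARN are assumed placeholders."""
import boto3

cloudwatch = boto3.client("cloudwatch")


def report_run(success: bool) -> None:
    """Emit 1 for a successful incremental run, 0 for a failed one."""
    cloudwatch.put_metric_data(
        Namespace="PixelForge/Pipeline",
        MetricData=[{
            "MetricName": "IncrementalRunSuccess",
            "Value": 1.0 if success else 0.0,
            "Unit": "Count",
        }],
    )


def ensure_missed_run_alarm(sns_topic_arn: str) -> None:
    """Alarm when the success metric is absent or below 1 over two 15-minute
    periods, catching both failed runs and schedules that never fired."""
    cloudwatch.put_metric_alarm(
        AlarmName="pixelforge-incremental-run-missed",
        Namespace="PixelForge/Pipeline",
        MetricName="IncrementalRunSuccess",
        Statistic="Sum",
        Period=900,                      # one 15-minute schedule interval
        EvaluationPeriods=2,
        Threshold=1.0,
        ComparisonOperator="LessThanThreshold",
        TreatMissingData="breaching",    # no data at all also triggers the alarm
        AlarmActions=[sns_topic_arn],
    )
```

Treating missing data as breaching is what catches a scheduler that silently stops firing, which is a different failure mode from a run that starts and then fails.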