Context
HireFlow, a SaaS applicant tracking platform used by mid-market employers, stores operational ATS data in PostgreSQL and exposes limited reporting through ad hoc queries. Recruiting leaders now want a centralized analytics model for funnel conversion, recruiter productivity, source attribution, and SLA tracking across applications, interviews, offers, and hires.
You need to design the data model and supporting pipelines that move ATS data from transactional systems and event streams into an analytics warehouse while preserving historical state changes.
Scale Requirements
- Customers: 2,500 companies, 120K recruiters/hiring managers
- Volume: 8M candidates, 35M applications, 400M stage transition events, 1.2B activity log records
- Ingestion: ~2K CDC row changes/sec average, 15K/sec peak during business hours
- Latency: core funnel dashboards must reflect source data within 15 minutes; executive aggregates refresh hourly
- Retention: 7 years for audit history, 30 days for raw replay topics
- Storage: ~25 TB compressed warehouse footprint
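As a sanity check on the figures above, a quick back-of-envelope calculation (using only the stated 2K/sec average and 15K/sec peak) shows the volumes the pipeline must absorb; the 15-minute window ties the peak rate to the dashboard latency target:

```python
# Back-of-envelope sizing derived from the Scale Requirements above.
AVG_ROWS_PER_SEC = 2_000
PEAK_ROWS_PER_SEC = 15_000
SECONDS_PER_DAY = 86_400

# Average daily CDC volume the warehouse loads must keep up with.
avg_rows_per_day = AVG_ROWS_PER_SEC * SECONDS_PER_DAY

# Burst headroom the ingestion layer needs over steady state.
peak_to_avg_ratio = PEAK_ROWS_PER_SEC / AVG_ROWS_PER_SEC

# A 15-minute freshness target means each micro-batch must clear
# this many rows inside its window, even during peak hours.
rows_per_15_min_at_peak = PEAK_ROWS_PER_SEC * 15 * 60

print(f"{avg_rows_per_day:,} CDC rows/day on average")
print(f"{peak_to_avg_ratio:.1f}x peak-to-average burst ratio")
print(f"{rows_per_15_min_at_peak:,} rows per 15-minute window at peak")
```

At ~173M rows/day against a ~25 TB compressed footprint, micro-batch ELT on 5-to-10-minute intervals is plausible without streaming infrastructure, which matters given the "no large custom platform team" constraint.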
Requirements
- Design an analytics-ready ATS data model covering candidates, jobs, applications, application stages, interviews, offers, recruiters, hiring teams, and source channels.
- Preserve slowly changing history for statuses such as application stage, offer state, and recruiter ownership.
- Support both current-state reporting and historical funnel reconstruction by day/week/month.
- Build incremental ELT pipelines from PostgreSQL CDC and application event logs into the warehouse.
- Define primary keys, surrogate keys, late-arriving event handling, deduplication, and idempotent backfills.
- Include data quality checks for referential integrity, duplicate applications, invalid stage transitions, and missing timestamps.
- Expose marts for recruiting funnel analytics, time-to-hire, interviewer utilization, and source effectiveness.
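One way to satisfy the history-preservation requirement above is to fold time-ordered stage-transition events into Type 2 dimension rows with validity intervals. A minimal sketch, assuming events arrive as (application_id, stage, changed_at) tuples; the function and field names are illustrative, not taken from the source:

```python
from datetime import datetime
from typing import NamedTuple, Optional

class StageEvent(NamedTuple):
    application_id: int
    stage: str
    changed_at: datetime

class Scd2Row(NamedTuple):
    application_id: int
    stage: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None marks the open (current) interval
    is_current: bool

def build_scd2(events: list[StageEvent]) -> list[Scd2Row]:
    """Collapse ordered stage events into SCD Type 2 validity intervals.

    Sorting by (application_id, changed_at) makes the rebuild tolerant of
    late-arriving events, provided the whole application partition is
    reprocessed each run (i.e. the load is idempotent per application).
    """
    events = sorted(events, key=lambda e: (e.application_id, e.changed_at))
    rows: list[Scd2Row] = []
    for i, ev in enumerate(events):
        nxt = events[i + 1] if i + 1 < len(events) else None
        same_app = nxt is not None and nxt.application_id == ev.application_id
        rows.append(Scd2Row(
            application_id=ev.application_id,
            stage=ev.stage,
            valid_from=ev.changed_at,
            valid_to=nxt.changed_at if same_app else None,
            is_current=not same_app,
        ))
    return rows
```

In dbt this would typically be a snapshot or an incremental model keyed on a surrogate hash of (application_id, changed_at); the Python version just makes the interval logic explicit. The same interval table supports both current-state reporting (filter is_current) and point-in-time funnel reconstruction (filter on valid_from/valid_to).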
Constraints
- The existing stack is AWS-based, and the company already uses Airflow and dbt.
- Budget allows managed services but not a large custom platform team.
- PII exists in candidate records; analytics consumers should access masked or role-restricted fields.
- Compliance requirements include GDPR deletion workflows and immutable auditability for hiring decisions.
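The PII and GDPR constraints above pair naturally: if analytics consumers only ever see a keyed pseudonym of candidate PII, a deletion request can be honored by destroying that candidate's key (crypto-shredding) without rewriting seven years of warehouse history. A hedged sketch; the per-candidate HMAC scheme and the in-memory key store are assumptions for illustration, not part of the source brief:

```python
import hashlib
import hmac
import secrets

# Stand-in for a managed key store (e.g. a KMS-backed keyed table).
_candidate_keys: dict[int, bytes] = {}

def pseudonymize(candidate_id: int, value: str) -> str:
    """Deterministic pseudonym for a PII field, keyed per candidate.

    Deterministic so warehouse joins and deduplication still work;
    keyed so the mapping becomes unrecoverable once the key is gone.
    """
    key = _candidate_keys.setdefault(candidate_id, secrets.token_bytes(32))
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()

def forget_candidate(candidate_id: int) -> None:
    """GDPR deletion: destroy the key; existing pseudonyms become unlinkable."""
    _candidate_keys.pop(candidate_id, None)
```

This also separates concerns cleanly for the audit requirement: hiring-decision facts stay immutable in the warehouse, while the reversibility of candidate identity lives entirely in the key store.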
Be prepared to explain your schema choices (star schema vs. normalized core), incremental loading strategy, orchestration design, and monitoring plan.
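Tying the monitoring plan back to the data-quality requirements, one concrete check worth walking through is validating stage transitions against an allowed-transition graph. A sketch under assumed stage names; the real ATS taxonomy would come from the product:

```python
# Allowed stage transitions; the stage names are assumptions for illustration.
ALLOWED_TRANSITIONS: dict[str, set[str]] = {
    "applied":   {"screening", "rejected", "withdrawn"},
    "screening": {"interview", "rejected", "withdrawn"},
    "interview": {"offer", "rejected", "withdrawn"},
    "offer":     {"hired", "rejected", "withdrawn"},
}

def invalid_transitions(ordered_stages: list[str]) -> list[tuple[str, str]]:
    """Return every (from_stage, to_stage) pair absent from the allowed graph.

    Run per application over its time-ordered stage history; a non-empty
    result fails the check and should surface as a dbt test failure or an
    Airflow alert rather than silently landing in the funnel marts.
    """
    pairs = zip(ordered_stages, ordered_stages[1:])
    return [(a, b) for a, b in pairs if b not in ALLOWED_TRANSITIONS.get(a, set())]
```

Encoding the check as pure data (the transition dict) keeps it cheap to maintain per customer, since mid-market ATS customers often customize their pipelines.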