Context
You’re interviewing with the Revenue Operations Data Platform team at CloudDesk, a B2B SaaS company selling customer support software to mid-market and enterprise clients. CloudDesk has ~120K paying business customers, ~2.5M monthly active end-users, and ~18K internal sales/support reps (“Repls”) who are organized into Teams (e.g., SMB East, Enterprise Strategic, Renewals). Team membership changes frequently due to org changes, reassignments, and temporary coverage.
The business relies on accurate historical reporting for: (1) commission and quota attainment (finance audit risk), (2) SLA compliance for customer escalations (contractual penalties), and (3) pipeline attribution (revenue forecasting). Today, analytics in Snowflake uses a single “current state” table for team membership. This causes incorrect historical metrics (e.g., if a rep moved teams last week, all of their prior deals now appear under the new team).
CloudDesk’s operational sources are:
- Salesforce (opportunities, accounts, rep assignments)
- Internal Admin Service (team definitions, membership changes)
- Okta (user identity lifecycle; hires/terminations)
- Zendesk (tickets assigned to reps)
Your task is to design a production-grade data model + ETL/ELT pipeline to represent the relationship between Users, Teams, and Repls (a “Repl” is an internal user with a sales/support role) in a way that supports point-in-time and historical reporting, while handling late-arriving events, backfills, and data quality issues.
Scale Requirements
- Membership change events: avg 50K/day, peak 10K/hour during reorganizations
- Repls: ~18K active, ~60K historical
- Teams: ~2K active, ~10K historical (renames, merges, splits)
- Downstream facts:
  - Opportunities: ~80M rows historical
  - Tickets: ~2B events historical
- Freshness: membership changes queryable in Snowflake within 15 minutes (near-real-time acceptable)
- Correctness: point-in-time attribution must be correct for any date in the last 7 years
- Retention: keep all history indefinitely (SOX-like audit expectations)
Data Characteristics & Challenges
Operational event payload (Admin Service)
Membership changes are emitted as JSON events to Kafka:
| Field | Example | Notes |
|---|---|---|
| event_id | "7f9…" | globally unique |
| event_type | "MEMBERSHIP_ADDED" | also REMOVED, TEAM_RENAMED, TEAM_MERGED |
| repl_id | "r_123" | nullable for team-only events |
| team_id | "t_456" | team affected |
| effective_at | "2026-01-10T09:00:00Z" | business-effective time |
| emitted_at | "2026-01-10T09:03:12Z" | ingestion time |
| actor | "admin@…" | who made the change |
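For illustration, a thin staging layer can flatten this payload into typed columns before any modeling. The sketch below assumes the events land in a raw Snowflake table `raw.membership_events` with a single VARIANT `payload` column plus a load timestamp (e.g., via Snowpipe or a Spark Structured Streaming sink); all object and column names here are assumptions, not part of the spec.

```sql
-- Illustrative staging view over raw Kafka events (names are assumptions).
-- Assumes raw.membership_events(payload VARIANT, loaded_at TIMESTAMP_NTZ).
create or replace view staging.stg_membership_events as
select
    payload:event_id::string            as event_id,
    payload:event_type::string          as event_type,    -- MEMBERSHIP_ADDED / REMOVED / TEAM_RENAMED / TEAM_MERGED
    payload:repl_id::string             as repl_id,        -- null for team-only events
    payload:team_id::string             as team_id,
    payload:effective_at::timestamp_tz  as effective_at,   -- business-effective time
    payload:emitted_at::timestamp_tz    as emitted_at,     -- emission/ingestion time
    payload:actor::string               as actor,
    loaded_at
from raw.membership_events;
```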
Known issues
- Late-arriving events: effective_at can be backdated by up to 30 days (HR corrections).
- Out-of-order delivery: Kafka ordering is only guaranteed per partition; some events arrive out of order.
- Duplicates: retries can produce duplicate event_id or semantically duplicate events (see the dedup sketch after this list).
- Team merges/splits: a merge can retroactively re-map membership; you must preserve original history and also support “as-of” truth.
- Identity mapping: Okta user_id ↔ internal repl_id mapping can change (contractors converted to FTE).
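To make the duplicate and out-of-order issues concrete: one common pattern is a deterministic “latest emission wins” dedup keyed on event_id, so replays and reordering converge to the same surviving row on every run. This builds on the illustrative staging view above and is a sketch of one approach, not the required one.

```sql
-- Deterministic dedup: keep the latest emission per event_id so retries and
-- out-of-order Kafka delivery always yield the same output (illustrative names).
create or replace view staging.stg_membership_events_dedup as
select *
from staging.stg_membership_events
qualify row_number() over (
    partition by event_id
    order by emitted_at desc, loaded_at desc
) = 1;
```

Semantic duplicates (different event_id, same repl/team/event_type/effective_at) can be collapsed the same way with a second window over those business fields.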
Requirements
Functional
- Model Users, Repls, Teams such that analysts can answer:
  - “Which team was repl X in on 2025-11-15?”
  - “For each opportunity close date, attribute revenue to the rep’s team at that time.”
  - “How many reps were in Team Y each day last quarter?”
- Support many-to-many membership over time (a repl can belong to multiple teams simultaneously for coverage, with primary/secondary roles).
- Support team hierarchy (team → parent team) with history (reorgs).
- Provide slowly changing dimension semantics with explicit validity windows (one possible table shape is sketched after this list).
- Enable backfills and reprocessing when late events arrive.
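For illustration, one possible shape for the membership history table and the point-in-time lookup it enables is sketched below; the names and columns are assumptions (a real design might, for example, split team hierarchy into its own history table), not a prescribed answer.

```sql
-- One possible membership-history table with explicit validity windows (illustrative).
create table if not exists marts.dim_repl_team_membership (
    membership_sk     number identity,        -- surrogate key
    repl_id           string       not null,
    team_id           string       not null,
    role              string       not null,  -- e.g. 'PRIMARY' or 'SECONDARY' (coverage)
    valid_from        timestamp_tz not null,  -- business-effective start (inclusive)
    valid_to          timestamp_tz not null,  -- end (exclusive); far-future value while open
    is_current        boolean,
    is_deleted        boolean default false,  -- soft delete / retraction marker
    source_event_ids  array                   -- lineage back to contributing event_ids
);

-- "Which team was repl X in on 2025-11-15?"
select team_id, role
from marts.dim_repl_team_membership
where repl_id = 'r_123'
  and valid_from <= '2025-11-15'::timestamp_tz
  and valid_to   >  '2025-11-15'::timestamp_tz
  and not is_deleted;
```

Because each (repl_id, team_id, role) combination carries its own validity window, concurrent memberships for coverage fall out naturally; team hierarchy can follow the same pattern with a team history table holding parent_team_id per window.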
Non-functional
- Idempotent loads and deterministic outputs.
- Auditable lineage: every derived row should be traceable to source events.
- Data quality: enforce referential integrity, no overlapping validity windows for the same repl/team/role.
- Performance: common attribution queries over 2B tickets should run in <30s with proper pruning.
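The “no overlapping validity windows” rule translates directly into a testable query (e.g., a dbt singular test) in which any returned row is a violation; this sketch assumes the illustrative dimension above.

```sql
-- dbt-style singular test sketch: rows returned violate the
-- "no overlapping validity windows per repl/team/role" rule (illustrative names).
select
    a.repl_id, a.team_id, a.role,
    a.valid_from, a.valid_to,
    b.valid_from as overlapping_from,
    b.valid_to   as overlapping_to
from marts.dim_repl_team_membership a
join marts.dim_repl_team_membership b
  on  a.repl_id = b.repl_id
  and a.team_id = b.team_id
  and a.role    = b.role
  and a.membership_sk <> b.membership_sk
  and a.valid_from < b.valid_to   -- windows intersect
  and b.valid_from < a.valid_to;
```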
Constraints
- Primary warehouse is Snowflake; transformations are done with dbt.
- Orchestration is Apache Airflow 2.x.
- Streaming is available via Kafka; you may use Spark Structured Streaming for near-real-time ingestion.
- Budget: avoid always-on large clusters; prefer micro-batch or serverless patterns.
- Compliance: produce an audit trail suitable for finance review; soft deletes must be represented (do not physically delete history).
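As a sketch of the soft-delete expectation (assuming the illustrative tables above): a MEMBERSHIP_REMOVED event closes the open validity window and appends lineage rather than deleting anything, so the audit trail survives. A full dbt incremental model would also insert new memberships and reopen or correct windows when late events arrive.

```sql
-- Illustrative window-closing merge: nothing is physically deleted.
merge into marts.dim_repl_team_membership d
using (
    select repl_id, team_id, effective_at, event_id
    from staging.stg_membership_events_dedup
    where event_type = 'MEMBERSHIP_REMOVED'
    -- one removal per repl/team per batch keeps the merge deterministic
    qualify row_number() over (partition by repl_id, team_id order by effective_at desc) = 1
) e
on  d.repl_id = e.repl_id
and d.team_id = e.team_id
and d.is_current
when matched then update set
    valid_to         = e.effective_at,
    is_current       = false,
    source_event_ids = array_append(d.source_event_ids, e.event_id);
```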
What you should deliver (as the candidate)
- Propose the dimensional model (tables, keys, validity columns, constraints) for Users/Repls/Teams membership history.
- Describe the ETL/ELT pipeline from Kafka/Admin Service + Salesforce + Okta into Snowflake, including how you handle late/out-of-order events.
- Show how downstream fact tables (opportunities, tickets) will be attributed to team membership as-of event time (a sample as-of join follows this list).
- Define data quality checks and operational monitoring.
- Explain how you’d backfill 7 years of history and keep it correct going forward.
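For the as-of attribution deliverable, the core pattern is a range join from each fact row to the membership window containing its event time. The sketch below uses an assumed fct_opportunities table together with the illustrative membership dimension from earlier; the same join shape applies to ticket events, where clustering the membership table on repl_id and the facts on date is what keeps 2B-row scans prunable.

```sql
-- Illustrative as-of attribution: credit each opportunity to the team its owning
-- repl was in on the close date (fct_opportunities and its columns are assumed names).
select
    o.opportunity_id,
    o.close_date,
    m.team_id as attributed_team_id,
    o.amount
from marts.fct_opportunities o
left join marts.dim_repl_team_membership m
  on  m.repl_id = o.repl_id
  and m.role = 'PRIMARY'                            -- attribute to the primary team
  and m.valid_from <= o.close_date::timestamp_tz
  and m.valid_to   >  o.close_date::timestamp_tz
  and not m.is_deleted;
```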