Context
You’re interviewing with FinPilot, a fintech SaaS platform that provides automated bookkeeping and cash-flow forecasting for ~120K businesses (SMBs + mid-market). FinPilot has launched a new AI Agent feature that can: (1) reconcile transactions, (2) draft invoices, (3) answer “why did my cash drop?” questions, and (4) generate monthly close checklists. The agent is monetized via credits: customers buy credit packs and each agent run consumes credits based on model, tokens, tools used, and runtime.
Today, product analytics is stitched together from application logs and a daily billing export. This has caused multiple incidents: customers disputing invoices because retries double-counted credits, the support team lacking a consistent view of what happened in a given run, and engineering being blind to rising tool-call error rates that correlated with a specific model rollout. Leadership wants a single source of truth for agent usage that supports billing accuracy, near-real-time monitoring, and self-serve analytics.
Scale & Freshness Requirements
- Traffic: ~2.5M agent runs/day average, 10M/day peak during month-end close.
- Event volume: Each run emits 10–200 events (start, tool calls, token usage, retries, errors, finish). Peak ~50K events/sec.
- Latency:
  - Operational dashboards (errors, latency, credit burn): P95 < 2 minutes end-to-end.
  - Finance/billing correctness: daily close with T+1 finalization, but must support late-arriving events up to 7 days.
- Storage: retain raw events 90 days; curated fact tables 2+ years.
- Correctness: credits must be billed exactly once per run, even with retries and partial failures.
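To put these numbers in perspective, a back-of-envelope throughput check (assuming an illustrative average of 50 events per run, within the stated 10–200 range):

```python
# Back-of-envelope throughput check for the stated traffic numbers.
# AVG_EVENTS_PER_RUN is an assumption for illustration (brief gives 10-200).

AVG_RUNS_PER_DAY = 2_500_000
PEAK_RUNS_PER_DAY = 10_000_000
AVG_EVENTS_PER_RUN = 50
SECONDS_PER_DAY = 86_400

avg_events_per_sec = AVG_RUNS_PER_DAY * AVG_EVENTS_PER_RUN / SECONDS_PER_DAY
peak_events_per_sec = PEAK_RUNS_PER_DAY * AVG_EVENTS_PER_RUN / SECONDS_PER_DAY

print(f"average: ~{avg_events_per_sec:,.0f} events/sec")        # ~1,447
print(f"peak day (uniform): ~{peak_events_per_sec:,.0f} events/sec")  # ~5,787
# The stated 50K events/sec peak is ~10x the uniform peak-day rate,
# so ingestion must be sized for bursts, not for the mean.
```

The gap between the uniform rate and the 50K/sec burst figure is the key sizing input for the Kafka and streaming layers.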
Data Characteristics & Quality Issues
Events come from multiple producers:
- Agent Orchestrator service (Kubernetes): emits lifecycle events (run_started, run_completed, run_failed).
- LLM Gateway: emits token usage (prompt_tokens, completion_tokens), model name/version, and request IDs.
- Tool execution service: emits tool_call_started/tool_call_finished, tool name, latency, and error codes.
- Billing service: emits credit ledger mutations (credit_granted, credit_reserved, credit_charged, credit_refunded).
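One way to unify these four producers is a shared event envelope. A minimal sketch, with field names that are illustrative rather than prescribed by the brief:

```python
from dataclasses import dataclass, field
from typing import Optional
import hashlib
import uuid

@dataclass
class AgentEvent:
    """Illustrative canonical envelope shared by all four producers."""
    event_id: str            # globally unique; natural dedup key
    event_type: str          # e.g. run_started, tool_call_finished, credit_charged
    run_id: str              # ties every event back to one agent run
    attempt: int             # retry attempt number within the run
    occurred_at: str         # producer-side event time (ISO 8601), not arrival time
    producer: str            # orchestrator | llm_gateway | tool_exec | billing
    schema_version: int = 1  # bumped as new tools/models add fields
    payload: dict = field(default_factory=dict)  # producer-specific fields
    prompt_sha256: Optional[str] = None          # hash only; raw prompt never stored

def redact_prompt(prompt: str) -> str:
    """Store only a hash of the prompt, per the PII requirement."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

evt = AgentEvent(
    event_id=str(uuid.uuid4()),
    event_type="run_started",
    run_id="run_123",
    attempt=1,
    occurred_at="2025-01-31T10:00:00Z",
    producer="orchestrator",
    prompt_sha256=redact_prompt("why did my cash drop?"),
)
```

Keeping event time (`occurred_at`) separate from arrival time is what later makes late-event handling and watermarking possible.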
Common issues you must design for:
- Retries: same run_id may have multiple attempts; tool calls can be retried with the same idempotency key.
- Out-of-order delivery: tool events can arrive before run_started due to buffering.
- Schema evolution: new tools and new models add fields frequently.
- PII/regulated data: prompts may contain customer financial details; raw prompts must not land in analytics without redaction.
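The retry and duplicate issues above are usually addressed by deduplicating on an idempotency key rather than on `run_id` alone. A toy sketch; in the real pipeline this would be keyed state in a streaming job or a `ROW_NUMBER()`-style dedup in dbt:

```python
# Toy deduplicator: keep the first event seen per (run_id, idempotency_key),
# so a retried tool call with the same key is counted once.
# Field names are illustrative.

def dedupe(events):
    seen = set()
    out = []
    for e in events:
        key = (e["run_id"], e.get("idempotency_key") or e["event_id"])
        if key in seen:
            continue  # duplicate delivery or retry of the same tool call
        seen.add(key)
        out.append(e)
    return out

events = [
    {"run_id": "r1", "event_id": "a", "idempotency_key": "tool-x-1"},
    {"run_id": "r1", "event_id": "b", "idempotency_key": "tool-x-1"},  # retry, same key
    {"run_id": "r1", "event_id": "c", "idempotency_key": "tool-y-1"},
]
print(len(dedupe(events)))  # 2
```

Falling back to `event_id` for events without an idempotency key keeps the dedup total: every event has exactly one key.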
Your Task
Design a complete pipeline + data model to track AI Agent usage, credit consumption, and error rates. Your design must support both near-real-time monitoring and billing-grade accuracy.
Functional Requirements
- Canonical event model for agent usage that supports lifecycle, attempts, tool calls, token usage, and errors.
- Credit accounting model that ties credit charges/refunds to a run and prevents double billing.
- Error-rate analytics by tool, model, customer tier, and agent workflow, with drill-down to run/attempt.
- Late-arriving and out-of-order events: correct aggregates when events arrive up to 7 days late.
- Idempotent ingestion: duplicates must not inflate usage or credits.
- Support backfills for historical reprocessing (e.g., model pricing change, bug fix in token accounting).
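For the exactly-once billing requirement, a common pattern is an append-only ledger with a uniqueness constraint on (run_id, entry_type). A minimal in-memory sketch; the real version would enforce the constraint in the billing database and verify it with a dbt uniqueness test:

```python
class CreditLedger:
    """Append-only ledger: each entry type is recorded at most once per run."""

    def __init__(self):
        self._entries = {}  # (run_id, entry_type) -> signed credit amount

    def charge(self, run_id: str, credits: int) -> bool:
        key = (run_id, "credit_charged")
        if key in self._entries:
            return False  # duplicate/retried charge: ignored, not double-billed
        self._entries[key] = credits
        return True

    def refund(self, run_id: str, credits: int) -> bool:
        key = (run_id, "credit_refunded")
        if key in self._entries:
            return False
        self._entries[key] = -credits
        return True

    def balance_delta(self, run_id: str) -> int:
        """Net credits billed for a run; auditable from the individual entries."""
        return sum(v for (r, _), v in self._entries.items() if r == run_id)

ledger = CreditLedger()
ledger.charge("run_123", 10)
ledger.charge("run_123", 10)  # retry of the same run: no-op
ledger.refund("run_123", 4)
print(ledger.balance_delta("run_123"))  # 6
```

Because entries are never mutated, refunds are new rows rather than updates, which is also what gives Finance the raw-event-to-invoice audit trail.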
Non-Functional Requirements
- Auditability: finance can trace an invoice line item back to the raw events and ledger entries.
- Data privacy: do not store raw prompts; store hashes/metadata only; enforce least privilege.
- High availability: pipeline should tolerate a Kafka broker outage and a streaming job restart without data loss.
- Cost control: keep incremental platform spend under $40K/month.
Constraints
- Cloud: AWS. Warehouse: Snowflake (already used by Finance).
- Existing tools: Kafka (MSK), Airflow 2.x, dbt, S3.
- Team: 5 data engineers; strong SQL/dbt skills, moderate Spark experience.
- Compliance: SOC2; prompts and tool payloads may contain sensitive financial info.
Deliverables (What the interviewer expects)
- A proposed warehouse schema (facts/dimensions) with keys, grain, and example columns.
- How you’ll compute:
  - credits consumed per run/day/customer
  - error rates (by tool/model/workflow)
  - token usage and cost attribution
- How you’ll handle deduplication, retries, late events, and backfills.
- Monitoring, alerting, and failure recovery strategies.
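As a concrete example of the error-rate computation expected here, a pure-Python stand-in for what would be a dbt model grouping over the tool-call fact table (field names are illustrative):

```python
from collections import defaultdict

def error_rate_by_tool(tool_calls):
    """Failed-call fraction per tool; mirrors a GROUP BY in the warehouse."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for c in tool_calls:
        totals[c["tool"]] += 1
        if c["error_code"] is not None:
            errors[c["tool"]] += 1
    return {tool: errors[tool] / totals[tool] for tool in totals}

calls = [
    {"tool": "reconcile", "error_code": None},
    {"tool": "reconcile", "error_code": "TIMEOUT"},
    {"tool": "draft_invoice", "error_code": None},
    {"tool": "draft_invoice", "error_code": None},
]
print(error_rate_by_tool(calls))  # {'reconcile': 0.5, 'draft_invoice': 0.0}
```

The same grouping extends to model, customer tier, and workflow by widening the key, with `run_id`/`attempt` retained on the fact rows for drill-down.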