AI Agent Usage Credits Data Model

Hard

Pipelines

Asked at 1 company1Data ModelingETL

Also asked at

Problem

Context

You’re interviewing with FinPilot, a fintech SaaS platform that provides automated bookkeeping and cash-flow forecasting for ~120K businesses (SMBs + mid-market). FinPilot has launched a new AI Agent feature that can: (1) reconcile transactions, (2) draft invoices, (3) answer “why did my cash drop?” questions, and (4) generate monthly close checklists. The agent is monetized via credits: customers buy credit packs and each agent run consumes credits based on model, tokens, tools used, and runtime.

Today, product analytics is stitched together from application logs and a daily billing export. This has caused multiple incidents: customers disputing invoices because retries double-counted credits, the support team lacking a consistent view of “what happened in this run,” and engineering being blind to rising tool-call error rates that correlate with a specific model rollout. Leadership wants a single source of truth for agent usage that supports: billing accuracy, near-real-time monitoring, and self-serve analytics.

Scale & Freshness Requirements

Traffic: ~2.5M agent runs/day average, 10M/day peak during month-end close.
Event volume: Each run emits 10–200 events (start, tool calls, token usage, retries, errors, finish). Peak ~50K events/sec.
Latency:
- Operational dashboards (errors, latency, credit burn): P95 < 2 minutes end-to-end.
- Finance/billing correctness: daily close with T+1 finalization, but must support late-arriving events up to 7 days.
Storage: retain raw events 90 days; curated fact tables 2+ years.
Correctness: credits must be exactly-once billed per run, even with retries and partial failures.

Data Characteristics & Quality Issues

Events come from multiple producers:

Agent Orchestrator service (Kubernetes): emits lifecycle events (run_started, run_completed, run_failed).
LLM Gateway: emits token usage (prompt_tokens, completion_tokens), model name/version, and request IDs.
Tool execution service: emits tool_call_started/tool_call_finished, tool name, latency, and error codes.
Billing service: emits credit ledger mutations (credit_granted, credit_reserved, credit_charged, credit_refunded).

Common issues you must design for:

Retries: same run_id may have multiple attempts; tool calls can be retried with the same idempotency key.
Out-of-order delivery: tool events can arrive before run_started due to buffering.
Schema evolution: new tools and new models add fields frequently.
PII/regulated data: prompts may contain customer financial details; raw prompts must not land in analytics without redaction.

Your Task

Design a complete pipeline + data model to track AI Agent usage, credit consumption, and error rates. Your design must support both near-real-time monitoring and billing-grade accuracy.

Functional Requirements

Canonical event model for agent usage that supports lifecycle, attempts, tool calls, token usage, and errors.
Credit accounting model that ties credit charges/refunds to a run and prevents double billing.
Error-rate analytics by tool, model, customer tier, and agent workflow, with drill-down to run/attempt.
Late-arriving and out-of-order events: correct aggregates when events arrive up to 7 days late.
Idempotent ingestion: duplicates must not inflate usage or credits.
Support backfills for historical reprocessing (e.g., model pricing change, bug fix in token accounting).

Non-Functional Requirements

Auditability: finance can trace an invoice line item back to the raw events and ledger entries.
Data privacy: do not store raw prompts; store hashes/metadata only; enforce least privilege.
High availability: pipeline should tolerate a Kafka broker outage and a streaming job restart without data loss.
Cost control: keep incremental platform spend under $40K/month.

Constraints

Cloud: AWS. Warehouse: Snowflake (already used by Finance).
Existing tools: Kafka (MSK), Airflow 2.x, dbt, S3.
Team: 5 data engineers; strong SQL/dbt skills, moderate Spark experience.
Compliance: SOC2; prompts and tool payloads may contain sensitive financial info.

Deliverables (What the interviewer expects)

A proposed warehouse schema (facts/dimensions) with keys, grain, and example columns.
How you’ll compute:
- credits consumed per run/day/customer
- error rates (by tool/model/workflow)
- token usage and cost attribution
How you’ll handle deduplication, retries, late events, and backfills.
Monitoring, alerting, and failure recovery strategies.

Problem

Context

Scale & Freshness Requirements

Traffic: ~2.5M agent runs/day average, 10M/day peak during month-end close.
Event volume: Each run emits 10–200 events (start, tool calls, token usage, retries, errors, finish). Peak ~50K events/sec.
Latency:
- Operational dashboards (errors, latency, credit burn): P95 < 2 minutes end-to-end.
- Finance/billing correctness: daily close with T+1 finalization, but must support late-arriving events up to 7 days.
Storage: retain raw events 90 days; curated fact tables 2+ years.
Correctness: credits must be exactly-once billed per run, even with retries and partial failures.

Data Characteristics & Quality Issues

Events come from multiple producers:

Agent Orchestrator service (Kubernetes): emits lifecycle events (run_started, run_completed, run_failed).
LLM Gateway: emits token usage (prompt_tokens, completion_tokens), model name/version, and request IDs.
Tool execution service: emits tool_call_started/tool_call_finished, tool name, latency, and error codes.
Billing service: emits credit ledger mutations (credit_granted, credit_reserved, credit_charged, credit_refunded).

Common issues you must design for:

Retries: same run_id may have multiple attempts; tool calls can be retried with the same idempotency key.
Out-of-order delivery: tool events can arrive before run_started due to buffering.
Schema evolution: new tools and new models add fields frequently.
PII/regulated data: prompts may contain customer financial details; raw prompts must not land in analytics without redaction.

Your Task

Design a complete pipeline + data model to track AI Agent usage, credit consumption, and error rates. Your design must support both near-real-time monitoring and billing-grade accuracy.

Functional Requirements

Canonical event model for agent usage that supports lifecycle, attempts, tool calls, token usage, and errors.
Credit accounting model that ties credit charges/refunds to a run and prevents double billing.
Error-rate analytics by tool, model, customer tier, and agent workflow, with drill-down to run/attempt.
Late-arriving and out-of-order events: correct aggregates when events arrive up to 7 days late.
Idempotent ingestion: duplicates must not inflate usage or credits.
Support backfills for historical reprocessing (e.g., model pricing change, bug fix in token accounting).

Non-Functional Requirements

Auditability: finance can trace an invoice line item back to the raw events and ledger entries.
Data privacy: do not store raw prompts; store hashes/metadata only; enforce least privilege.
High availability: pipeline should tolerate a Kafka broker outage and a streaming job restart without data loss.
Cost control: keep incremental platform spend under $40K/month.

Constraints

Cloud: AWS. Warehouse: Snowflake (already used by Finance).
Existing tools: Kafka (MSK), Airflow 2.x, dbt, S3.
Team: 5 data engineers; strong SQL/dbt skills, moderate Spark experience.
Compliance: SOC2; prompts and tool payloads may contain sensitive financial info.

Deliverables (What the interviewer expects)

A proposed warehouse schema (facts/dimensions) with keys, grain, and example columns.
How you’ll compute:
- credits consumed per run/day/customer
- error rates (by tool/model/workflow)
- token usage and cost attribution
How you’ll handle deduplication, retries, late events, and backfills.
Monitoring, alerting, and failure recovery strategies.

Your answer

Try one AI text evaluation on us

Get structured feedback, scored against a 4-axis rubric. Premium unlocks unlimited.

0 wordstarget ~200

Up next

Quantify Impact of AI Support AgentMedium Define AI Agent Retention MetricsMedium

Design a multi-agent enterprise analytics assistant on Databricks with governance and cost controlsHard

Next question

Context

Scale & Freshness Requirements

Traffic: ~2.5M agent runs/day average, 10M/day peak during month-end close.
Event volume: Each run emits 10–200 events (start, tool calls, token usage, retries, errors, finish). Peak ~50K events/sec.
Latency:
- Operational dashboards (errors, latency, credit burn): P95 < 2 minutes end-to-end.
- Finance/billing correctness: daily close with T+1 finalization, but must support late-arriving events up to 7 days.
Storage: retain raw events 90 days; curated fact tables 2+ years.
Correctness: credits must be exactly-once billed per run, even with retries and partial failures.

Data Characteristics & Quality Issues

Events come from multiple producers:

Agent Orchestrator service (Kubernetes): emits lifecycle events (run_started, run_completed, run_failed).
LLM Gateway: emits token usage (prompt_tokens, completion_tokens), model name/version, and request IDs.
Tool execution service: emits tool_call_started/tool_call_finished, tool name, latency, and error codes.
Billing service: emits credit ledger mutations (credit_granted, credit_reserved, credit_charged, credit_refunded).

Common issues you must design for:

Retries: same run_id may have multiple attempts; tool calls can be retried with the same idempotency key.
Out-of-order delivery: tool events can arrive before run_started due to buffering.
Schema evolution: new tools and new models add fields frequently.
PII/regulated data: prompts may contain customer financial details; raw prompts must not land in analytics without redaction.

Your Task

Design a complete pipeline + data model to track AI Agent usage, credit consumption, and error rates. Your design must support both near-real-time monitoring and billing-grade accuracy.

Functional Requirements

Canonical event model for agent usage that supports lifecycle, attempts, tool calls, token usage, and errors.
Credit accounting model that ties credit charges/refunds to a run and prevents double billing.
Error-rate analytics by tool, model, customer tier, and agent workflow, with drill-down to run/attempt.
Late-arriving and out-of-order events: correct aggregates when events arrive up to 7 days late.
Idempotent ingestion: duplicates must not inflate usage or credits.
Support backfills for historical reprocessing (e.g., model pricing change, bug fix in token accounting).

Non-Functional Requirements

Auditability: finance can trace an invoice line item back to the raw events and ledger entries.
Data privacy: do not store raw prompts; store hashes/metadata only; enforce least privilege.
High availability: pipeline should tolerate a Kafka broker outage and a streaming job restart without data loss.
Cost control: keep incremental platform spend under $40K/month.

Constraints

Cloud: AWS. Warehouse: Snowflake (already used by Finance).
Existing tools: Kafka (MSK), Airflow 2.x, dbt, S3.
Team: 5 data engineers; strong SQL/dbt skills, moderate Spark experience.
Compliance: SOC2; prompts and tool payloads may contain sensitive financial info.

Deliverables (What the interviewer expects)

A proposed warehouse schema (facts/dimensions) with keys, grain, and example columns.
How you’ll compute:
- credits consumed per run/day/customer
- error rates (by tool/model/workflow)
- token usage and cost attribution
How you’ll handle deduplication, retries, late events, and backfills.
Monitoring, alerting, and failure recovery strategies.