Index Strategy for Fintech Ledger Queries

Context

You’re joining the data platform team at a fintech that processes 20–40 million card and ACH transactions per day. The company maintains an append-heavy ledger table used by:

Real-time risk checks (p95 latency targets under 200ms)
Customer support tooling (lookups by transaction id)
Finance reconciliation jobs (range scans by posting date)

A recent incident caused reconciliation queries to slow down by 10× after a schema change and a new index was added. The on-call engineer suspects the team chose the wrong index type and key order, causing excessive page splits and random I/O.

Core Question

Explain the difference between a clustered index and a non-clustered index. In your answer, address the following:

Physical layout: How does each index type affect how rows are stored on disk / in pages?
Lookup mechanics: What does a non-clustered index “point to” (heap vs clustered key), and why does that matter?
Performance trade-offs: Compare read patterns (point lookups vs range scans) and write costs (insert/update/delete, page splits, fragmentation).
Query-driven design: Given typical ledger queries (by transaction_id, by account_id + date range, and by posted_at range), which columns are good candidates for clustered vs non-clustered indexes?
Edge cases: What changes when the table is a heap, when the clustered key is non-unique, or when you need covering indexes (INCLUDE columns)?

Scope Guidance (what a strong answer includes)

Use concrete examples of queries and explain which index would be used and why.
Discuss trade-offs rather than presenting one index type as universally better.
Call out common misconceptions (e.g., “clustered indexes are always faster”).
Mention at least one real-world operational concern (e.g., fragmentation, fill factor, hot spots, or index maintenance).

Context

You’re joining the data platform team at a fintech that processes 20–40 million card and ACH transactions per day. The company maintains an append-heavy ledger table used by:

Real-time risk checks (p95 latency targets under 200ms)
Customer support tooling (lookups by transaction id)
Finance reconciliation jobs (range scans by posting date)

Core Question

Explain the difference between a clustered index and a non-clustered index. In your answer, address the following:

Physical layout: How does each index type affect how rows are stored on disk / in pages?
Lookup mechanics: What does a non-clustered index “point to” (heap vs clustered key), and why does that matter?
Performance trade-offs: Compare read patterns (point lookups vs range scans) and write costs (insert/update/delete, page splits, fragmentation).
Query-driven design: Given typical ledger queries (by transaction_id, by account_id + date range, and by posted_at range), which columns are good candidates for clustered vs non-clustered indexes?
Edge cases: What changes when the table is a heap, when the clustered key is non-unique, or when you need covering indexes (INCLUDE columns)?

Scope Guidance (what a strong answer includes)

Use concrete examples of queries and explain which index would be used and why.
Discuss trade-offs rather than presenting one index type as universally better.
Call out common misconceptions (e.g., “clustered indexes are always faster”).
Mention at least one real-world operational concern (e.g., fragmentation, fill factor, hot spots, or index maintenance).

Context

You’re joining the data platform team at a fintech that processes 20–40 million card and ACH transactions per day. The company maintains an append-heavy ledger table used by:

Real-time risk checks (p95 latency targets under 200ms)
Customer support tooling (lookups by transaction id)
Finance reconciliation jobs (range scans by posting date)

Core Question

Explain the difference between a clustered index and a non-clustered index. In your answer, address the following:

Physical layout: How does each index type affect how rows are stored on disk / in pages?
Lookup mechanics: What does a non-clustered index “point to” (heap vs clustered key), and why does that matter?
Performance trade-offs: Compare read patterns (point lookups vs range scans) and write costs (insert/update/delete, page splits, fragmentation).
Query-driven design: Given typical ledger queries (by transaction_id, by account_id + date range, and by posted_at range), which columns are good candidates for clustered vs non-clustered indexes?
Edge cases: What changes when the table is a heap, when the clustered key is non-unique, or when you need covering indexes (INCLUDE columns)?

Scope Guidance (what a strong answer includes)

Use concrete examples of queries and explain which index would be used and why.
Discuss trade-offs rather than presenting one index type as universally better.
Call out common misconceptions (e.g., “clustered indexes are always faster”).
Mention at least one real-world operational concern (e.g., fragmentation, fill factor, hot spots, or index maintenance).

Context

You’re joining the data platform team at a fintech that processes 20–40 million card and ACH transactions per day. The company maintains an append-heavy ledger table used by:

Real-time risk checks (p95 latency targets under 200ms)
Customer support tooling (lookups by transaction id)
Finance reconciliation jobs (range scans by posting date)

Core Question

Explain the difference between a clustered index and a non-clustered index. In your answer, address the following:

Physical layout: How does each index type affect how rows are stored on disk / in pages?
Lookup mechanics: What does a non-clustered index “point to” (heap vs clustered key), and why does that matter?
Performance trade-offs: Compare read patterns (point lookups vs range scans) and write costs (insert/update/delete, page splits, fragmentation).
Query-driven design: Given typical ledger queries (by transaction_id, by account_id + date range, and by posted_at range), which columns are good candidates for clustered vs non-clustered indexes?
Edge cases: What changes when the table is a heap, when the clustered key is non-unique, or when you need covering indexes (INCLUDE columns)?

Scope Guidance (what a strong answer includes)

Use concrete examples of queries and explain which index would be used and why.
Discuss trade-offs rather than presenting one index type as universally better.
Call out common misconceptions (e.g., “clustered indexes are always faster”).
Mention at least one real-world operational concern (e.g., fragmentation, fill factor, hot spots, or index maintenance).

Interview Guides

Context

Core Question

Scope Guidance (what a strong answer includes)

Index Strategy for Fintech Ledger Queries

Context

Core Question

Scope Guidance (what a strong answer includes)

Your Answer

Index Strategy for Fintech Ledger Queries

Context

Core Question

Scope Guidance (what a strong answer includes)

Index Strategy for Fintech Ledger Queries

Context

Core Question

Scope Guidance (what a strong answer includes)

Your Answer