Index Strategy for Fintech Ledger Queries | Dataford Interview Questions

Medium

SQL & Data Manipulation

Joins, Aggregations, Subqueries, Group By, Data Wrangling

Problem

Context

You’re joining the data platform team at a fintech that processes 20–40 million card and ACH transactions per day. The company maintains an append-heavy ledger table used by:

Real-time risk checks (p95 latency targets under 200ms)
Customer support tooling (lookups by transaction id)
Finance reconciliation jobs (range scans by posting date)

In a recent incident, reconciliation queries slowed down by 10× after a schema change and the addition of a new index. The on-call engineer suspects the team chose the wrong index type and key order, causing excessive page splits and random I/O.

Core Question

Explain the difference between a clustered index and a non-clustered index. In your answer, address the following:

Physical layout: How does each index type affect how rows are stored on disk / in pages?
Lookup mechanics: What does a non-clustered index “point to” (heap vs clustered key), and why does that matter?
Performance trade-offs: Compare read patterns (point lookups vs range scans) and write costs (insert/update/delete, page splits, fragmentation).
Query-driven design: Given typical ledger queries (by transaction_id, by account_id + date range, and by posted_at range), which columns are good candidates for clustered vs non-clustered indexes?
Edge cases: What changes when the table is a heap, when the clustered key is non-unique, or when you need covering indexes (INCLUDE columns)?

Scope Guidance (what a strong answer includes)

Use concrete examples of queries and explain which index would be used and why.
Discuss trade-offs rather than presenting one index type as universally better.
Call out common misconceptions (e.g., “clustered indexes are always faster”).
Mention at least one real-world operational concern (e.g., fragmentation, fill factor, hot spots, or index maintenance).

Key Concepts

Clustered index (data is the index)

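A clustered index defines the order in which the table's rows are stored: the leaf pages of its B-tree are the data pages themselves, kept in key order. That order is logical (pages are linked in key sequence) and is not guaranteed to be physically contiguous on disk once pages split. Because the leaf level of a clustered index contains the full row, range scans on the clustered key are typically efficient.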

CREATE CLUSTERED INDEX CX_ledger_posted_at
ON ledger_transactions(posted_at, transaction_id);
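
As a rough illustration of why this key order suits reconciliation, the query below reads one posting day. It is only a sketch: it assumes the ledger_transactions table and the CX_ledger_posted_at index shown above, assumes posted_at is a date/time column, and uses placeholder dates.

```sql
-- Sketch: reconciliation-style range scan over one posting day.
-- The dates are placeholders; posted_at is assumed to be a date/time type.
DECLARE @from datetime2 = '2024-01-01';
DECLARE @to   datetime2 = '2024-01-02';

SELECT transaction_id, account_id, amount_cents, posted_at
FROM ledger_transactions
WHERE posted_at >= @from      -- half-open range: includes @from ...
  AND posted_at <  @to        -- ... and excludes @to
ORDER BY posted_at, transaction_id;  -- matches the clustered key order
```

With posted_at as the leading clustered key column, the engine can typically seek to the first qualifying row and read leaf pages in key order, and the ORDER BY usually needs no separate sort because it matches the index order.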

Non-clustered index (separate structure)

A non-clustered index is a separate B-tree whose leaf level stores the index key plus a row locator. The row locator is either a RID (heap) or the clustered key (clustered table), which affects lookup cost.

CREATE NONCLUSTERED INDEX IX_ledger_transaction_id
ON ledger_transactions(transaction_id)
INCLUDE (account_id, amount_cents, posted_at);
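
The prompt also mentions lookups by account_id plus a date range. One possible index for that pattern is sketched below. The index name IX_ledger_account_posted is hypothetical, and the bigint type for account_id is an assumption, since the source does not give column types.

```sql
-- Hypothetical index for the "account_id + date range" pattern.
-- Equality column first (account_id), range column second (posted_at).
CREATE NONCLUSTERED INDEX IX_ledger_account_posted
ON ledger_transactions(account_id, posted_at)
INCLUDE (amount_cents);

-- Assumed type; declare this with the column's actual type.
DECLARE @account_id bigint = 42;

SELECT posted_at, amount_cents
FROM ledger_transactions
WHERE account_id = @account_id
  AND posted_at >= '2024-01-01'
  AND posted_at <  '2024-02-01'
ORDER BY posted_at;
```

Putting the equality column ahead of the range column keeps each account's rows contiguous in the index, so the date predicate becomes a short range within one account rather than a filter applied across all accounts.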

Key lookups vs covering indexes

If a non-clustered index doesn’t contain all columns needed by a query, the engine may go back to the base table for each matching row (a RID lookup on a heap, a key lookup on a clustered index). Adding INCLUDE columns can make the index covering and avoid many random I/Os.

SELECT amount_cents, posted_at
FROM ledger_transactions
WHERE transaction_id = 'tx_9f2...';
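
One way to check whether a query is actually covered is to compare logical reads for a covered and an uncovered version. The sketch below assumes the IX_ledger_transaction_id index from this section exists and that the table has at least one column beyond those in that index and the clustered key. The literal is the elided placeholder from the example above, not a real id.

```sql
-- Report page reads per statement in the Messages output.
SET STATISTICS IO ON;

-- Covered: every referenced column is in IX_ledger_transaction_id
-- (key column transaction_id plus the INCLUDE columns).
SELECT amount_cents, posted_at
FROM ledger_transactions
WHERE transaction_id = 'tx_9f2...';

-- Not covered (assuming the table has further columns): SELECT * needs
-- the full row, so the plan adds a lookup into the base table.
SELECT *
FROM ledger_transactions
WHERE transaction_id = 'tx_9f2...';

SET STATISTICS IO OFF;
```

For a single-row point lookup the extra cost is small. It becomes significant when a predicate matches many rows, because each matching row can trigger its own lookup.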

Range scans, fragmentation, and write amplification

Clustered keys that are not insert-friendly (e.g., random GUIDs) can cause frequent page splits and fragmentation, increasing write cost and hurting scan performance. Non-clustered indexes also add write overhead because each write may update multiple index trees.
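
Fragmentation can be measured rather than guessed at. The sketch below uses SQL Server's sys.dm_db_index_physical_stats function to list fragmentation for every index on the table, then rebuilds one index with a lower fill factor. The value 90 is purely illustrative, not a recommendation from the source.

```sql
-- Fragmentation and size for each index on the ledger table.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                            -- current database
         OBJECT_ID(N'ledger_transactions'),  -- table from this section
         NULL, NULL,                         -- all indexes, all partitions
         'LIMITED') AS ps                    -- cheapest scan mode
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;

-- Leave free space on leaf pages of an index that takes mid-page inserts.
-- FILLFACTOR = 90 is an illustrative value only.
ALTER INDEX IX_ledger_transaction_id ON ledger_transactions
REBUILD WITH (FILLFACTOR = 90);
```

A lower fill factor trades denser pages (and therefore faster scans) for fewer page splits, so it tends to help indexes whose keys arrive in random order and is usually unnecessary for keys that only ever append at the end.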

Heap vs clustered table row locators

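On a heap, non-clustered indexes point to a physical row identifier (RID: file, page, and slot). When an updated row no longer fits on its page and moves, engines such as SQL Server leave a forwarding pointer at the original location rather than updating every index, so later lookups can pay an extra page read. On a clustered table, non-clustered indexes point to the clustered key, which increases index size if the clustered key is wide.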
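
If the table is a heap, the cost of row movement can be observed directly. In SQL Server, a moved heap row leaves a forwarding pointer behind, and the same sys.dm_db_index_physical_stats function reports how many exist. This sketch assumes ledger_transactions is currently a heap; on a clustered table the query returns no rows.

```sql
-- index_id 0 addresses the heap itself.
-- forwarded_record_count is only populated in SAMPLED or DETAILED mode.
SELECT ps.index_type_desc,
       ps.record_count,
       ps.forwarded_record_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(),
         OBJECT_ID(N'ledger_transactions'),
         0,            -- heap
         NULL,         -- all partitions
         'DETAILED') AS ps;

-- Rebuilding the heap removes forwarding pointers, but it also rebuilds
-- the non-clustered indexes, so treat it as a maintenance-window operation.
-- ALTER TABLE ledger_transactions REBUILD;
```

DETAILED mode reads every page, so on a table of this size it is better run against a replica or during a quiet period.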
