You’re joining the data platform team at a fintech that processes 20–40 million card and ACH transactions per day. The company maintains an append-heavy ledger table used by:
A recent incident caused reconciliation queries to slow down by 10× after a schema change and a new index was added. The on-call engineer suspects the team chose the wrong index type and key order, causing excessive page splits and random I/O.
Explain the difference between a clustered index and a non-clustered index. In your answer, address the following:
transaction_id, by account_id + date range, and by posted_at range), which columns are good candidates for clustered vs non-clustered indexes?A clustered index defines the physical order of rows in the table (at the storage/page level). Because the leaf level of a clustered index contains the full row, range scans on the clustered key are typically efficient.
CREATE CLUSTERED INDEX CX_ledger_posted_at
ON ledger_transactions(posted_at, transaction_id);
A non-clustered index is a separate B-tree whose leaf level stores the index key plus a row locator. The row locator is either a RID (heap) or the clustered key (clustered table), which affects lookup cost.
CREATE NONCLUSTERED INDEX IX_ledger_transaction_id
ON ledger_transactions(transaction_id)
INCLUDE (account_id, amount_cents, posted_at);
If a non-clustered index doesn’t contain all columns needed by a query, the engine may do a key lookup back to the base table (heap or clustered index) for each matching row. Adding INCLUDE columns can make the index covering and avoid many random I/Os.
SELECT amount_cents, posted_at
FROM ledger_transactions
WHERE transaction_id = 'tx_9f2...';
Clustered keys that are not insert-friendly (e.g., random GUIDs) can cause frequent page splits and fragmentation, increasing write cost and hurting scan performance. Non-clustered indexes also add write overhead because each write may update multiple index trees.
On a heap, non-clustered indexes point to a physical row identifier (RID), which can become unstable under row movement. On a clustered table, non-clustered indexes point to the clustered key, which increases index size if the clustered key is wide.