
At a firm like Northstar Capital, a portfolio manager may see a sudden jump in an asset's risk score on a dashboard and ask whether the spike reflects real market movement or bad underlying data.
Explain how you would investigate the underlying data using SQL. Your answer should cover:
The interviewer expects a practical investigation framework, not just a single query. Discuss the sequence of checks, the SQL techniques you would use in PostgreSQL, and how you would distinguish a true business event from a data pipeline or modeling issue.
The first step is to compare the current risk value to recent history so you can confirm whether the spike is truly abnormal. In PostgreSQL, this is commonly done with date filters, rolling summaries, or window functions such as LAG to compare today's value with prior observations.
SELECT asset_id, risk_date, risk_score,
LAG(risk_score) OVER (PARTITION BY asset_id ORDER BY risk_date) AS prior_risk
FROM asset_risk_history;
Risk metrics are usually derived from multiple sources such as positions, prices, factor exposures, and reference data. Joining these tables lets you verify whether the spike is driven by a position change, a market data move, or a mismatch in asset mapping.
SELECT r.asset_id, r.risk_date, p.position_qty, m.price
FROM asset_risk_history r
LEFT JOIN positions p
ON r.asset_id = p.asset_id AND r.risk_date = p.position_date
LEFT JOIN market_prices m
ON r.asset_id = m.asset_id AND r.risk_date = m.price_date;
Many unexplained spikes come from bad inputs rather than real changes. SQL checks for NULLs, duplicate rows, stale timestamps, and extreme values help determine whether the dashboard is surfacing a data issue instead of a genuine risk event.
SELECT asset_id, price_date, COUNT(*) AS row_count
FROM market_prices
GROUP BY asset_id, price_date
HAVING COUNT(*) > 1;
A portfolio-level or asset-level risk number often needs to be broken into components to find the driver. Aggregating by factor, desk, source system, or calculation component helps isolate which part of the pipeline changed materially.
SELECT asset_id, factor_name, SUM(factor_contribution) AS total_contribution
FROM risk_factor_contributions
WHERE asset_id = 101
GROUP BY asset_id, factor_name
ORDER BY total_contribution DESC;
CTEs are useful for organizing a multi-step investigation into readable stages: identify the spike date, compare prior values, inspect source inputs, and summarize anomalies. This makes the logic easier to explain in an interview and easier to maintain in production analysis.
WITH latest_spike AS (
SELECT asset_id, risk_date, risk_score
FROM asset_risk_history
WHERE asset_id = 101
)
SELECT *
FROM latest_spike;