Dataford
Interview Guides
Upgrade
All questions/SQL & Data Manipulation/Investigating Risk Spikes in SQL

Investigating Risk Spikes in SQL

Medium
SQL & Data Manipulation
Asked at 2 companies2JoinsAggregationsData Wrangling
Also asked at
AVoya Financial

Problem

Context

At a firm like Northstar Capital, a portfolio manager may see a sudden jump in an asset's risk score on a dashboard and ask whether the spike reflects real market movement or bad underlying data.

Core question

Explain how you would investigate the underlying data using SQL. Your answer should cover:

  1. How you would validate the spike against historical values
  2. Which tables you would join to trace the risk metric back to source inputs
  3. How you would check for missing, duplicated, stale, or outlier records
  4. How you would isolate whether the issue comes from market data, position data, or calculation logic

Scope guidance

The interviewer expects a practical investigation framework, not just a single query. Discuss the sequence of checks, the SQL techniques you would use in PostgreSQL, and how you would distinguish a true business event from a data pipeline or modeling issue.

Key Concepts

Baseline comparison with time-series SQL

The first step is to compare the current risk value to recent history so you can confirm whether the spike is truly abnormal. In PostgreSQL, this is commonly done with date filters, rolling summaries, or window functions such as LAG to compare today's value with prior observations.

SELECT asset_id, risk_date, risk_score,
       LAG(risk_score) OVER (PARTITION BY asset_id ORDER BY risk_date) AS prior_risk
FROM asset_risk_history;

Traceability through joins

Risk metrics are usually derived from multiple sources such as positions, prices, factor exposures, and reference data. Joining these tables lets you verify whether the spike is driven by a position change, a market data move, or a mismatch in asset mapping.

SELECT r.asset_id, r.risk_date, p.position_qty, m.price
FROM asset_risk_history r
LEFT JOIN positions p
  ON r.asset_id = p.asset_id AND r.risk_date = p.position_date
LEFT JOIN market_prices m
  ON r.asset_id = m.asset_id AND r.risk_date = m.price_date;

Data quality validation

Many unexplained spikes come from bad inputs rather than real changes. SQL checks for NULLs, duplicate rows, stale timestamps, and extreme values help determine whether the dashboard is surfacing a data issue instead of a genuine risk event.

SELECT asset_id, price_date, COUNT(*) AS row_count
FROM market_prices
GROUP BY asset_id, price_date
HAVING COUNT(*) > 1;

Aggregation and decomposition

A portfolio-level or asset-level risk number often needs to be broken into components to find the driver. Aggregating by factor, desk, source system, or calculation component helps isolate which part of the pipeline changed materially.

SELECT asset_id, factor_name, SUM(factor_contribution) AS total_contribution
FROM risk_factor_contributions
WHERE asset_id = 101
GROUP BY asset_id, factor_name
ORDER BY total_contribution DESC;

Structured investigation with CTEs

CTEs are useful for organizing a multi-step investigation into readable stages: identify the spike date, compare prior values, inspect source inputs, and summarize anomalies. This makes the logic easier to explain in an interview and easier to maintain in production analysis.

WITH latest_spike AS (
  SELECT asset_id, risk_date, risk_score
  FROM asset_risk_history
  WHERE asset_id = 101
)
SELECT *
FROM latest_spike;

Problem

Context

At a firm like Northstar Capital, a portfolio manager may see a sudden jump in an asset's risk score on a dashboard and ask whether the spike reflects real market movement or bad underlying data.

Core question

Explain how you would investigate the underlying data using SQL. Your answer should cover:

  1. How you would validate the spike against historical values
  2. Which tables you would join to trace the risk metric back to source inputs
  3. How you would check for missing, duplicated, stale, or outlier records
  4. How you would isolate whether the issue comes from market data, position data, or calculation logic

Scope guidance

The interviewer expects a practical investigation framework, not just a single query. Discuss the sequence of checks, the SQL techniques you would use in PostgreSQL, and how you would distinguish a true business event from a data pipeline or modeling issue.

Key Concepts

Baseline comparison with time-series SQL

The first step is to compare the current risk value to recent history so you can confirm whether the spike is truly abnormal. In PostgreSQL, this is commonly done with date filters, rolling summaries, or window functions such as LAG to compare today's value with prior observations.

SELECT asset_id, risk_date, risk_score,
       LAG(risk_score) OVER (PARTITION BY asset_id ORDER BY risk_date) AS prior_risk
FROM asset_risk_history;

Traceability through joins

Risk metrics are usually derived from multiple sources such as positions, prices, factor exposures, and reference data. Joining these tables lets you verify whether the spike is driven by a position change, a market data move, or a mismatch in asset mapping.

SELECT r.asset_id, r.risk_date, p.position_qty, m.price
FROM asset_risk_history r
LEFT JOIN positions p
  ON r.asset_id = p.asset_id AND r.risk_date = p.position_date
LEFT JOIN market_prices m
  ON r.asset_id = m.asset_id AND r.risk_date = m.price_date;

Data quality validation

Many unexplained spikes come from bad inputs rather than real changes. SQL checks for NULLs, duplicate rows, stale timestamps, and extreme values help determine whether the dashboard is surfacing a data issue instead of a genuine risk event.

SELECT asset_id, price_date, COUNT(*) AS row_count
FROM market_prices
GROUP BY asset_id, price_date
HAVING COUNT(*) > 1;

Aggregation and decomposition

A portfolio-level or asset-level risk number often needs to be broken into components to find the driver. Aggregating by factor, desk, source system, or calculation component helps isolate which part of the pipeline changed materially.

SELECT asset_id, factor_name, SUM(factor_contribution) AS total_contribution
FROM risk_factor_contributions
WHERE asset_id = 101
GROUP BY asset_id, factor_name
ORDER BY total_contribution DESC;

Structured investigation with CTEs

CTEs are useful for organizing a multi-step investigation into readable stages: identify the spike date, compare prior values, inspect source inputs, and summarize anomalies. This makes the logic easier to explain in an interview and easier to maintain in production analysis.

WITH latest_spike AS (
  SELECT asset_id, risk_date, risk_score
  FROM asset_risk_history
  WHERE asset_id = 101
)
SELECT *
FROM latest_spike;
Your answer
Try one AI text evaluation on us
Get structured feedback, scored against a 4-axis rubric. Premium unlocks unlimited.
0 wordstarget ~200
Up next
DTE EnergyValidating Data Before ReportingEasyAUsing SQL to Solve Churn RiskEasyWendy'sApproaching SQL Dataset AnalysisEasy
Next question