Context
StreamSpace, a short-video app, is testing a new AI-generated “Daily Recap” carousel on the home feed. Product leadership wants to know whether any observed lift is durable or just a short-lived novelty effect from users interacting with something new.
Hypothesis Seed
The team believes the carousel will increase user engagement by helping viewers quickly find relevant content. However, they are concerned that early gains may be inflated because users click on new UI elements out of curiosity, then revert to baseline behavior after a few days.
Constraints
- Traffic: 1.2M daily active users (DAU)
- Only 60% of DAU land on the home feed and are eligible, i.e. roughly 720K eligible users/day
- Maximum experiment duration: 21 days
- Engineering wants a decision within 3 weeks to include the feature in the next release train
- A false positive is costly because the feature requires ongoing inference spend of about $180K/month
- A false negative is also costly because home-feed engagement is a top company KPI
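For orientation, the constraints above pin down a daily exposure ceiling. A minimal sketch of the arithmetic, assuming a 50/50 control/treatment split (the split itself is an assumption, not stated above):

```python
DAU = 1_200_000          # total daily active users
HOME_FEED_RATE = 0.60    # share of DAU who land on the home feed (eligible)
MAX_DAYS = 21            # hard cap on experiment duration

eligible_per_day = DAU * HOME_FEED_RATE   # 720,000 eligible users/day
per_arm_per_day = eligible_per_day / 2    # 360,000 users/day at a 50/50 split

print(f"Eligible users/day: {eligible_per_day:,.0f}")
print(f"Per-arm users/day (50/50 split): {per_arm_per_day:,.0f}")
# Note: unique users accumulate more slowly than 21 * 720K because the same
# users return on multiple days; treat 720K/day as a daily ceiling, not a
# cumulative unique-user count.
```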
Task
- Define clear null and alternative hypotheses, including how you will distinguish a true sustained lift from a novelty-driven spike.
- Choose the primary metric, 2-4 guardrail metrics, and at least one secondary metric. Specify the unit of randomization and unit of analysis, and justify both.
- Calculate the required sample size for the primary metric using an explicit MDE, then translate that into expected runtime given the available traffic (see the power-calculation sketch below).
- Propose a pre-registered analysis plan that addresses novelty effects, peeking, and multiple comparisons. Be explicit about whether you will analyze the full 21-day average, time-sliced effects, or both (a time-sliced example appears below).
- State a ship / don’t-ship / iterate decision rule that respects the guardrails, and explain what you would do if the feature shows a strong week-1 lift that fades by week 3.
Your answer should be concrete: use the numbers above, show the power calculation, and explain how you would avoid over-interpreting short-term excitement as long-term product value.
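For the power calculation, a minimal sketch using statsmodels' two-proportion power solver is below. The 25% baseline engagement rate and the +1% relative MDE are illustrative placeholders, not StreamSpace figures; substitute the team's actual baseline and the smallest lift that would justify the $180K/month inference spend.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Illustrative placeholders -- swap in the team's real baseline and MDE.
baseline = 0.25                      # assumed baseline home-feed engagement rate
mde_rel = 0.01                       # +1% relative lift as the minimum detectable effect
treated = baseline * (1 + mde_rel)   # 0.2525

effect = proportion_effectsize(treated, baseline)   # Cohen's h
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80,
    ratio=1.0, alternative="two-sided",
)

eligible_per_day = 1_200_000 * 0.60   # 720K eligible users/day
days_needed = n_per_arm * 2 / eligible_per_day

print(f"n per arm: {n_per_arm:,.0f}")
print(f"Naive runtime at full eligible traffic: {days_needed:.1f} days")
# Caveat: days_needed assumes every day brings fresh unique users; with
# returning users, enrollment saturates and real runtime is longer.
```

With these placeholder numbers the experiment is traffic-rich, so the binding constraint is the 21-day window needed to separate sustained lift from novelty, not statistical power.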
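To make the novelty-effect requirement concrete, one common pre-registered approach is to estimate the treatment effect by days since a user's first exposure and test whether the late-window effect holds on its own. A minimal sketch, assuming a per-user-day log with hypothetical column names (`user_id`, `arm`, `days_since_exposure`, `engaged`) and a hypothetical file path:

```python
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical schema: one row per eligible user per day on the home feed.
#   user_id, arm ("control"/"treatment"), days_since_exposure (0-20), engaged (0/1)
df = pd.read_parquet("recap_experiment.parquet")  # hypothetical path

# Time-sliced view: treatment-vs-control lift by days since first exposure.
daily = (df.groupby(["days_since_exposure", "arm"])["engaged"]
           .mean().unstack("arm"))
daily["lift"] = daily["treatment"] - daily["control"]
print(daily["lift"])   # a novelty spike shows up as a decaying lift curve

# Pre-registered "sustained lift" test: effect in the final week only (days 14-20).
late = df[df["days_since_exposure"] >= 14]
counts = late.groupby("arm")["engaged"].sum()
nobs = late.groupby("arm")["engaged"].count()
z, p = proportions_ztest(
    count=[counts["treatment"], counts["control"]],
    nobs=[nobs["treatment"], nobs["control"]],
)
print(f"Week-3 slice: z={z:.2f}, p={p:.4f}")
```

Pre-registering the week-3 slice as the confirmatory test (with the full 21-day average as a secondary estimate) is one way to keep a fading week-1 spike from driving the ship decision.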