Define Success for Gemini Testing

Project Background

Google is preparing a quality-focused testing initiative for a new Gemini feature in Google Workspace: AI-generated email summaries in Gmail on web and Android. You are the QA Engineer responsible for defining what "success" means for the test initiative before launch readiness is reviewed. The cross-functional team includes 6 engineers, 2 QA engineers, 1 product manager, 1 UX researcher, and 1 site reliability engineer, and leadership wants a recommendation in 4 weeks because the feature is targeted for a limited rollout next quarter.

Key Stakeholders

The Gmail PM wants broad coverage and fast launch confidence. Engineering wants to minimize test maintenance and avoid delaying code freeze. The SRE lead is focused on production stability and rollback readiness. Legal and Responsible AI reviewers want evidence that harmful or misleading summaries are rare before any external rollout.

Constraints

The team has a testing budget of $90,000 for vendor-based manual evaluation and test data setup. Only 2 QA engineers are available, and one is shared 50% with another Workspace release, leaving roughly 1.5 QA engineers of effective capacity. At launch the feature needs to support English only and cover 3 major user flows, and the tests must integrate with the existing GoogleTest-based backend tests and Android Espresso UI tests. A launch recommendation is due in 28 days, with no headcount increase.

Complications

The summarization model is still changing weekly, so expected outputs are not fully stable.
The PM is pushing to include iOS in scope, but iOS automation is not yet ready.
Early dogfood feedback shows summaries are usually useful, but 4% contain factual omissions that may erode trust.
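
When turning the 4% omission figure into a quality threshold, it may help to put a confidence interval around it, since the measured rate depends on how many summaries were actually reviewed. The sketch below computes a Wilson score interval. The sample of 500 reviewed summaries with 20 omissions is a hypothetical example chosen to match 4%; the case does not state the dogfood sample size.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Return the Wilson score interval for a failure proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failures <= total:
        raise ValueError("failures must be between 0 and total")

    p = failures / total
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical sample: 20 summaries with factual omissions out of 500 reviewed.
low, high = wilson_interval(failures=20, total=500)
print(f"observed 4.0%, approx. 95% interval: {low:.1%} to {high:.1%}")
```

With a sample of a few hundred summaries the interval is several percentage points wide, so a threshold is easier to defend when it is stated against the upper bound of the interval rather than the point estimate.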

Your Task

Define the success criteria for this testing initiative, including quality, coverage, and launch-readiness thresholds.
Propose how you would prioritize test scope across functional, regression, UX, and Responsible AI risks.
Create a 4-week execution plan with milestones, owners, and decision points.
Identify the key trade-offs if the team must choose between broader surface coverage and deeper quality validation.
Recommend a go/no-go framework, including rollback triggers for limited rollout.
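
A go/no-go framework with rollback triggers can be written down as explicit, checkable gates. The following is a minimal sketch only: every metric name and threshold value (`max_omission_rate`, `max_harmful_rate`, `min_flow_pass_rate`, `rollback_tested`) is a hypothetical placeholder to be replaced by whatever the team agrees on, not a value given in the case.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Thresholds:
    # All values are illustrative placeholders, not figures from the case.
    max_omission_rate: float = 0.02    # share of summaries with factual omissions
    max_harmful_rate: float = 0.001    # share flagged harmful or misleading
    min_flow_pass_rate: float = 0.98   # automated pass rate across the 3 user flows


@dataclass(frozen=True)
class Metrics:
    omission_rate: float
    harmful_rate: float
    flow_pass_rate: float
    rollback_tested: bool  # whether a rollback drill has been completed


def evaluate_launch(m: Metrics, t: Thresholds = Thresholds()) -> Tuple[str, List[str]]:
    """Return ("GO" or "NO-GO", list of blocking reasons)."""
    blockers: List[str] = []
    if m.omission_rate > t.max_omission_rate:
        blockers.append(f"omission rate {m.omission_rate:.1%} exceeds {t.max_omission_rate:.1%}")
    if m.harmful_rate > t.max_harmful_rate:
        blockers.append(f"harmful rate {m.harmful_rate:.2%} exceeds {t.max_harmful_rate:.2%}")
    if m.flow_pass_rate < t.min_flow_pass_rate:
        blockers.append(f"flow pass rate {m.flow_pass_rate:.1%} below {t.min_flow_pass_rate:.1%}")
    if not m.rollback_tested:
        blockers.append("rollback has not been exercised")
    return ("GO" if not blockers else "NO-GO", blockers)


# Example: the 4% omission rate from dogfood, with the other inputs assumed.
decision, reasons = evaluate_launch(
    Metrics(omission_rate=0.04, harmful_rate=0.0005, flow_pass_rate=0.99, rollback_tested=True)
)
print(decision, reasons)
```

The same checks could be re-evaluated on production metrics during the limited rollout, so that a breached gate doubles as a rollback trigger.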
