
GenAI systems behave differently from deterministic software: the same input can produce multiple acceptable outputs, and small prompt changes can shift behavior. This makes traditional exact-match testing insufficient.
Explain the main behaviors of GenAI systems that affect testing, and how you would adapt a coding-oriented test strategy to handle them.
Address these sub-questions:
The interviewer expects a practical engineering explanation, not a product discussion. Focus on test design, validation logic, reproducibility, robustness checks, and how algorithmic techniques such as normalization, rule-based validation, and similarity scoring can be used in code-driven evaluation pipelines.
GenAI outputs are often stochastic, so the same prompt may produce different valid responses across runs. Testing must therefore verify properties or constraints of the output rather than rely only on exact string equality.
valid = output.startswith('{') and 'answer' in output
Small wording, formatting, or context changes in prompts can cause large output changes. Good testing includes prompt perturbation tests to measure whether behavior remains within acceptable bounds.
variants = [prompt, prompt + '
Be concise.', prompt.replace('summarize', 'briefly summarize')]
A test oracle defines how correctness is judged. For GenAI, the oracle is often a combination of exact checks, schema validation, keyword coverage, semantic similarity, and safety rules.
required_keys = {'label', 'reason'}
passed = isinstance(result, dict) and required_keys.issubset(result)
When exact expected outputs are hard to define, test transformations can still verify consistency. For example, reordering irrelevant context or changing whitespace should not materially change a classification result.
assert classify(text) == classify(' ' + text + ' ')
Testing should cover more than task success. Common GenAI-specific failures include hallucinations, instruction omission, unsafe content, malformed structured output, and unstable behavior across repeated runs.
if not json_valid(output):
failures.append('schema_error')