
Teams often inherit legacy code that is hard to understand, tightly coupled, and missing automated tests. The challenge is improving structure without accidentally changing behavior in production.
Explain how you would safely refactor a legacy piece of code that has no tests.
Your answer should cover:
The interviewer expects a practical engineering approach, not just a definition of refactoring. Discuss the order of operations, how to handle unknown behavior, how to validate changes, and how you would decide whether to refactor in place, wrap the code, or extract smaller units first.
When no tests exist, the first goal is often to capture current behavior rather than ideal behavior. These tests document what the system does today, including edge cases and even undesirable outputs, so later refactors can preserve behavior intentionally.
result = legacy_function("input")
assert result == expected_current_output
Refactoring changes internal structure without changing external behavior, while rewriting replaces behavior with a new implementation. In legacy systems, a rewrite is usually riskier because hidden requirements often live only in the old code.
A seam is a place where behavior can be changed or observed without editing the entire system. Creating seams around file I/O, network calls, time, randomness, or global state makes the code testable and safer to modify.
def process_order(data, send_email_fn):
# inject dependency instead of calling global function directly
send_email_fn(data)
Large refactors are hard to validate and review. Small, reversible steps let you compare outputs, isolate regressions quickly, and keep the system deployable throughout the process.
old_result = old_path(x)
new_result = new_path(x)
assert old_result == new_result
If tests are weak or incomplete, logs, metrics, tracing, and output snapshots help confirm whether behavior changed after a refactor. Observability is especially useful when the code interacts with external systems or production-only data patterns.
logger.info("legacy_output", extra={"value": output, "input": payload})