
In large NVIDIA C++ codebases such as CUDA runtime components or TensorRT execution paths, concurrency changes are risky because correctness depends on ordering, shared state, and synchronization semantics.
Explain how you would structure a code review for a complex C++ concurrency change. Your answer should cover:
The interviewer expects a coding-focused, systems-oriented explanation rather than people-process advice. Discuss concrete techniques such as dependency graphs, lock-order analysis, invariants, state transitions, and use of NVIDIA-relevant tooling where appropriate. You do not need to implement a full static analyzer, but you should explain the algorithmic structure of a strong review approach and the failure modes it is designed to catch.
Model threads, locks, atomics, queues, and shared resources as nodes in a graph. Edges represent acquisition order, happens-before relationships, or data dependencies, which helps reviewers reason systematically about deadlocks and races instead of reading linearly.
graph = {
    'thread_A': ['mutex_M'],
    'mutex_M': ['shared_buffer'],
    'thread_B': ['mutex_N'],
    'mutex_N': ['shared_buffer'],
}
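One thing this toy graph already surfaces is that shared_buffer is reachable through two different mutexes, so the two threads do not actually exclude each other. The sketch below is one way a reviewer might check that mechanically. The helper names and the 'mutex_' naming convention are assumptions made for illustration, not part of any real tool.

```python
from collections import defaultdict

# Same toy graph as above: thread -> mutexes it takes, mutex -> resources it guards.
graph = {
    'thread_A': ['mutex_M'],
    'mutex_M': ['shared_buffer'],
    'thread_B': ['mutex_N'],
    'mutex_N': ['shared_buffer'],
}

def guards_by_resource(graph):
    """Map each resource to the set of mutexes with an edge to it.

    Assumption: mutex nodes are recognizable by the 'mutex_' prefix.
    """
    guards = defaultdict(set)
    for node, targets in graph.items():
        if node.startswith('mutex_'):
            for resource in targets:
                guards[resource].add(node)
    return dict(guards)

def inconsistently_guarded(graph):
    """Return resources that are protected by more than one distinct mutex."""
    return {
        resource: sorted(mutexes)
        for resource, mutexes in guards_by_resource(graph).items()
        if len(mutexes) > 1
    }

print(inconsistently_guarded(graph))
# {'shared_buffer': ['mutex_M', 'mutex_N']}
```

A non-empty result is a race candidate rather than a proof: the resource may be partitioned, or one path may be read-only under a stronger outer lock, and the reviewer still has to confirm that.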
A practical review pattern is to extract all possible lock acquisition pairs and check whether they induce a cycle. If one path acquires A then B and another acquires B then A, the review should treat that as a deadlock candidate unless ordering is provably constrained.
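def has_cycle(graph):
    # visiting: nodes on the current DFS path; visited: nodes fully explored.
    visiting, visited = set(), set()

    def dfs(node):
        if node in visiting:
            return True   # back edge to the current path: cycle found
        if node in visited:
            return False  # already explored, no cycle through this node
        visiting.add(node)
        for nei in graph.get(node, []):
            if dfs(nei):
                return True
        visiting.remove(node)
        visited.add(node)
        return False

    return any(dfs(n) for n in graph)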
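The pair-extraction step described above can be sketched as follows. It assumes the reviewer has hand-transcribed, for each code path, the ordered list of locks that path acquires while still holding the earlier ones. The path names and helper names here are hypothetical.

```python
from itertools import combinations

def lock_order_edges(paths):
    """Collect (earlier, later) lock pairs and the paths that produce them.

    `paths` maps a code-path name to the ordered list of locks it acquires,
    assuming every earlier lock is still held when a later one is taken.
    """
    edges = {}
    for name, locks in paths.items():
        # combinations() preserves input order, so each pair is (earlier, later).
        for first, second in combinations(locks, 2):
            edges.setdefault((first, second), []).append(name)
    return edges

def find_inversions(paths):
    """Return (a, b, paths taking a then b, paths taking b then a) tuples."""
    edges = lock_order_edges(paths)
    inversions = []
    for (a, b), forward in edges.items():
        # a < b reports each unordered pair once and skips re-entrant (a, a) pairs.
        if a < b and (b, a) in edges:
            inversions.append((a, b, forward, edges[(b, a)]))
    return inversions

paths = {
    'enqueue': ['A', 'B'],
    'flush': ['B', 'A'],
    'stats': ['A', 'C'],
}
print(find_inversions(paths))
# [('A', 'B', ['enqueue'], ['flush'])]
```

This only reports direct two-lock inversions. Longer cycles (A then B, B then C, C then A) need a full cycle search over the same edge set, for example by turning the pairs into an adjacency map and running a DFS-based check on it.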
Every concurrency review should identify state that must remain true before and after each critical section. Examples include queue size bounds, ownership uniqueness, reference-count consistency, and whether a TensorRT work item can be observed before initialization completes.
assert 0 <= queue_size <= capacity
assert not (is_published and not is_initialized)
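To make "before and after each critical section" concrete, here is a minimal sketch of a bounded queue that checks its invariant on entry to and normal exit from every locked region. BoundedQueue, critical_section, and check_invariants are illustrative names, not taken from any NVIDIA codebase; the same pattern in C++ would typically be a debug-only RAII guard.

```python
import threading
from contextlib import contextmanager

class BoundedQueue:
    """Toy queue whose size invariant is checked around each critical section."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []
        self.lock = threading.Lock()

    def check_invariants(self):
        # Must hold whenever the lock is not in the middle of an update.
        assert 0 <= len(self.items) <= self.capacity, "queue size out of bounds"

    @contextmanager
    def critical_section(self):
        with self.lock:
            self.check_invariants()   # precondition: state was left consistent
            yield
            self.check_invariants()   # postcondition: checked on normal exit only

    def push(self, item):
        with self.critical_section():
            if len(self.items) >= self.capacity:
                return False          # full: reject instead of breaking the bound
            self.items.append(item)
            return True

q = BoundedQueue(capacity=2)
print(q.push('a'), q.push('b'), q.push('c'))
# True True False
```

The entry check is what catches a different code path that mutated the state without taking the lock, which is often the more valuable half during review.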
Correct locking is not enough if publication and observation rely on atomics or lock-free paths. Reviewers must verify that the code establishes a valid happens-before relationship through mutexes, condition variables, or matching atomic release/acquire semantics.
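// Producer: every write to data must be sequenced before this store.
ready.store(true, std::memory_order_release);
// Consumer: the acquire load synchronizes with the release store it reads from.
if (ready.load(std::memory_order_acquire)) {
    use(data);
}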
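As a rough review aid, the release/acquire pairing check can be turned into a table scan. The sketch below assumes the reviewer has transcribed each atomic access on a publication flag as a (variable, kind, order) tuple. It is a heuristic with hypothetical names: it ignores read-modify-write operations and standalone fences, and relaxed ordering is perfectly valid for atomics that do not publish other data, such as statistics counters.

```python
# Orderings strong enough for each side of a publish/observe pair.
PUBLISH_ORDERS = {'release', 'seq_cst'}   # acceptable on the publishing store
OBSERVE_ORDERS = {'acquire', 'seq_cst'}   # acceptable on the observing load

def weak_publication_accesses(accesses):
    """Return accesses whose ordering is too weak to form a release/acquire pair.

    `accesses` is a list of (variable, kind, order) tuples where kind is
    'store' or 'load' and order is a C++ memory order name without the prefix.
    """
    findings = []
    for variable, kind, order in accesses:
        if kind == 'store' and order not in PUBLISH_ORDERS:
            findings.append((variable, kind, order))
        elif kind == 'load' and order not in OBSERVE_ORDERS:
            findings.append((variable, kind, order))
    return findings

accesses = [
    ('ready', 'store', 'release'),
    ('ready', 'load', 'acquire'),
    ('done', 'store', 'relaxed'),   # suspicious if 'done' publishes data
    ('done', 'load', 'acquire'),
]
print(weak_publication_accesses(accesses))
# [('done', 'store', 'relaxed')]
```

Each finding is a question for the author, not a verdict: the answer may be that a fence elsewhere provides the ordering, in which case that fence belongs in the review notes.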
Complex concurrency changes need stress tests that amplify scheduling interleavings, not just happy-path unit tests. Reviewers should ask for deterministic repro cases, sanitizer runs (for example ThreadSanitizer for host code and Compute Sanitizer's racecheck tool for CUDA shared-memory hazards), contention-heavy tests, and performance checks to ensure the fix does not introduce starvation or regressions.
for seed in range(100000):
    run_parallel_workers(seed=seed)  # the seed makes a failing run replayable
    verify_invariants()              # check shared state after every run
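A self-contained version of that loop might look like the sketch below. It assumes a trivial workload (workers incrementing a shared counter under a lock) and reuses the name run_parallel_workers purely for illustration. Note that the seed only fixes the workload shape, not the OS thread schedule, so a failing seed narrows the search but does not guarantee a replay.

```python
import random
import threading

def run_parallel_workers(seed, num_workers=4, max_increments=200):
    """Run seeded workers against a shared counter; return (observed, expected)."""
    rng = random.Random(seed)
    # Per-worker iteration counts derive from the seed so a failure names its workload.
    plan = [rng.randint(1, max_increments) for _ in range(num_workers)]
    counter = {'value': 0}
    lock = threading.Lock()

    def worker(iterations):
        for _ in range(iterations):
            with lock:                     # remove this lock to see lost updates
                counter['value'] += 1

    threads = [threading.Thread(target=worker, args=(n,)) for n in plan]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter['value'], sum(plan)

def stress(iterations=50):
    """Repeat the workload across seeds and check the invariant after each run."""
    for seed in range(iterations):
        observed, expected = run_parallel_workers(seed)
        assert observed == expected, f"seed {seed}: lost update ({observed} != {expected})"
    return iterations

print(stress())
# 50
```

For C++ changes the same shape applies, with the loop run under a sanitizer build so that races are reported even on iterations where the final invariant happens to hold.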