
You are interviewing for a role where you need to reason about LLM behavior, not just API usage. A teammate says that a model with a 200K-token context window should always beat a model with an 8K-token context window because it can see more text.
Explain how context windows work and why a 200K-context model is not always better than an 8K-context model.