You're building an LLM application that has to answer user questions by drawing on long conversation histories, retrieved documents, and system instructions. The model can only see a limited context window, measured in tokens, so you need a practical way to decide what to keep, what to summarize, and what to drop.
How do you manage context windows when building LLM applications?
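
To make the keep / summarize / drop decision concrete, here is a minimal sketch of one possible budgeting policy, not a definitive implementation. It assumes system instructions are always kept, the newest turns get up to half of the remaining budget, older turns are collapsed into a short summary, and retrieved documents are admitted by relevance score until the budget runs out. Every name in it (`count_tokens`, `naive_summary`, `build_context`, the `summary_tokens` parameter) is hypothetical. Token counts are approximated by whitespace-separated words, and the summarizer is a truncating placeholder where a real application would more likely call a model.

```python
from typing import Callable, Dict, List, Tuple


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for tokens.
    # A real application would use the tokenizer of its target model.
    return len(text.split())


def naive_summary(texts: List[str], max_tokens: int) -> str:
    # Placeholder summarizer (hypothetical): keeps the first few words of
    # each text. The result never exceeds max_tokens words.
    per_text = max(1, max_tokens // len(texts))
    words: List[str] = []
    for text in texts:
        words.extend(text.split()[:per_text])
    return " ".join(words[:max_tokens])


def build_context(
    system: str,
    turns: List[str],                  # conversation turns, oldest first
    docs: List[Tuple[float, str]],     # (relevance score, document text)
    budget: int,                       # total token budget for the prompt
    summary_tokens: int = 20,
    summarize: Callable[[List[str], int], str] = naive_summary,
) -> Dict[str, object]:
    remaining = budget - count_tokens(system)
    if remaining < 0:
        raise ValueError("system instructions alone exceed the budget")

    # Keep: walk turns from newest to oldest, spending at most half of
    # what is left after the system instructions.
    turn_budget = remaining // 2
    used = 0
    split = len(turns)
    for i in range(len(turns) - 1, -1, -1):
        cost = count_tokens(turns[i])
        if used + cost > turn_budget:
            break
        used += cost
        split = i
    recent, older = turns[split:], turns[:split]
    remaining -= used

    # Summarize: older turns collapse into one short summary.
    summary = ""
    if older and remaining > 0:
        summary = summarize(older, min(summary_tokens, remaining))
        remaining -= count_tokens(summary)

    # Drop: admit documents by descending score, skipping any that
    # no longer fit in the remaining budget.
    kept_docs: List[str] = []
    for _score, text in sorted(docs, key=lambda d: d[0], reverse=True):
        cost = count_tokens(text)
        if cost <= remaining:
            kept_docs.append(text)
            remaining -= cost

    return {
        "system": system,
        "summary": summary,
        "docs": kept_docs,
        "recent_turns": recent,
        "tokens_used": budget - remaining,
    }
```

A call such as `build_context("Answer briefly.", turns, docs, budget=4000)` returns the pieces to assemble into a prompt. The 50% share for recent turns and the fixed summary size are arbitrary choices for illustration and would need tuning against the actual workload.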