You are improving a finance-domain assistant that helps operations teams draft responses, summarize account notes, and answer questions about policies and workflows. The current version uses retrieval over internal SOPs, underwriting guidelines, and historical support resolutions, but users still report inconsistent tone, weak handling of repetitive classification-style tasks, and occasional unsupported answers. Traffic is expected to reach 20K requests per day, with a mix of knowledge-grounded questions and high-volume templated tasks. You need to decide whether to keep investing in retrieval, fine-tune a model, or use a hybrid approach.
How would you decide whether fine-tuning, rather than retrieval-augmented generation, is the right choice for this assistant, and which system would you ship first given these constraints?
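
To make the traffic constraint concrete, the short Python sketch below converts the 20K requests per day from the scenario into an average request rate and splits it between the two workload types. Only the daily volume comes from the scenario. The templated share and the peak factor are hypothetical placeholders, since the scenario does not state the traffic mix or its burstiness, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

DAILY_REQUESTS = 20_000          # stated in the scenario
ASSUMED_TEMPLATED_SHARE = 0.6    # hypothetical: the scenario gives no mix
ASSUMED_PEAK_FACTOR = 5.0        # hypothetical: peak-to-average ratio


def average_rps(requests_per_day: int) -> float:
    """Average requests per second if traffic were spread evenly over a day."""
    return requests_per_day / SECONDS_PER_DAY


def split_traffic(requests_per_day: int, templated_share: float) -> dict[str, int]:
    """Split daily volume into templated and knowledge-grounded requests."""
    if not 0.0 <= templated_share <= 1.0:
        raise ValueError("templated_share must be between 0 and 1")
    templated = round(requests_per_day * templated_share)
    return {
        "templated": templated,
        "knowledge_grounded": requests_per_day - templated,
    }


if __name__ == "__main__":
    avg = average_rps(DAILY_REQUESTS)
    print(f"average rate: {avg:.3f} requests/s")
    print(f"assumed peak rate: {avg * ASSUMED_PEAK_FACTOR:.3f} requests/s")
    print(split_traffic(DAILY_REQUESTS, ASSUMED_TEMPLATED_SHARE))
```

Replacing the two assumed constants with measured values from production logs would show how much of the volume is templated work (where a fine-tuned model is a candidate) versus knowledge-grounded questions (where retrieval remains relevant).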