You are designing an enterprise knowledge assistant that answers employee questions over internal documents such as runbooks, architecture reviews, support tickets, PDFs, and policy pages. The corpus contains roughly 8 million documents, ranging from short notes to 200-page manuals, with frequent updates, mixed formatting, tables, and duplicated content. Users ask free-form questions, expect grounded answers with citations, and often use product names, acronyms, and partial error messages rather than exact keywords. You have limited labeled query-answer data, but you do have historical search logs, click data, and a small set of expert-validated question-answer pairs.
How would you design the NLP architecture for the language model and vector database components so the system retrieves the right context, generates reliable answers, and remains maintainable as the corpus and query patterns evolve? Explain the preprocessing, embedding and indexing strategy, model choices, training or fine-tuning approach, and how you would evaluate the end-to-end quality.