You are building an NLP system that reads unstructured financial documents such as invoices, expense receipts, contracts, and audit support files, then converts them into structured fields used downstream for review and automation. The documents vary in layout, writing style, and quality, and often contain tables, line items, vendor details, dates, currencies, tax amounts, and policy-related language. Some fields are explicit, while others must be inferred from surrounding context or document structure.
How would you design an NLP system for extracting structured information from unstructured financial documents?
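One way to start on the question is to pin down the target output before choosing any model. The sketch below is a minimal, hypothetical schema for the fields the scenario mentions (vendor, dates, currency, tax amounts, line items). Every class and field name here, the `source` and `confidence` attributes, and the 0.8 review threshold are assumptions made for illustration; none come from an existing system or library. It records whether each field was read directly or inferred from context, and flags documents for human review.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class ExtractedField:
    """A single extracted value plus how it was obtained (hypothetical design)."""
    value: Optional[str]
    source: str = "explicit"   # "explicit" (read off the page) or "inferred" (from context)
    confidence: float = 1.0    # assumed model score in [0, 1]


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal

    @property
    def amount(self) -> Decimal:
        # Decimal avoids the rounding drift that float arithmetic would introduce.
        return self.quantity * self.unit_price


@dataclass
class FinancialDocument:
    doc_type: str                 # e.g. "invoice", "receipt", "contract"
    vendor: ExtractedField
    issue_date: ExtractedField    # ISO 8601 date string
    currency: ExtractedField      # ISO 4217 code
    tax_amount: Decimal
    total_amount: Decimal
    line_items: List[LineItem] = field(default_factory=list)

    def needs_review(self, min_confidence: float = 0.8) -> bool:
        """Flag the document when totals disagree or any field is low-confidence."""
        subtotal = sum((item.amount for item in self.line_items), Decimal("0"))
        totals_match = subtotal + self.tax_amount == self.total_amount
        fields = (self.vendor, self.issue_date, self.currency)
        low_confidence = any(f.confidence < min_confidence for f in fields)
        return not totals_match or low_confidence


# Made-up example values, for illustration only.
doc = FinancialDocument(
    doc_type="invoice",
    vendor=ExtractedField("Acme Supplies"),
    issue_date=ExtractedField("2024-03-31"),
    currency=ExtractedField("USD", source="inferred", confidence=0.6),
    tax_amount=Decimal("2.00"),
    total_amount=Decimal("22.00"),
    line_items=[LineItem("Paper", Decimal("2"), Decimal("10.00"))],
)
print(doc.needs_review())  # True: the inferred currency falls below the 0.8 threshold
```

Keeping provenance and confidence on each field is one possible design choice: it lets downstream review focus on inferred or uncertain values, while the arithmetic check on line items, tax, and total catches extraction errors that per-field confidence alone might miss.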