
You are building an information extraction system for noisy financial documents, including invoices, bank statements, purchase orders, and remittance advice. The text often comes from OCR, so spacing, punctuation, and line breaks are inconsistent, and key fields may appear in tables, headers, or footers. You need to extract entities such as vendor names, invoice numbers, dates, amounts, tax values, and payment terms from these documents.
How would you approach information extraction from noisy financial documents?
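
One possible starting point, offered as a minimal sketch and not a complete answer: normalize whitespace in the OCR text, then apply label-anchored regular expressions to the fields that carry a stable textual cue. The sample text, the `PATTERNS` table, and the helper names `normalize`, `parse_amount`, and `extract` are all hypothetical. Dates are kept as raw strings because the day/month order is not specified, and vendor names and table-bound fields are not covered here.

```python
import re
from decimal import Decimal

# Hypothetical OCR output with erratic spacing (invented for illustration).
SAMPLE = """ACME  Supplies Ltd
Invoice  No :  INV-2024-0117
Date: 03/02/2024
Subtotal      1,200.00
Tax (10%)       120.00
Total Due   1,320.00
Terms:  Net 30
"""


def normalize(text: str) -> str:
    """Collapse runs of spaces/tabs, trim each line, and drop empty lines."""
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


# Label-anchored patterns (assumed label wording; real documents vary widely).
PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no|number|#)\.?\s*:?\s*([A-Z0-9][A-Z0-9/-]*)", re.I
    ),
    "date": re.compile(r"\bdate\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})", re.I),
    "tax": re.compile(
        r"\btax\b(?:\s*\(\s*\d+(?:\.\d+)?\s*%\s*\))?\s*:?\s*([\d,]+\.\d{2})", re.I
    ),
    # \b keeps "Subtotal" from matching as "total".
    "total": re.compile(r"\btotal(?:\s+due)?\s*:?\s*([\d,]+\.\d{2})", re.I),
    "payment_terms": re.compile(r"\bterms\s*:?\s*(net\s*\d+)", re.I),
}

AMOUNT_FIELDS = {"tax", "total"}


def parse_amount(raw: str) -> Decimal:
    """Parse an amount like '1,320.00' (assumes ',' thousands and '.' decimals)."""
    return Decimal(raw.replace(",", ""))


def extract(text: str) -> dict:
    """Return the first match for each field; fields with no match are omitted."""
    clean = normalize(text)
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(clean)
        if match:
            value = match.group(1).strip()
            fields[name] = parse_amount(value) if name in AMOUNT_FIELDS else value
    return fields


if __name__ == "__main__":
    for key, value in extract(SAMPLE).items():
        print(f"{key}: {value}")
```

A rule-based pass like this is mainly useful as a baseline and as a source of weak labels. Fields that depend on layout or context, such as vendor names or line items inside tables, would need a different technique on top of it.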