Business Context
AppZen wants to adapt a pre-trained language model to a finance-specific workflow in AppZen Expense Audit: classifying expense-report line items into policy outcomes (Approve, Needs Review, or Reject) based on receipt text, employee memo, and merchant context. The goal is to reduce manual review load while preserving audit accuracy on high-risk spend.
Data
- Volume: 1.8M historical expense line items with reviewer outcomes
- Inputs: OCR receipt text, employee memo, merchant name, MCC code, currency, country, and policy snippets
- Text length: 20-1,200 tokens after OCR cleanup; median 180
- Language: 88% English, 12% multilingual receipts and memos
- Label distribution: Approve 72%, Needs Review 20%, Reject 8%
- Data quality issues: OCR noise, duplicated receipts, redacted fields, inconsistent merchant formatting
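The quality issues above suggest a light normalization pass before tokenization. A minimal sketch, assuming hypothetical field names (`receipt_text`, `memo`, `merchant`); a production pipeline would also handle redaction markers, MCC/currency normalization, and language detection:

```python
import hashlib
import re
from dataclasses import dataclass


# Hypothetical record shape; field names in the real AppZen pipeline may differ.
@dataclass
class LineItem:
    receipt_text: str
    memo: str
    merchant: str


def clean_ocr_text(text: str) -> str:
    """Drop non-printable OCR artifacts and collapse whitespace."""
    text = re.sub(r"[^\x20-\x7E\u00A0-\uFFFF]", " ", text)  # control chars -> space
    return re.sub(r"\s+", " ", text).strip()


def normalize_merchant(name: str) -> str:
    """Uppercase and strip store numbers / punctuation so merchant variants match."""
    name = name.upper()
    name = re.sub(r"[#*]?\s*\d{3,}$", "", name)  # trailing store numbers
    name = re.sub(r"[^A-Z0-9& ]", " ", name)
    return re.sub(r"\s+", " ", name).strip()


def dedupe_key(item: LineItem) -> str:
    """Hash of normalized fields, used to drop duplicated receipts."""
    basis = "|".join(
        [
            normalize_merchant(item.merchant),
            clean_ocr_text(item.receipt_text),
            clean_ocr_text(item.memo),
        ]
    )
    return hashlib.sha256(basis.encode()).hexdigest()
```

Deduplication on the normalized hash (rather than raw text) is what catches the "same receipt, different OCR whitespace" duplicates called out above.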
Success Criteria
A strong solution should achieve macro-F1 ≥ 0.84, reject-class recall ≥ 0.90, and batch-scoring inference latency under 250 ms per item. Performance should also remain stable across major geographies and merchant categories.
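These thresholds can be encoded as an explicit release gate. A pure-Python sketch (label names are illustrative; in practice scikit-learn's `classification_report` would produce the same per-class numbers):

```python
LABELS = ["approve", "needs_review", "reject"]  # illustrative label ids


def per_class_stats(y_true, y_pred, label):
    """Precision, recall, and F1 for one class, computed from scratch."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def release_gate(y_true, y_pred):
    """True only if macro-F1 >= 0.84 AND reject recall >= 0.90."""
    f1s = [per_class_stats(y_true, y_pred, lbl)[2] for lbl in LABELS]
    macro_f1 = sum(f1s) / len(f1s)
    reject_recall = per_class_stats(y_true, y_pred, "reject")[1]
    return macro_f1 >= 0.84 and reject_recall >= 0.90
```

Gating on reject-class recall separately matters here: with only 8% Reject labels, a model can post a respectable macro-F1 while still missing the high-risk spend the audit exists to catch.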
Constraints
- Financial data must remain in AppZen-controlled infrastructure
- The solution must support weekly retraining with new reviewer feedback
- The model should fit on a single A10/T4-class GPU for fine-tuning and production inference
- Outputs must be auditable for compliance and reviewer trust
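Under these constraints, parameter-efficient fine-tuning is a natural fit: it keeps training and inference on a single A10/T4-class GPU inside AppZen-controlled infrastructure, and each week's small adapter can be archived for audit. A hedged sketch using Hugging Face `transformers` + `peft`; the base model and hyperparameters are illustrative assumptions, not a prescription:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Multilingual encoder baseline (illustrative choice): covers the 12%
# non-English receipts and fits comfortably on a T4/A10.
BASE = "xlm-roberta-base"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForSequenceClassification.from_pretrained(BASE, num_labels=3)

# LoRA adapters: only a small fraction of weights train, so weekly
# retraining stays cheap and each week's adapter is a small, auditable artifact.
lora = LoraConfig(
    task_type="SEQ_CLS",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query", "value"],  # attention projections in XLM-R
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```

A frozen base model plus versioned weekly adapters also simplifies the auditability requirement: any historical decision can be replayed against the exact adapter that produced it.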
Requirements
- Design a fine-tuning approach for a finance-specific LLM or encoder model for this classification task.
- Define a realistic preprocessing pipeline for OCR-heavy financial text.
- Explain how you would handle class imbalance, multilingual inputs, and noisy labels.
- Provide Python code for preprocessing, fine-tuning, and evaluation using modern NLP tooling.
- Describe how you would validate the model, analyze errors, and decide whether it is ready for deployment in AppZen Expense Audit.
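For the class-imbalance requirement, a common starting point is inverse-frequency class weights in a weighted cross-entropy loss. A minimal pure-Python sketch; the resulting weights would typically be passed as the `weight` tensor of `torch.nn.CrossEntropyLoss` (or applied in a custom `Trainer.compute_loss`):

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count_c), so rare classes (Reject, 8%)
    get proportionally larger gradients than common ones (Approve, 72%).
    Weights are normalized so a perfectly balanced dataset yields 1.0 each."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}
```

With the label distribution given above (72/20/8), Reject receives roughly a 4.2x weight versus 0.46x for Approve. Oversampling, focal loss, or noisy-label filtering (e.g. confidence-based relabeling of reviewer outcomes) are alternatives the design should compare against this baseline.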