You are building a classifier for medical documents in a regulated environment. The documents include discharge summaries, pathology notes, referral letters, and prior authorization requests, and they often contain abbreviations, section headers, and long passages of clinical text. Labels are assigned by downstream review teams, but the classes are imbalanced and some documents are only weakly labeled.
How would you design a text classification model for medical documents?