Biomedical Text Normalization through Generative Modeling

Published in medRxiv, 2024

Around 80% of electronic health record (EHR) data consists of unstructured medical language text. By its nature, this text is flexible and inconsistent, which makes it challenging to use for clinical trial matching, decision support, and predictive modeling. In this study, we develop and assess text normalization pipelines built using large language models.

Recommended citation: Jacob S. Berkowitz, Yasaman Fatapour, Apoorva Srinivasan, Jose Miguel Acitores Cortina, Nicholas P Tatonetti. "Biomedical Text Normalization through Generative Modeling." medRxiv 2024.09.30.24314663; doi: https://doi.org/10.1101/2024.09.30.24314663