Biomedical Text Normalization through Generative Modeling
Published in Journal of Biomedical Informatics, 2025
Around 80% of electronic health record (EHR) data consists of unstructured medical text. By its nature, this text is flexible and inconsistent, which makes it difficult to use for clinical trial matching, decision support, and predictive modeling. In this study, we develop and evaluate text normalization pipelines built on large language models.
Recommended citation: Jacob S. Berkowitz, Apoorva Srinivasan, Jose Miguel Acitores Cortina, Yasaman Fatapour, Nicholas P Tatonetti, Biomedical text normalization through generative modeling, Journal of Biomedical Informatics, Volume 167, 2025, 104850, ISSN 1532-0464, https://doi.org/10.1016/j.jbi.2025.104850.
