Publications

You can also find my articles on my Google Scholar profile.

Enhancing EHR-based pancreatic cancer prediction with LLM-derived embeddings

Published in npj Digital Medicine, 2025

We developed a predictive model using large language model (LLM)-derived embeddings of medical conditions for early pancreatic cancer detection.

Recommended citation: Park, J., Patterson, J., Acitores Cortina, J.M. et al. Enhancing EHR-based pancreatic cancer prediction with LLM-derived embeddings. npj Digit. Med. 8, 465 (2025). https://doi.org/10.1038/s41746-025-01869-8
Read paper | Download paper

Biomedical Text Normalization through Generative Modeling

Published in Journal of Biomedical Informatics, 2025

In this study, we develop and assess text normalization pipelines built using large language models.

Recommended citation: Jacob S. Berkowitz, Apoorva Srinivasan, Jose Miguel Acitores Cortina, Yasaman Fatapour, Nicholas P Tatonetti, Biomedical text normalization through generative modeling, Journal of Biomedical Informatics, Volume 167, 2025, 104850, ISSN 1532-0464, https://doi.org/10.1016/j.jbi.2025.104850.
Read paper | Download paper

Biases in Race and Ethnicity Introduced by Filtering Electronic Health Records for “Complete Data”

Published in JMIR Medical Informatics, 2025

In this study, we examined the race/ethnicity biases introduced by applying common filters to four clinical records databases.

Recommended citation: Acitores Cortina J, Fatapour Y, Brown K, Gisladottir U, Zietz M, Bear Don’t Walk IV O, Peter D, Berkowitz J, Friedrich N, Kivelson S, Kuchi A, Liu H, Srinivasan A, Tsang K, Tatonetti N. Biases in Race and Ethnicity Introduced by Filtering Electronic Health Records for “Complete Data”: Observational Clinical Data Analysis. JMIR Med Inform 2025;13:e67591. URL: https://medinform.jmir.org/2025/1/e67591 DOI: 10.2196/67591
Read paper | Download paper

Foundation Models for Translational Cancer Biology

Published in Annual Review of Biomedical Data Science, 2025

We examine the role of foundation models in domains relevant to cancer research, including natural language processing, computer vision, molecular biology, and cheminformatics.

Recommended citation: Tsang, Kevin K., Kivelson, Sophia, Acitores Cortina, Jose M., Kuchi, Aditi, Berkowitz, Jacob S., Liu, Hongyu, Srinivasan, Apoorva, Friedrich, Nadine A., Fatapour, Yasaman, Tatonetti, Nicholas P. Foundation Models for Translational Cancer Biology, Annual Review of Biomedical Data Science, Volume 8, 2025, https://doi.org/10.1146/annurev-biodatasci-103123-095633
Read paper | Download paper

Probing Large Language Model Hidden States for Adverse Drug Reaction Knowledge

Published in Artificial Intelligence in Medicine (AIME 2025), 2025

We evaluate the use of Gemma Scope SAEs to identify how LLMs store known facts involving adverse drug reactions (ADRs).

Recommended citation: Berkowitz, J. et al. (2025). Probing Large Language Model Hidden States for Adverse Drug Reaction Knowledge. In: Bellazzi, R., Juarez Herrero, J.M., Sacchi, L., Zupan, B. (eds) Artificial Intelligence in Medicine. AIME 2025. Lecture Notes in Computer Science, vol 15734. Springer, Cham. https://doi.org/10.1007/978-3-031-95838-0_6
Read paper

Generalizable and Automated Classification of TNM Stage from Pathology Reports with External Validation

Published in Nature Communications, 2024

We present a generalizable method for the automated classification of TNM stage from pathology report text.

Recommended citation: Kefeli, J., Berkowitz, J., Acitores Cortina, J.M. et al. Generalizable and automated classification of TNM stage from pathology reports with external validation. Nat Commun 15, 8916 (2024). https://doi.org/10.1038/s41467-024-53190-9
Read paper | Download paper

TLab at #SMM4H 2024: Retrieval-Augmented Generation for ADE Extraction and Normalization

Published in Proceedings of the 9th Social Media Mining for Health Research and Applications (SMM4H 2024), 2024

SMM4H 2024 Task 1 is focused on the identification and standardization of Adverse Drug Events (ADEs) in tweets.

Recommended citation: Jacob Berkowitz, Apoorva Srinivasan, Jose Cortina, and Nicholas Tatonetti. 2024. TLab at #SMM4H 2024: Retrieval-Augmented Generation for ADE Extraction and Normalization. In Proceedings of the 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks, pages 153–157, Bangkok, Thailand. Association for Computational Linguistics.
Read paper | Download paper