Probing Large Language Model Hidden States for Adverse Drug Reaction Knowledge
Published in Artificial Intelligence in Medicine (AIME 2025), 2025
Large language models (LLMs) integrate knowledge from diverse sources into a single set of internal weights. However, these representations are difficult to interpret, complicating our understanding of the models’ learning capabilities. Sparse autoencoders (SAEs) linearize LLM embeddings, creating monosemantic features that both provide insight into the model’s comprehension and simplify downstream machine learning tasks. These features are especially important in biomedical applications where explainability is critical. Here, we evaluate the use of Gemma Scope SAEs to identify how LLMs store known facts involving adverse drug reactions (ADRs).
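
To make the probing setup concrete, here is a minimal sketch of how an SAE in the Gemma Scope style maps an LLM hidden state to sparse, monosemantic feature activations. The dimensions, JumpReLU threshold, and weights below are illustrative placeholders, not the actual Gemma Scope parameters or the paper's exact pipeline.

```python
import torch

# Hypothetical dimensions: Gemma-style residual-stream width and a 16k-feature SAE.
d_model, d_sae = 2304, 16384

# Placeholder encoder parameters (real Gemma Scope SAEs ship trained weights).
W_enc = torch.randn(d_model, d_sae) / d_model**0.5
b_enc = torch.zeros(d_sae)
theta = torch.full((d_sae,), 0.1)  # per-feature JumpReLU threshold

def encode(x: torch.Tensor) -> torch.Tensor:
    """Map a hidden state x of shape (d_model,) to sparse features of shape (d_sae,)."""
    pre = x @ W_enc + b_enc
    return pre * (pre > theta)  # JumpReLU: zero out sub-threshold activations

hidden = torch.randn(d_model)  # stand-in for an LLM hidden state at some layer
features = encode(hidden)
print(f"{(features != 0).sum().item()} of {d_sae} features active")
```

The resulting sparse feature vector is what makes downstream probing tractable: individual nonzero entries can be inspected for interpretable meaning, and the vector itself can feed a simple classifier for tasks such as detecting stored ADR facts.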
Recommended citation: Berkowitz, J. et al. (2025). Probing Large Language Model Hidden States for Adverse Drug Reaction Knowledge. In: Bellazzi, R., Juarez Herrero, J.M., Sacchi, L., Zupan, B. (eds) Artificial Intelligence in Medicine. AIME 2025. Lecture Notes in Computer Science, vol 15734. Springer, Cham. https://doi.org/10.1007/978-3-031-95838-0_6
