Delirium Identification from Nursing Reports Using Large Language Models
Authors: Graf, Lisa; Ritzi, Alexander; Schöler, Lili M.
Editors: Andrikopoulou, Elisavet; Gallos, Parisis; Arvanitis, Theodoros N.; Austin, Rosalynn; Benis, Arriel; Cornet, Ronald; Chatzistergos, Panagiotis; Dejaco, Alexander; Dusseljee-Peute, Linda; Mohasseb, Alaa; Natsiavas, Pantelis; Nakkas, Haythem; Scott, Philip
Conference: Medical Informatics Europe Conference
Publisher: IOS Press
Year: 2025
Series: Studies in Health Technology and Informatics
Book title: Intelligent Health Systems – From Technology to Data and Knowledge: Proceedings of MIE 2025
Volume: 327
First page: 886
Last page: 887
ISBN: 978-1-64368-596-0
ISSN: 0926-9630
eISSN: 1879-8365
DOI: https://doi.org/10.3233/SHTI250492
Web address: https://doi.org/10.3233/shti250492
Record: https://research.utu.fi/converis/portal/detail/Publication/499069125
This study investigates large language models for delirium detection from nursing reports, comparing keyword matching, prompting, and finetuning. Using a manually labelled dataset from the University Hospital Freiburg, Germany, we tested Llama3 and Phi3 models. Both prompting and finetuning were effective, with finetuning Phi3 (3.8B) achieving the highest accuracy (90.24%) and AUROC (96.07%), significantly outperforming other methods.
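As an illustration of the simplest of the three approaches named in the abstract, the following is a minimal sketch of a keyword-matching baseline together with the two reported metrics, accuracy and AUROC. It is not the authors' implementation: the keyword list, the example reports, and the labels are all invented for illustration, and the function names (`keyword_score`, `accuracy`, `auroc`) are hypothetical.

```python
# Hypothetical keyword list; the abstract does not state which terms the study used.
KEYWORDS = ["delirium", "disoriented", "confused", "agitated", "hallucinat"]


def keyword_score(report: str) -> int:
    """Count how many keywords occur in the report (case-insensitive substring match)."""
    text = report.lower()
    return sum(1 for kw in KEYWORDS if kw in text)


def accuracy(labels, predictions):
    """Fraction of reports whose predicted label equals the manual label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def auroc(labels, scores):
    """AUROC as the probability that a random positive report scores higher
    than a random negative one, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented toy reports and labels (1 = delirium noted, 0 = not noted).
reports = [
    "Patient disoriented and agitated during the night, pulled at IV line.",
    "Patient slept well, mobilised with assistance, no complaints.",
    "Appears confused about time and place after surgery.",
    "Wound dressing changed, pain controlled, oriented and cooperative.",
]
labels = [1, 0, 1, 0]

scores = [keyword_score(r) for r in reports]
predictions = [1 if s >= 1 else 0 for s in scores]  # flag a report on any keyword hit

print("accuracy:", accuracy(labels, predictions))
print("AUROC:", auroc(labels, scores))
```

A baseline of this kind cannot handle negation ("no signs of delirium") or paraphrase, which is one plausible reason a prompted or fine-tuned language model could outperform it, as the abstract reports.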