Delirium Identification from Nursing Reports Using Large Language Models




Graf, Lisa; Ritzi, Alexander; Schöler, Lili M.

Andrikopoulou, Elisavet; Gallos, Parisis; Arvanitis, Theodoros N.; Austin, Rosalynn; Benis, Arriel; Cornet, Ronald; Chatzistergos, Panagiotis; Dejaco, Alexander; Dusseljee-Peute, Linda; Mohasseb, Alaa; Natsiavas, Pantelis; Nakkas, Haythem; Scott, Philip

Medical Informatics Europe Conference

Publisher: IOS Press

2025

Studies in Health Technology and Informatics

Intelligent Health Systems – From Technology to Data and Knowledge: Proceedings of MIE 2025

327

886

887

978-1-64368-596-0

0926-9630

1879-8365

DOI: https://doi.org/10.3233/SHTI250492


https://research.utu.fi/converis/portal/detail/Publication/499069125



This study investigates large language models for delirium detection from nursing reports, comparing keyword matching, prompting, and finetuning. Using a manually labelled dataset from the University Hospital Freiburg, Germany, we tested Llama3 and Phi3 models. Both prompting and finetuning were effective, with finetuning Phi3 (3.8B) achieving the highest accuracy (90.24%) and AUROC (96.07%), significantly outperforming other methods.
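As an illustration of the simplest of the three approaches named in the abstract, the following is a minimal sketch of a keyword-matching baseline together with the two reported metrics (accuracy and AUROC). It is not the authors' implementation: the keyword list, the toy reports, the labels, and all function names are hypothetical, and the study's actual nursing reports are German-language and not reproduced in this record.

```python
# Hypothetical English keyword list (an assumption for illustration only);
# the terms used in the study are not given in the abstract.
DELIRIUM_KEYWORDS = ["delirium", "delirious", "disoriented",
                     "confused", "agitated", "hallucinat"]


def keyword_score(report, keywords=DELIRIUM_KEYWORDS):
    """Fraction of keywords that occur in the report (case-insensitive substring match)."""
    text = report.lower()
    hits = sum(1 for kw in keywords if kw in text)
    return hits / len(keywords)


def predict(report, threshold=0.0):
    """Label a report 1 (delirium suspected) if its keyword score exceeds the threshold."""
    return int(keyword_score(report) > threshold)


def accuracy(y_true, y_pred):
    """Share of reports whose predicted label equals the manual label."""
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this sketch (1 = delirium documented, 0 = not).
reports = [
    "Patient disoriented and agitated during the night, pulled at IV line.",
    "Patient slept well, mobilised with assistance, no complaints.",
    "Appears confused about time and place, otherwise calm.",
    "Wound dressing changed, patient oriented and cooperative.",
    "Patient not confused, oriented in all respects.",
]
labels = [1, 0, 1, 0, 0]

scores = [keyword_score(r) for r in reports]
preds = [predict(r) for r in reports]

print("accuracy:", accuracy(labels, preds))
print("AUROC:", auroc(labels, scores))
```

The last toy report shows a typical weakness of plain keyword matching: the negated phrase "not confused" still triggers a positive prediction. Context of this kind is something prompting or finetuning a language model could plausibly handle better.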

