A4 Refereed article in a conference publication
Clustering Nursing Sentences - Comparing Three Sentence Embedding Methods
Authors: Moen Hans, Suhonen Henry, Salanterä Sanna, Salakoski Tapio, Peltonen Laura-Maria
Editors: Brigitte Séroussi, Patrick Weber, Ferdinand Dhombres, Cyril Grouin, Jan-David Liebe, Sylvia Pelayo, Andrea Pinna, Bastien Rance, Lucia Sacchi, Adrien Ugon, Arriel Benis, Parisis Gallos
Conference name: Medical Informatics Europe
Publication year: 2022
Journal: Medical Informatics Europe
Book title: Challenges of Trustable AI and Added-Value on Health
Journal name in source: Studies in health technology and informatics
Journal acronym: Stud Health Technol Inform
Series title: Studies in Health Technology and Informatics
Volume: 294
First page: 854
Last page: 858
ISBN: 978-1-64368-284-6
eISBN: 978-1-64368-285-3
ISSN: 0926-9630
eISSN: 1879-8365
DOI: https://doi.org/10.3233/SHTI220606
Web address: https://ebooks.iospress.nl/doi/10.3233/SHTI220606
Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/178641781
In health sciences, high-quality text embeddings may augment qualitative data analysis of large amounts of text by enabling, e.g., searching and clustering of health information. This study aimed to evaluate three different sentence-level embedding methods for clustering sentences in nursing narratives from individual patients' hospital care episodes. Two of these embeddings are generated from language models based on the BERT framework, and the third is based on the Sent2Vec method. These embedding methods were used to cluster sentences from 20 patient care episodes, and the resulting clusters were manually evaluated. Findings suggest that the best clusters were produced by the embeddings from a BERT model fine-tuned for the proxy task of predicting subject headings for nursing text.
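To illustrate the general idea of clustering sentences by their embeddings, the following is a minimal sketch only, not the authors' pipeline. The abstract does not name the clustering algorithm, so cosine-similarity k-means is an assumption here. The four example sentences and their three-dimensional vectors are hypothetical stand-ins for real sentence embeddings, which would come from a BERT or Sent2Vec model. The function names `normalize`, `cosine`, and `kmeans_cosine` are likewise illustrative.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def cosine(u, v):
    """Cosine similarity of two vectors that are already unit length."""
    return sum(a * b for a, b in zip(u, v))

def kmeans_cosine(vectors, k, iterations=20):
    """Assign each vector to one of k clusters using cosine similarity.

    Assumption: k-means is used purely for illustration; the source
    abstract does not state which clustering algorithm was applied.
    """
    points = [normalize(v) for v in vectors]
    # Deterministic farthest-point initialisation: start from the first
    # point, then repeatedly add the point least similar to the centroids
    # chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = min(points, key=lambda p: max(cosine(p, c) for c in centroids))
        centroids.append(farthest)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: pick the most similar centroid for each point.
        labels = [max(range(k), key=lambda j: cosine(p, centroids[j])) for p in points]
        # Update step: each centroid becomes the normalised mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[j] = normalize(mean)
    return labels

# Hypothetical sentences with made-up toy embeddings (not real model output).
sentences = [
    "Patient slept well.",
    "Pain medication given.",
    "Slept through the night.",
    "Analgesic administered for pain.",
]
embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.8, 0.2, 0.1],
    [0.0, 0.8, 0.2],
]

labels = kmeans_cosine(embeddings, k=2)
for sentence, label in zip(sentences, labels):
    print(label, sentence)
```

With these toy vectors, the two sleep-related sentences fall into one cluster and the two pain-related sentences into the other. In a realistic setting, the quality of such groupings depends on the embedding method, which is the comparison the study reports.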
Downloadable publication: This is an electronic reprint of the original article.