A1 Refereed original research article in a scientific journal
Enhancing disease clustering through symptom-based analysis and large language model interpretations
Authors: Onojete, Efe; Ibeke, Ebuka; Ezenkwu, Chinedu Pascal; Iwendi, Celestine; Ben Dhaou, Imed
Publisher: Springer Nature
Publication year: 2025
Journal: Scientific Reports
Article number: 36651
Volume: 15
eISSN: 2045-2322
DOI: https://doi.org/10.1038/s41598-025-20382-2
Publication's open availability at the time of reporting: Open Access
Publication channel's open availability: Open Access publication channel
Web address: https://doi.org/10.1038/s41598-025-20382-2
Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/505458384
Humans face various diseases that are mainly caused by environmental conditions and living habits. These diseases exhibit several symptoms and can share a relationship based on their symptoms. The identification and interpretation of these groups of symptom-based diseases can aid in developing treatment plans for a new outbreak of disease. This research explores the intersection of machine learning and healthcare, specifically focusing on the enhancement of disease classification through symptom-based cluster analysis. By leveraging unsupervised machine learning algorithms, patterns and relationships within diverse symptom datasets were identified, revealing novel associations and subtypes in disease manifestation. The integration of a Large Language Model (LLM), specifically OpenAI’s Generative Pretrained Transformer (GPT), played a pivotal role in interpreting and communicating the complex outputs of the machine learning process. The results indicated a significant improvement in defining distinct clusters based on the relationship between diseases and symptoms, with GPT-4o providing simplified explanations that bridge the gap between machine-generated insights and healthcare professionals’ understanding. The study’s findings offer a more profound understanding of the distinctive features characterising the different clusters of diseases generated by the machine learning models.
Downloadable publication: This is an electronic reprint of the original article.
Funding information in the publication:
There was no funding to complete this research.