A1 Refereed original research article in a scientific journal

Enhancing disease clustering through symptom-based analysis and large language model interpretations




Authors: Onojete, Efe; Ibeke, Ebuka; Ezenkwu, Chinedu Pascal; Iwendi, Celestine; Ben Dhaou, Imed

Publisher: Springer Nature

Publication year: 2025

Journal: Scientific Reports

Article number: 36651

Volume: 15

eISSN: 2045-2322

DOI: https://doi.org/10.1038/s41598-025-20382-2

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://doi.org/10.1038/s41598-025-20382-2

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/505458384


Abstract

Humans face various diseases that are mainly caused by environmental conditions and living habits. These diseases exhibit several symptoms and can be related to one another through the symptoms they share. Identifying and interpreting these symptom-based groups of diseases can aid in developing treatment plans for a new disease outbreak. This research explores the intersection of machine learning and healthcare, focusing on the enhancement of disease classification through symptom-based cluster analysis. By leveraging unsupervised machine learning algorithms, patterns and relationships within diverse symptom datasets were identified, revealing novel associations and subtypes in disease manifestation. The integration of a Large Language Model (LLM), specifically OpenAI’s Generative Pre-trained Transformer (GPT), played a pivotal role in interpreting and communicating the complex outputs of the machine learning process. The results indicated a significant improvement in defining distinct clusters based on the relationship between diseases and symptoms, with GPT-4o providing simplified explanations that bridge the gap between machine-generated insights and healthcare professionals’ understanding. The study’s findings offer a more profound understanding of the distinctive features characterising the different clusters of diseases generated by the machine learning models.
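
The abstract does not name the clustering algorithms or the prompts used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each disease is represented as a set of symptoms, groups diseases whose Jaccard similarity reaches a threshold (a simple single-linkage grouping), and then assembles a text prompt that could be passed to an LLM for interpretation. The toy data, the 0.4 threshold, and the names `DISEASES`, `jaccard`, `cluster`, and `build_prompt` are all hypothetical, and no LLM is actually called.

```python
from itertools import combinations

# Hypothetical toy data (not taken from the paper): disease -> symptom set.
DISEASES = {
    "influenza": {"fever", "cough", "fatigue", "muscle ache"},
    "common cold": {"cough", "sneezing", "sore throat", "fatigue"},
    "covid-19": {"fever", "cough", "fatigue", "loss of smell"},
    "gastroenteritis": {"nausea", "vomiting", "diarrhoea", "fever"},
    "food poisoning": {"nausea", "vomiting", "diarrhoea", "cramps"},
}


def jaccard(a, b):
    """Jaccard similarity of two symptom sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(diseases, threshold=0.4):
    """Group diseases that are linked by similarity >= threshold.

    This is single-linkage grouping implemented with union-find;
    the threshold is an assumed value chosen for the toy data.
    """
    parent = {name: name for name in diseases}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(diseases, 2):
        if jaccard(diseases[a], diseases[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in diseases:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def build_prompt(members, diseases):
    """Assemble a plain-text prompt describing one cluster for an LLM."""
    shared = set.intersection(*(diseases[m] for m in members))
    return (
        f"Diseases in this cluster: {', '.join(members)}. "
        f"Symptoms shared by all of them: {', '.join(sorted(shared)) or 'none'}. "
        "Explain in plain language what these diseases have in common."
    )


if __name__ == "__main__":
    for members in cluster(DISEASES):
        print(build_prompt(members, DISEASES))
```

A lower threshold merges more diseases into fewer, broader clusters; a higher one keeps only closely matching symptom profiles together. In practice the choice would need to be validated against the data rather than fixed in advance.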


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
There was no funding to complete this research.


Last updated on 2025-02-12 at 07:57