Enhancing disease clustering through symptom-based analysis and large language model interpretations




Onojete, Efe; Ibeke, Ebuka; Ezenkwu, Chinedu Pascal; Iwendi, Celestine; Ben Dhaou, Imed

Publisher: Springer Nature

Publication year: 2025

Journal: Scientific Reports

Article number: 36651

Volume: 15

eISSN: 2045-2322

DOI: https://doi.org/10.1038/s41598-025-20382-2


https://research.utu.fi/converis/portal/detail/Publication/505458384



Humans face various diseases that are caused mainly by environmental conditions and living habits. These diseases present with multiple symptoms, and diseases can be related to one another through the symptoms they share. Identifying and interpreting such symptom-based groups of diseases can aid in developing treatment plans when a new disease outbreak occurs. This research explores the intersection of machine learning and healthcare, focusing on enhancing disease classification through symptom-based cluster analysis. Unsupervised machine learning algorithms were used to identify patterns and relationships within diverse symptom datasets, revealing novel associations and subtypes in disease manifestation. A Large Language Model (LLM), specifically OpenAI’s Generative Pre-trained Transformer (GPT), played a pivotal role in interpreting and communicating the complex outputs of the machine learning process. The results indicated a significant improvement in defining distinct clusters based on the relationship between diseases and symptoms, with GPT-4o providing simplified explanations that bridge the gap between machine-generated insights and healthcare professionals’ understanding. The study’s findings offer a deeper understanding of the distinctive features that characterise the different clusters of diseases generated by the machine learning models.
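The general idea of grouping diseases by shared symptoms and then asking an LLM to explain each group can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the clustering algorithm, so the sketch swaps in a simple threshold-based single-linkage grouping on Jaccard similarity. The toy data, the `threshold` value, and the helper names `cluster_by_symptoms` and `build_prompt` are all assumptions made for illustration. No LLM is called; the sketch only assembles the text that could be sent to one.

```python
from itertools import combinations

# Hypothetical toy data (not from the paper): disease -> set of symptoms.
DISEASE_SYMPTOMS = {
    "influenza": {"fever", "cough", "fatigue", "headache"},
    "common cold": {"cough", "sneezing", "fatigue"},
    "migraine": {"headache", "nausea", "light sensitivity"},
    "gastroenteritis": {"nausea", "vomiting", "diarrhoea"},
    "food poisoning": {"nausea", "vomiting", "diarrhoea", "fever"},
}


def jaccard(a, b):
    """Share of symptoms two diseases have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


def cluster_by_symptoms(disease_symptoms, threshold=0.3):
    """Group diseases whose symptom overlap reaches the threshold.

    Assumed stand-in technique: single-linkage grouping via union-find,
    i.e. connected components of the "similar enough" graph.
    """
    parent = {d: d for d in disease_symptoms}

    def find(d):
        # Follow parent links to the root, compressing the path on the way.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for a, b in combinations(disease_symptoms, 2):
        if jaccard(disease_symptoms[a], disease_symptoms[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for d in disease_symptoms:
        clusters.setdefault(find(d), []).append(d)
    # Sort members and clusters so the output order is deterministic.
    return sorted(sorted(members) for members in clusters.values())


def build_prompt(clusters, disease_symptoms):
    """Assemble a plain-text prompt that an LLM could be asked to interpret."""
    lines = ["Explain in plain language what the diseases in each cluster have in common."]
    for i, members in enumerate(clusters, start=1):
        shared = set.intersection(*(disease_symptoms[d] for d in members))
        shared_text = ", ".join(sorted(shared)) or "none"
        lines.append(f"Cluster {i}: {', '.join(members)}; shared symptoms: {shared_text}")
    return "\n".join(lines)


clusters = cluster_by_symptoms(DISEASE_SYMPTOMS)
prompt = build_prompt(clusters, DISEASE_SYMPTOMS)
print(prompt)
```

In a real setting the grouping step would more likely use an established clustering library on a much larger disease-symptom matrix, and the prompt would be sent to a model such as GPT-4o through its API; both choices are left open here because the abstract does not specify them.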


This research received no funding.


Last updated on 02/12/2025 07:57:34 AM