A1 Peer-reviewed original article in a scientific journal

Machine learning for survival outcome in head and neck squamous cell carcinoma: a multicenter validation study




Authors: Alabi, Rasheed Omobolaji; Guntinas-Lichius, Orlando; Elmusrati, Mohammed; Almangush, Alhadi; Tiblom Ehrsson, Ylva; Laurell, Göran; Mäkitie, Antti A.

Publisher: Springer Nature

Publication year: 2026

Journal: Scientific Reports

Article number: 254

Volume: 16

eISSN: 2045-2322

DOI: https://doi.org/10.1038/s41598-025-29295-6

Open access status at the time of recording: Openly available

Open access status of the publication channel: Fully open access publication channel

Web address: https://doi.org/10.1038/s41598-025-29295-6

Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/505677559

Self-archived copy license: CC BY NC ND

Version of the self-archived publication: Publisher's version


Abstract

Most head and neck squamous cell carcinoma (HNSCC) cases are diagnosed late, with an increased risk of recurrence and distant metastasis. In recent years, there has been a surge in the development of prognostic and predictive machine learning (ML) models for personalized treatment planning. However, only a small number of these have been externally validated. This study aimed to build a prognostic system by combining clinicopathological parameters and treatment-related factors as integrative inputs to an ML model trained on data from the Surveillance, Epidemiology, and End Results (SEER, United States) program. We further validated the developed model using multicenter data obtained from the Thuringian Cancer Registry (Germany) and data from a multicenter prospective observational study obtained from the Uppsala University Hospital (Sweden) to estimate the overall survival (OS) of patients with HNSCC. Additionally, we explored the complementary prognostic potential of these input parameters using permutation feature importance (PFI). A total of 40,164 patients with HNSCC were included from the SEER database, and the model was validated with 3,950 cases obtained from the Thuringian Cancer Registry and 323 cases recruited from three university hospitals in Sweden. We evaluated the prognostic significance of the input variables for predicting OS in patients with HNSCC using PFI. The voting ensemble ML algorithm gave an area under the receiver operating characteristic curve (AUC) of 0.76 and an accuracy of 70.0%. Independent external validation of the developed model with data from the Thuringian Cancer Registry and the Uppsala University Hospital gave AUCs of 0.68 and 0.76, with decreased accuracy in both cohorts. The PFI analysis of the base model showed that age at diagnosis, T stage, tumor site, marital status, and surgical treatment were the most important parameters for the predictive ability of the model for OS.
Independent external geographic validation is important for establishing reproducible performance and model generalizability before a model is recommended for further clinical evaluation. Such validation does not necessarily increase accuracy, but it reveals how the model performs outside the development data. A generalizable ML model can support individualized, risk-based therapeutic decision-making. While independently validating the model may be possible during model development, data privacy and security-related issues may prevent making it a prerequisite in the ML model development pipeline.
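
As an illustration of the two evaluation ideas named in the abstract, the following minimal sketch computes an AUC and a permutation feature importance (the mean drop in AUC when one input column is shuffled), then scores the same frozen model on a second cohort. It is not the authors' pipeline: the data are synthetic, the feature names (`age`, `t_stage`, `noise`) and the helper `make_cohort` are hypothetical, and a fixed linear `risk_score` stands in for the voting ensemble used in the study.

```python
import math
import random


def auc(labels, scores):
    """Area under the ROC curve: the fraction of (positive, negative)
    pairs in which the positive case has the higher score (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_importance(score_fn, rows, labels, n_repeats=10, seed=0):
    """Return the baseline AUC and, per feature, the mean drop in AUC
    after shuffling that feature's column across patients."""
    rng = random.Random(seed)
    baseline = auc(labels, [score_fn(r) for r in rows])
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - auc(labels, [score_fn(r) for r in permuted]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def make_cohort(n, seed, age_shift=0.0):
    """Hypothetical synthetic cohort: [age, t_stage, noise] and a 0/1 outcome."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        age = rng.uniform(40, 85) + age_shift
        t_stage = rng.randint(1, 4)
        noise = rng.random()  # carries no information about the outcome
        logit = 0.06 * (age - 62) + 0.6 * (t_stage - 2.5)
        p_event = 1.0 / (1.0 + math.exp(-logit))
        labels.append(1 if rng.random() < p_event else 0)
        rows.append([age, t_stage, noise])
    return rows, labels


def risk_score(row):
    """Assumed stand-in model: a fixed linear score that ignores `noise`."""
    age, t_stage, _noise = row
    return 0.06 * age + 0.6 * t_stage


feature_names = ["age", "t_stage", "noise"]

# "Development" cohort: baseline AUC and per-feature importance.
dev_rows, dev_labels = make_cohort(500, seed=1)
dev_auc, importances = permutation_importance(risk_score, dev_rows, dev_labels)

# "External" cohort with a shifted age distribution, scored by the same model.
ext_rows, ext_labels = make_cohort(500, seed=2, age_shift=5.0)
ext_auc = auc(ext_labels, [risk_score(r) for r in ext_rows])

print(f"development AUC: {dev_auc:.3f}")
print(f"external AUC:    {ext_auc:.3f}")
for name, value in zip(feature_names, importances):
    print(f"importance of {name}: {value:.3f}")
```

Because `risk_score` never reads the `noise` column, shuffling it leaves every prediction unchanged and its importance is exactly zero; features the model relies on show a positive drop. The external AUC is computed without refitting anything, which is the sense in which external validation reports performance outside the development data.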


Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.




Funding information in the publication
The study was supported by the Finnish State Research Funding to the Helsinki University Hospital and the Turku University Hospital, and the Swedish Cancer Society (grant numbers 2015/363, 2018/502, 21 1419 Pj, and 24 3394 Pj). Open access funded by Helsinki University Library.

