Generalizability of clinical prediction models in mental health




Richter, Maike; Emden, Daniel; Leenings, Ramona; Winter, Nils R.; Mikolajczyk, Rafael; Massag, Janka; Zwiky, Esther; Borgers, Tiana; Redlich, Ronny; Koutsouleris, Nikolaos; Falguera, Renata; Edwin Thanarajah, Sharmili; Padberg, Frank; Reinhard, Matthias A.; Back, Mitja D.; Morina, Nexhmedin; Buhlmann, Ulrike; Kircher, Tilo; Dannlowski, Udo; MBB consortium; FOR2107 consortium; PRONIA consortium; Hahn, Tim; Opel, Nils

Publisher: Springer Science and Business Media LLC

2025

Molecular Psychiatry

1359-4184

1476-5578

DOI: https://doi.org/10.1038/s41380-025-02950-0

https://research.utu.fi/converis/portal/detail/Publication/491803745



Concerns about the generalizability of machine learning models in mental health stem partly from sampling effects and data disparities between research cohorts and real-world populations. We investigated whether a machine learning model trained solely on easily accessible, low-cost clinical data can predict depressive symptom severity in unseen, independent datasets from various research and real-world clinical contexts. This observational multi-cohort study included 3021 participants (62.03% female; mean age = 36.27 years, range 15–81) from ten European research and clinical settings, all diagnosed with an affective disorder. We first compared research and real-world inpatients from the same treatment center using 76 clinical and sociodemographic variables. An elastic net algorithm with ten-fold cross-validation was then applied to develop a sparse machine learning model that predicts depression severity from the top five features (global functioning, extraversion, neuroticism, emotional abuse in childhood, and somatization). Model generalizability was tested across nine external samples. The model reliably predicted depression severity across all samples (r = 0.60, SD = 0.089, p < 0.0001) and in each individual external sample, with performance ranging from r = 0.48 in a real-world general population sample to r = 0.73 in real-world inpatients. These results suggest that machine learning models trained on sparse clinical data have the potential to predict illness severity across diverse settings, offering insights that could inform the development of more generalizable tools for use in routine psychiatric data analysis.
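
The workflow summarized above (fit an elastic net, keep the five strongest features, then check the correlation between predicted and observed severity in an external sample) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the feature count, weights, cohort sizes, and helper names (`simulate`, `standardize`, `elastic_net`) are assumptions made for illustration, and the penalty strength `alpha` is fixed rather than tuned by ten-fold cross-validation as in the study. The elastic net is solved with a plain coordinate-descent loop so that the example needs only the Python standard library.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def soft_threshold(z, g):
    """Shrink z towards zero by g (the L1 part of the update)."""
    return z - g if z > g else z + g if z < -g else 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    Assumes the columns of X and the target y are already centred."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, kept up to date as w changes
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = new - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new
    return w


def simulate(n, weights, n_noise, shift, rng):
    """Synthetic cohort: informative features followed by pure-noise features."""
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(shift, 1.0) for _ in range(len(weights) + n_noise)]
        X.append(row)
        y.append(sum(w * v for w, v in zip(weights, row)) + rng.gauss(0.0, 1.0))
    return X, y


def standardize(X, means, sds):
    """Scale features with statistics taken from the training cohort only."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


rng = random.Random(0)
true_w = [1.0, -0.8, 0.7, 0.6, -0.5]                 # hypothetical effect sizes
X_train, y_train = simulate(300, true_w, 15, 0.0, rng)
X_ext, y_ext = simulate(200, true_w, 15, 0.5, rng)   # shifted "external" cohort

cols = list(zip(*X_train))
means = [statistics.fmean(c) for c in cols]
sds = [statistics.pstdev(c) for c in cols]
y_mean = statistics.fmean(y_train)
Z_train = standardize(X_train, means, sds)
y_centred = [v - y_mean for v in y_train]

# Step 1: fit on all features and rank them by absolute coefficient.
w_full = elastic_net(Z_train, y_centred)
top5 = sorted(range(len(w_full)), key=lambda j: abs(w_full[j]), reverse=True)[:5]

# Step 2: refit a sparse model on the five selected features only.
w5 = elastic_net([[row[j] for j in top5] for row in Z_train], y_centred)

# Step 3: apply the frozen model to the external cohort and correlate.
Z_ext = standardize(X_ext, means, sds)
pred_ext = [y_mean + sum(w * row[j] for w, j in zip(w5, top5)) for row in Z_ext]
r_external = pearson_r(pred_ext, y_ext)
print("selected feature indices:", sorted(top5))
print("external Pearson r:", round(r_external, 3))
```

The scaling statistics and coefficients come from the training cohort alone and are applied unchanged to the external cohort, which is what makes the final correlation a test of generalization rather than of fit.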


The study was supported by the following grants: Interdisciplinary Center for Clinical Research (IZKF) of the medical faculty of Münster grant SEED 11/18 (NO), Dan3/022/22 (UD). German Research Foundation grants RE4458/1-1 (RR), KI 588/14-1 (TK), KI 588/14-2 (TK), KI 588/15-1 (TK), KI 588/17-1 (TK), DA 1151/5-1 (UD), DA 1151/5-2 (UD), DA 1151/6-1 (UD), DA1151/9-1 (UD), DA1151/10-1 (UD), DA1151/11-1 (UD), KR 3822/5-1 (AK), KR 3822/7-2 (AK), NE 2254/1-2 (IN), NE 2254/2-1 (IN), NE2254/3-1 (IN), NE2254/4-1 (IN), HA 7070/2-2 (TH), HA7070/3 (TH), HA7070/4 (TH), KO-121806 (KD), and JO22022/1-1. Collaborative Project funded by the European Union (EU) under the 7th Framework Programme grant 601252. German Federal Ministry of Education and Research grants 01EE2305C (RR), 01EE230A (NO), 01EE2303A, 01ER1301A/B/C, 01ER1511D, 01ER1801A/B/C/D, the Federal States of Germany and the Helmholtz Association, the participating universities and the institutes of the Leibniz Association. FöFoLePLUS program of the Faculty of Medicine of the Ludwig-Maximilians-University, Munich, Germany, grant #003, MCSP (MAR). Open Access funding enabled and organized by Projekt DEAL.


Last updated on 2025-27-05 at 15:05