Exploring machine learning strategies for predicting cardiovascular disease risk factors from multi-omic data

A1 Refereed original research article in a scientific journal

Authors: Drouard Gabin, Mykkänen Juha, Heiskanen Jarkko, Pohjonen Joona, Ruohonen Saku, Pahkala Katja, Lehtimäki Terho, Wang Xiaoling, Ollikainen Miina, Ripatti Samuli, Pirinen Matti, Raitakari Olli, Kaprio Jaakko

Publisher: BioMed Central

Publication year: 2024

Journal: BMC Medical Informatics and Decision Making

Journal name in source: BMC medical informatics and decision making

Journal acronym: BMC Med Inform Decis Mak

Article number: 116

Volume: 24

Issue: 1

ISSN: 1472-6947

eISSN: 1472-6947

DOI: https://doi.org/10.1186/s12911-024-02521-3

Publication's open availability at the time of reporting: Open Access

Publication channel's open availability: Open Access publication channel

Web address: https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-024-02521-3

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/393445661

Abstract

Background: Machine learning (ML) classifiers are increasingly used to predict cardiovascular disease (CVD) and related risk factors from omics data, although these outcomes are often categorical and class-imbalanced. However, little is known about which ML classifier, omics data type, or upstream dimension reduction strategy has the strongest influence on prediction quality in such settings. Our study aimed to illustrate and compare different ML strategies for predicting CVD risk factors under different scenarios.

Methods: We compared six ML classifiers for predicting CVD risk factors from blood-derived metabolomics, epigenetics and transcriptomics data. Upstream omic dimension reduction was performed using either unsupervised or semi-supervised autoencoders, and we compared the downstream ML classifier performance obtained with each. CVD risk factors included systolic and diastolic blood pressure measurements and ultrasound-based biomarkers of left ventricular diastolic dysfunction (LVDD; E/e' ratio, E/A ratio, LAVI) collected from 1,249 Finnish participants, of whom 80% were used for model fitting. We predicted whether individuals had low, high or average levels of CVD risk factors, the latter class being the most common. We constructed multi-omic predictions using a meta-learner that weighted single-omic predictions. Model performance comparisons were based on the F1 score. Finally, we investigated whether learned omic representations from pre-trained semi-supervised autoencoders could improve outcome prediction in an external cohort using transfer learning.
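As a rough illustration of two ideas in the Methods, the sketch below computes a per-class F1 score for a three-class outcome (low, average, high) and combines single-omic class probabilities with a weighted average. It is not the authors' implementation: the weighted average stands in for their meta-learner, and the function names, weights and toy values are all hypothetical.

```python
# Minimal sketch; the weights, labels and probabilities are made-up toy values.
CLASSES = ("low", "average", "high")


def f1_per_class(y_true, y_pred, classes=CLASSES):
    """Return {class: F1}, scoring each class one-vs-rest."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def combine(single_omic_probs, weights):
    """Weighted average of per-omic class probabilities for one individual.

    Assumption: a plain weighted average is used here as a stand-in for a
    meta-learner; the weights would normally be learned on training data.
    """
    total = sum(weights.values())
    combined = [0.0] * len(CLASSES)
    for omic, probs in single_omic_probs.items():
        for i, p in enumerate(probs):
            combined[i] += weights[omic] * p / total
    return combined


def predict_class(probs):
    """Return the class with the highest combined probability."""
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best]


# Toy imbalanced outcome: "average" is the most common class.
y_true = ["average", "average", "average", "high", "low", "average"]
y_pred = ["average", "average", "high", "high", "average", "average"]
f1 = f1_per_class(y_true, y_pred)

# Hypothetical single-omic probabilities, ordered as (low, average, high).
probs = {
    "metabolomics": [0.2, 0.5, 0.3],
    "transcriptomics": [0.1, 0.3, 0.6],
    "epigenetics": [0.3, 0.4, 0.3],
}
weights = {"metabolomics": 0.5, "transcriptomics": 0.3, "epigenetics": 0.2}
multi_omic = combine(probs, weights)
label = predict_class(multi_omic)
```

Reporting F1 separately for each class makes it visible when a model does well on the common "average" class but fails on the rarer "low" and "high" classes, which a single overall accuracy would hide.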

Results: Depending on the ML classifier or omic used, the quality of single-omic predictions varied. Multi-omics predictions outperformed single-omics predictions in most cases, particularly in the prediction of individuals with high or low CVD risk factor levels. Semi-supervised autoencoders improved downstream predictions compared to the use of unsupervised autoencoders. In addition, median gains in Area Under the Curve by transfer learning compared to modelling from scratch ranged from 0.09 to 0.14 and 0.07 to 0.11 units for transcriptomic and metabolomic data, respectively.
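The transfer learning gains above are reported as median differences in Area Under the Curve (AUC). The sketch below shows one way such a median gain could be computed, assuming a binary outcome and paired evaluation runs of a model trained from scratch and a transfer learning model. The function names and toy scores are hypothetical and are not taken from the study.

```python
# Minimal sketch; labels and scores are made-up toy values.
from statistics import median


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    scores above a randomly chosen negative (ties count as one half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def median_auc_gain(labels_per_run, scratch_scores, transfer_scores):
    """Median over runs of AUC(transfer) minus AUC(from scratch)."""
    gains = [
        auc(labels, transfer) - auc(labels, scratch)
        for labels, scratch, transfer in zip(
            labels_per_run, scratch_scores, transfer_scores
        )
    ]
    return median(gains)


# One toy run with two positives followed by two negatives.
labels = [1, 1, 0, 0]
scratch = [0.6, 0.4, 0.5, 0.3]   # one positive ranked below a negative
transfer = [0.9, 0.7, 0.5, 0.3]  # all positives ranked above all negatives
gain = median_auc_gain([labels], [scratch], [transfer])
```

Using the median rather than the mean keeps the summary from being dominated by a few unusually good or bad runs.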

Conclusions: By illustrating the use of different machine learning strategies in different scenarios, our study provides a platform for researchers to evaluate how the choice of omics, ML classifiers, and dimension reduction can influence the quality of CVD risk factor predictions.

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

Funding information in the publication:
Open Access funding provided by University of Helsinki (including Helsinki University Central Hospital). The Young Finns Study has been financially supported by the Academy of Finland: grants 322098, 286284, 134309 (Eye), 126925, 121584, 124282, 255381, 256474, 283115, 319060, 320297, 314389, 338395, 330809, 104821, 129378 (Salve), 117797 (Gendi), and 141071 (Skidi); the Social Insurance Institution of Finland; Competitive State Research Financing of the Expert Responsibility area of Kuopio, Tampere and Turku University Hospitals (grant X51001); Juho Vainio Foundation; Paavo Nurmi Foundation; Finnish Foundation for Cardiovascular Research; Finnish Cultural Foundation; The Sigrid Juselius Foundation; Tampere Tuberculosis Foundation; Emil Aaltonen Foundation; Yrjö Jahnsson Foundation; Signe and Ane Gyllenberg Foundation; Diabetes Research Foundation of Finnish Diabetes Association; EU Horizon 2020 (grant 755320 for TAXINOMISIS and grant 848146 for To Aition); European Research Council (grant 742927 for MULTIEPIGEN project); Tampere University Hospital Supporting Foundation, Finnish Society of Clinical Chemistry and the Cancer Foundation Finland. The FTC has been supported by the Academy of Finland (grants 265240, 263278, 308248, 312073, 336832 to Jaakko Kaprio and 297908 to Miina Ollikainen) and the Sigrid Juselius Foundation (to Miina Ollikainen). The DNA methylation study in FTC was supported by NIH/NHLBI grant HL104125.