A1 Refereed original research article in a scientific journal

Predicting early-stage coronary artery disease using machine learning and routine clinical biomarkers improved by augmented virtual data




Authors: Koloi, Angela; Loukas, Vasileios S.; Hourican, Cillian; Sakellarios, Antonis I.; Quax, Rick; Mishra, Pashupati P.; Lehtimäki, Terho; Raitakari, Olli T.; Papaloukas, Costas; Bosch, Jos A.; Maerz, Winfried; Fotiadis, Dimitrios I.

Publisher: OXFORD UNIV PRESS

Publishing place: OXFORD

Publication year: 2024

Journal: European Heart Journal - Digital Health

Journal name in source: EUROPEAN HEART JOURNAL - DIGITAL HEALTH

Journal acronym: EUR HEART J-DIGIT HL

Volume: 5

Issue: 5

First page: 542

Last page: 550

Number of pages: 9

eISSN: 2634-3916

DOI: https://doi.org/10.1093/ehjdh/ztae049

Web address: https://doi.org/10.1093/ehjdh/ztae049

Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/457759737


Abstract

Aims: Coronary artery disease (CAD) is a highly prevalent disease with modifiable risk factors. In patients with suspected obstructive CAD, evaluating the pre-test probability model is crucial for diagnosis, although its accuracy remains controversial. Machine learning (ML) predictive models can help clinicians detect CAD early and improve outcomes. This study aimed to identify early-stage CAD using ML in conjunction with a panel of clinical and laboratory tests.

Methods and results: The study sample included 3316 patients enrolled in the Ludwigshafen Risk and Cardiovascular Health (LURIC) study. A comprehensive array of attributes was considered, and an ML pipeline was developed. Subsequently, we used five approaches to generate high-quality virtual patient data to improve the performance of the artificial intelligence models. An extension study was carried out using data from the Young Finns Study (YFS) to assess the generalizability of the results. Upon applying virtual augmented data, accuracy increased by approximately 5%, from 0.75 to 0.79 for random forests (RFs), and from 0.76 to 0.80 for gradient boosting (GB). Sensitivity showed a significant boost for RFs, rising by about 9.4% (from 0.81 to 0.89), while GB exhibited a 4.8% increase (from 0.83 to 0.87). Specificity also showed a significant boost for RFs, rising by ∼24% (from 0.55 to 0.70), while GB exhibited a 37% increase (from 0.51 to 0.74). The extension analysis aligned with the initial study.
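For readers less familiar with the metrics quoted above, the following is a minimal Python sketch of how accuracy, sensitivity, and specificity are conventionally derived from a binary confusion matrix, and how a relative increase between two metric values is computed. It is not the authors' pipeline: the function names `confusion_metrics` and `relative_increase` and the confusion-matrix counts are hypothetical and chosen only for illustration. The one value taken from the abstract is the GB sensitivity pair (0.83 and 0.87).

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all patients classified correctly
        "sensitivity": tp / (tp + fn),      # share of diseased patients detected
        "specificity": tn / (tn + fp),      # share of healthy patients ruled out
    }


def relative_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Hypothetical counts, NOT taken from the study.
metrics = confusion_metrics(tp=80, fp=40, tn=60, fn=20)

# GB sensitivity before and after augmentation, as quoted in the abstract.
gb_sensitivity_gain = relative_increase(0.83, 0.87)

if __name__ == "__main__":
    print(metrics)
    print(f"GB sensitivity gain: {gb_sensitivity_gain:.1f}%")
```

Percentages recomputed this way from the rounded values in the abstract may differ slightly from the reported ones, which were presumably derived from unrounded results.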

Conclusion: Accurate predictions of angiographic CAD can be obtained using a set of routine laboratory markers, age, sex, and smoking status, holding the potential to limit the need for invasive diagnostic techniques. The extension analysis in the YFS demonstrated the potential of these findings in a younger population, and it confirmed applicability to atherosclerotic vascular disease.






Funding information in the publication
This project has received funding from the European Union’s Horizon 2020 research and innovation programme TO_AITION under grant agreement No 848146.


Last updated on 2025-01-27 at 19:53