A1 Refereed original research article in a scientific journal
Predicting early-stage coronary artery disease using machine learning and routine clinical biomarkers improved by augmented virtual data
Authors: Koloi, Angela; Loukas, Vasileios S.; Hourican, Cillian; Sakellarios, Antonis I.; Quax, Rick; Mishra, Pashupati P.; Lehtimäki, Terho; Raitakari, Olli T.; Papaloukas, Costas; Bosch, Jos A.; Maerz, Winfried; Fotiadis, Dimitrios I.
Publisher: OXFORD UNIV PRESS
Publishing place: OXFORD
Publication year: 2024
Journal: European Heart Journal - Digital Health
Journal name in source: EUROPEAN HEART JOURNAL - DIGITAL HEALTH
Journal acronym: EUR HEART J-DIGIT HL
Volume: 5
Issue: 5
First page: 542
Last page: 550
Number of pages: 9
eISSN: 2634-3916
DOI: https://doi.org/10.1093/ehjdh/ztae049
Web address: https://doi.org/10.1093/ehjdh/ztae049
Self-archived copy’s web address: https://research.utu.fi/converis/portal/detail/Publication/457759737
Aims: Coronary artery disease (CAD) is a highly prevalent disease with modifiable risk factors. In patients with suspected obstructive CAD, evaluating the pre-test probability model is crucial for diagnosis, although its accuracy remains controversial. Machine learning (ML) predictive models can help clinicians detect CAD early and improve outcomes. This study aimed to identify early-stage CAD using ML in conjunction with a panel of clinical and laboratory tests.
Methods and results: The study sample included 3316 patients enrolled in the Ludwigshafen Risk and Cardiovascular Health (LURIC) study. A comprehensive array of attributes was considered, and an ML pipeline was developed. Subsequently, we used five approaches to generate high-quality virtual patient data to improve the performance of the artificial intelligence models. An extension study was carried out using data from the Young Finns Study (YFS) to assess the generalizability of the results. After applying virtual augmented data, accuracy increased by approximately 5%, from 0.75 to 0.79 for random forests (RFs) and from 0.76 to 0.80 for gradient boosting (GB). Sensitivity showed a significant boost for RFs, rising by about 9.4% (from 0.81 to 0.89), while GB exhibited a 4.8% increase (from 0.83 to 0.87). Specificity also showed a significant boost for RFs, rising by ∼24% (from 0.55 to 0.70), while GB exhibited a 37% increase (from 0.51 to 0.74). The extension analysis aligned with the initial study.
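For readers less familiar with the metrics quoted above, the following is a minimal sketch of how accuracy, sensitivity, and specificity are conventionally derived from confusion-matrix counts, and how a relative increase between two scores is computed. The function names and the example counts are hypothetical and are not taken from the publication; the scores passed to `relative_change_percent` are the rounded values quoted in the abstract, so recomputed percentages may differ slightly from the reported ones.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # share of diseased cases detected
        "specificity": tn / (tn + fp),      # share of healthy cases correctly ruled out
    }


def relative_change_percent(before, after):
    """Relative change from `before` to `after`, expressed in percent."""
    return (after - before) / before * 100


# Hypothetical counts, for illustration only (not study data).
example = classification_metrics(tp=80, fp=30, tn=70, fn=20)

# Rounded scores as quoted in the abstract.
rf_accuracy_gain = relative_change_percent(0.75, 0.79)     # random forests
gb_accuracy_gain = relative_change_percent(0.76, 0.80)     # gradient boosting
gb_sensitivity_gain = relative_change_percent(0.83, 0.87)  # gradient boosting
```

With the rounded accuracy values, both models show a relative gain of a little over 5%, consistent with the "approximately 5%" stated in the abstract.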
Conclusion: Accurate predictions of angiographic CAD can be obtained using a set of routine laboratory markers, age, sex, and smoking status, holding the potential to limit the need for invasive diagnostic techniques. The extension analysis in the YFS demonstrated the potential of these findings in a younger population, and it confirmed applicability to atherosclerotic vascular disease.
Downloadable publication: This is an electronic reprint of the original article.
Funding information in the publication:
This project has received funding from the European Union’s Horizon 2020 research and innovation programme TO_AITION under grant agreement No 848146.