OSCAR: Optimal subset cardinality regression using the L0-pseudonorm with applications to prognostic modelling of prostate cancer - UTU Research Information System

A1 Peer-reviewed original article in a scientific journal

OSCAR: Optimal subset cardinality regression using the L0-pseudonorm with applications to prognostic modelling of prostate cancer

Authors: Halkola Anni S., Joki Kaisa, Mirtti Tuomas, Mäkelä Marko M., Aittokallio Tero, Laajala Teemu D.

Publisher: PUBLIC LIBRARY SCIENCE

Publication year: 2023

Journal: PLoS Computational Biology

Journal name in database: PLOS COMPUTATIONAL BIOLOGY

Journal acronym: PLOS COMPUT BIOL

Article number: e1010333

Volume: 19

Issue: 3

Number of pages: 30

ISSN: 1553-734X

eISSN: 1553-734X

DOI: https://doi.org/10.1371/journal.pcbi.1010333

Web address: https://doi.org/10.1371/journal.pcbi.1010333

Self-archived copy's address: https://research.utu.fi/converis/portal/detail/Publication/179320823

Abstract

In many real-world applications, such as those based on electronic health records, prognostic prediction of patient survival is based on heterogeneous sets of clinical laboratory measurements. To address the trade-off between the predictive accuracy of a prognostic model and the costs related to its clinical implementation, we propose an optimized L₀-pseudonorm approach to learn sparse solutions in multivariable regression. The model sparsity is maintained by restricting the number of nonzero coefficients in the model with a cardinality constraint, which makes the optimization problem NP-hard. In addition, we generalize the cardinality constraint for grouped feature selection, which makes it possible to identify key sets of predictors that may be measured together in a kit in clinical practice. We demonstrate the operation of our cardinality constraint-based feature subset selection method, named OSCAR, in the context of prognostic prediction of prostate cancer patients, where it enables one to determine the key explanatory predictors at different levels of model sparsity. We further explore how the model sparsity affects the model accuracy and implementation cost. Lastly, we demonstrate generalization of the presented methodology to high-dimensional transcriptomics data.
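
As a rough illustration of what a cardinality constraint means in regression, the sketch below fits ordinary least squares on every feature subset of a fixed size k and keeps the subset with the smallest residual sum of squares. This is not the OSCAR algorithm and makes no claim about how the article solves the problem; it is a brute-force toy, assuming a plain linear model without intercept and linearly independent columns, and it only shows why the search is combinatorial. The helper names (solve_linear, fit_subset, best_subset) and the data are hypothetical.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_subset(X, y, subset):
    """Least-squares fit restricted to the columns in `subset`.

    Returns (residual sum of squares, coefficients for those columns).
    Assumes the selected columns are linearly independent.
    """
    cols = list(subset)
    # Normal equations: (X_S^T X_S) beta = X_S^T y
    gram = [[sum(row[i] * row[j] for row in X) for j in cols] for i in cols]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in cols]
    beta = solve_linear(gram, rhs)
    rss = sum(
        (yi - sum(b * row[j] for b, j in zip(beta, cols))) ** 2
        for row, yi in zip(X, y)
    )
    return rss, beta


def best_subset(X, y, k):
    """Exhaustive search over all subsets of exactly k features.

    Searching size-k subsets also covers "at most k nonzero coefficients",
    because adding a feature never increases the least-squares residual.
    """
    p = len(X[0])
    best = None
    for subset in combinations(range(p), k):
        rss, beta = fit_subset(X, y, subset)
        if best is None or rss < best[0]:
            best = (rss, subset, beta)
    return best


# Hypothetical toy data: y depends only on features 0 and 2 (y = 2*x0 - 3*x2).
X = [
    [1, 0, 2, 1],
    [0, 1, 1, 3],
    [2, 1, 0, 1],
    [1, 2, 3, 0],
    [3, 0, 1, 2],
    [0, 3, 2, 1],
]
y = [2 * row[0] - 3 * row[2] for row in X]

rss, subset, beta = best_subset(X, y, 2)
print(subset, beta, rss)
```

The number of candidate subsets grows as "p choose k", so this enumeration is only feasible for a handful of features; that growth is the practical face of the NP-hardness mentioned in the abstract.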

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

journal.pcbi.1010333.pdf