Evaluation metrics and statistical tests for machine learning - UTU Research Information System

A1 Peer-reviewed original article in a scientific journal

Evaluation metrics and statistical tests for machine learning

Authors: Rainio, Oona; Teuho, Jarmo; Klén, Riku

Publication year: 2024

Journal: Scientific Reports

Journal name in the database: Scientific Reports

Article number: 6086

Volume: 14

DOI: https://doi.org/10.1038/s41598-024-56706-x

Open access status at the time of recording: Openly available

Openness of the publication channel: Fully open-access publication channel

Web address: https://doi.org/10.1038/s41598-024-56706-x

Self-archived copy address: https://research.utu.fi/converis/portal/detail/Publication/387398800

Self-archived copy license: CC BY

Version of the self-archived publication: Publisher's version

Additional information: Author correction to this article: https://www.nature.com/articles/s41598-024-66611-y; DOI: 10.1038/s41598-024-66611-y

Abstract

Research on different machine learning (ML) methods has become incredibly popular during the past few decades. However, researchers who are not familiar with statistics may find it difficult to understand how to evaluate the performance of ML models and compare them with each other. Here, we introduce the most common evaluation metrics used for typical supervised ML tasks, including binary, multi-class, and multi-label classification, regression, image segmentation, object detection, and information retrieval. We explain how to choose a suitable statistical test for comparing models, how to obtain enough values of the metric for testing, and how to perform the test and interpret its results. We also present a few practical examples of comparing convolutional neural networks used to classify X-rays with different lung infections and to detect cancer tumors in positron emission tomography images.
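The following is an illustration of the kind of computation the abstract describes. It computes a few standard binary-classification metrics from a confusion matrix and then runs McNemar's test. McNemar's test is one common choice for comparing two classifiers evaluated on the same test samples. This is a minimal sketch in plain Python, not code from the article. The function names (`confusion_counts`, `binary_metrics`, `mcnemar_test`) are hypothetical, and labels are assumed to be coded as 1 (positive) and 0 (negative). The article may recommend different metrics or tests depending on the task and on how the metric values are obtained.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN); assumes labels coded as 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(num, den):
    # Metrics with a zero denominator are undefined; report NaN instead of raising.
    return num / den if den else float("nan")


def binary_metrics(y_true, y_pred):
    """Common binary-classification metrics computed from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "sensitivity": _ratio(tp, tp + fn),  # also called recall
        "specificity": _ratio(tn, tn + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def mcnemar_test(y_true, pred_a, pred_b):
    """McNemar's chi-square test with continuity correction for two classifiers
    evaluated on the same samples. Returns (statistic, p_value)."""
    triples = list(zip(y_true, pred_a, pred_b))
    # Only discordant samples (exactly one model correct) enter the test.
    n_ab = sum(1 for t, pa, pb in triples if pa == t and pb != t)  # A right, B wrong
    n_ba = sum(1 for t, pa, pb in triples if pa != t and pb == t)  # A wrong, B right
    if n_ab + n_ba == 0:
        return 0.0, 1.0  # the models never differ in correctness
    statistic = (abs(n_ab - n_ba) - 1) ** 2 / (n_ab + n_ba)
    # Chi-square survival function with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
    print(binary_metrics(y_true, y_pred))
    print(mcnemar_test(y_true, y_pred, y_true))
```

Consider the example with `y_true = [1, 1, 1, 0, 0, 0, 1, 0]` and `y_pred = [1, 0, 1, 0, 1, 0, 1, 0]`. The confusion matrix has 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so every metric above equals 0.75. The chi-square approximation is a common default. When there are only a few discordant samples, the exact binomial form of McNemar's test is usually preferred.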

Downloadable publication

This is an electronic reprint of the original article.
This reprint may differ from the original in pagination and typographic detail. Please cite the original version.

s41598-024-56706-x.pdf