Quicksort leave-pair-out cross-validation for ROC curve analysis




Numminen Riikka, Montoya Perez Ileana, Jambor Ivan, Pahikkala Tapio, Airola Antti

Publisher: SPRINGER HEIDELBERG

2022

Computational Statistics

COMPUTATION STAT

17

0943-4062

1613-9658

DOI: https://doi.org/10.1007/s00180-022-01288-3

https://research.utu.fi/converis/portal/detail/Publication/176696554



Receiver Operating Characteristic (ROC) curve analysis and area under the ROC curve (AUC) are commonly used performance measures in diagnostic systems. In this work, we assume a setting where a classifier is inferred from multivariate data to predict the diagnostic outcome for new cases. Cross-validation is a resampling method for estimating the prediction performance of a classifier on data not used for inferring it. Tournament leave-pair-out cross-validation (TLPOCV) has been shown to be better than other resampling methods at producing a ranking of data that can be used for estimating the ROC curves and areas under them. However, the time complexity of TLPOCV, O(n^2), means that it is impractical in many applications. In this article, a method called quicksort leave-pair-out cross-validation (QLPOCV) is presented in order to decrease the time complexity of obtaining a reliable ranking of data to O(n log n). The proposed method is compared with existing ones in an experimental study, demonstrating that in terms of ROC curves and AUC values QLPOCV produces as accurate performance estimation as TLPOCV, outperforming both k-fold and leave-one-out cross-validation.
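
The idea summarized above can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation. It assumes that each quicksort comparison is decided by training a classifier with the two compared cases left out and then checking which of the two receives the higher predicted score. The function names (train_centroid_scorer, quicksort_lpo_ranking, auc_from_ranking) and the centroid-based learner are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector], float]
Trainer = Callable[[List[Vector], List[int]], Scorer]


def train_centroid_scorer(X: List[Vector], y: List[int]) -> Scorer:
    """Hypothetical stand-in learner (an assumption, not from the paper).

    Scores a case by how much closer it is to the positive-class centroid
    than to the negative-class centroid. Both classes must be non-empty.
    """
    dim = len(X[0])
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    mu_pos = [sum(x[d] for x in pos) / len(pos) for d in range(dim)]
    mu_neg = [sum(x[d] for x in neg) / len(neg) for d in range(dim)]

    def score(x: Vector) -> float:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, mu_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, mu_neg))
        return d_neg - d_pos  # higher means "more positive"

    return score


def quicksort_lpo_ranking(
    X: List[Vector], y: List[int], train: Trainer, rng: random.Random
) -> Tuple[List[int], int]:
    """Rank case indices from lowest to highest score via randomized quicksort.

    Every comparison holds out the compared pair, trains on the rest, and
    compares the two held-out predictions. Returns the ranking and the
    number of leave-pair-out comparisons (model trainings) performed.
    """
    comparisons = 0

    def pivot_ranks_higher(pivot: int, other: int) -> bool:
        nonlocal comparisons
        comparisons += 1
        keep = [i for i in range(len(X)) if i != pivot and i != other]
        scorer = train([X[i] for i in keep], [y[i] for i in keep])
        return scorer(X[pivot]) > scorer(X[other])

    def sort(indices: List[int]) -> List[int]:
        if len(indices) <= 1:
            return list(indices)
        pivot = indices[rng.randrange(len(indices))]
        lower: List[int] = []
        higher: List[int] = []
        for i in indices:
            if i == pivot:
                continue
            (lower if pivot_ranks_higher(pivot, i) else higher).append(i)
        return sort(lower) + [pivot] + sort(higher)

    return sort(list(range(len(X)))), comparisons


def auc_from_ranking(order: List[int], y: List[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    neg_seen = 0
    concordant = 0
    for i in order:  # order runs from lowest to highest rank
        if y[i] == 1:
            concordant += neg_seen
        else:
            neg_seen += 1
    n_pos = len(order) - neg_seen
    return concordant / (n_pos * neg_seen)
```

Calling quicksort_lpo_ranking(X, y, train_centroid_scorer, random.Random(0)) returns a ranking that can be passed to auc_from_ranking, or used to trace a ROC curve. A randomized quicksort performs on the order of n log n comparisons in expectation, whereas comparing every pair requires n(n-1)/2, which is the gap between O(n log n) and O(n^2) that the abstract describes. The sketch needs enough cases of each class for both classes to remain after a pair is held out.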
