Tournament leave-pair-out cross-validation for receiver operating characteristic analysis




Montoya Perez I., Airola A., Boström P., Jambor I., Pahikkala T.

Publisher: SAGE Publications Ltd

2019

Statistical Methods in Medical Research

Volume: 28

Issue: 10-11

First page: 2975

Last page: 2991

Number of pages: 17

ISSN: 0962-2802

eISSN: 1477-0334

DOI: https://doi.org/10.1177/0962280218795190

http://journals.sagepub.com/doi/pdf/10.1177/0962280218795190

https://research.utu.fi/converis/portal/detail/Publication/36131645



Receiver operating characteristic analysis is widely used for evaluating diagnostic systems. Recent studies have shown that estimating the area under the receiver operating characteristic curve with standard cross-validation methods suffers from a large bias. Leave-pair-out cross-validation has been shown to correct this bias. However, while leave-pair-out produces an almost unbiased estimate of the area under the receiver operating characteristic curve, it does not provide a ranking of the data, which is needed for plotting and analyzing the curve. In this study, we propose a new method called tournament leave-pair-out cross-validation. This method extends leave-pair-out by creating a tournament from pair comparisons to produce a ranking for the data. Tournament leave-pair-out preserves the advantage of leave-pair-out for estimating the area under the receiver operating characteristic curve while also allowing receiver operating characteristic analyses to be performed. Using both synthetic and real-world data, we show that tournament leave-pair-out is as reliable as leave-pair-out for estimating the area under the receiver operating characteristic curve, and we confirm the bias in leave-one-out cross-validation on low-dimensional data. As a case study on receiver operating characteristic analysis, we also evaluate how reliably sensitivity and specificity can be estimated from tournament leave-pair-out receiver operating characteristic curves.
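
The idea of turning pair comparisons into a ranking can be illustrated with a minimal Python sketch. It is based only on the description in the abstract and is not the authors' implementation. The toy mean-difference learner, the half-win rule for ties, and all function names (`fit_mean_difference`, `tournament_scores`, `auc_from_scores`, `roc_points`) are assumptions made for illustration. Each pair of examples is held out in turn, a model is trained on the rest, and the example with the higher prediction wins the comparison; the number of wins serves as a ranking score from which a receiver operating characteristic curve and its area can be computed.

```python
from itertools import combinations


def fit_mean_difference(X, y):
    """Toy learner (assumed for illustration): score = x . (mean_pos - mean_neg)."""
    dim = len(X[0])
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    w = [
        sum(x[d] for x in pos) / len(pos) - sum(x[d] for x in neg) / len(neg)
        for d in range(dim)
    ]
    return lambda x: sum(wd * xd for wd, xd in zip(w, x))


def tournament_scores(X, y, fit=fit_mean_difference):
    """Hold out every pair, train on the rest, and count wins per example.

    Needs at least three examples per class so that every training set
    still contains both classes after a pair is removed.
    """
    n = len(X)
    wins = [0.0] * n
    for i, j in combinations(range(n), 2):
        train_X = [x for k, x in enumerate(X) if k not in (i, j)]
        train_y = [label for k, label in enumerate(y) if k not in (i, j)]
        predict = fit(train_X, train_y)
        si, sj = predict(X[i]), predict(X[j])
        if si > sj:
            wins[i] += 1.0
        elif sj > si:
            wins[j] += 1.0
        else:
            # Assumed tie rule: split the win between both examples.
            wins[i] += 0.5
            wins[j] += 0.5
    return wins


def auc_from_scores(scores, y):
    """Fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    total = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return total / (len(pos) * len(neg))


def roc_points(scores, y):
    """(false positive rate, true positive rate) at each distinct threshold."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, label in zip(scores, y) if s >= t and label == 1)
        fp = sum(1 for s, label in zip(scores, y) if s >= t and label == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points
```

Calling `tournament_scores(X, y)` on a list of feature vectors `X` and 0/1 labels `y` returns one score per example; passing those scores to `roc_points` gives the curve, and sensitivity and specificity at a threshold can be read off as the true positive rate and one minus the false positive rate. The sketch retrains the model for every pair, so its cost grows quadratically with the number of examples.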


Last updated on 2024-11-26 at 18:52