Tournament leave-pair-out cross-validation for receiver operating characteristic analysis
Authors: Montoya Perez I., Airola A., Boström P., Jambor I., Pahikkala T.
Publisher: SAGE Publications Ltd
Publication year: 2019
Journal: Statistical Methods in Medical Research
Volume: 28
Issue: 10-11
First page: 2975
Last page: 2991
Number of pages: 17
ISSN: 0962-2802
eISSN: 1477-0334
DOI: https://doi.org/10.1177/0962280218795190
PDF: http://journals.sagepub.com/doi/pdf/10.1177/0962280218795190
Research portal record: https://research.utu.fi/converis/portal/detail/Publication/36131645
Receiver operating characteristic (ROC) analysis is widely used for evaluating diagnostic systems. Recent studies have shown that estimating the area under the ROC curve (AUC) with standard cross-validation methods suffers from a large bias. Leave-pair-out (LPO) cross-validation has been shown to correct this bias. However, while LPO produces an almost unbiased estimate of AUC, it does not provide the ranking of the data that is needed for plotting and analyzing the ROC curve. In this study, we propose a new method called tournament leave-pair-out (TLPO) cross-validation. This method extends LPO by creating a tournament from the pair comparisons to produce a ranking of the data. TLPO preserves the advantage of LPO for estimating AUC while also allowing ROC analysis. Using both synthetic and real-world data, we show that TLPO is as reliable as LPO for AUC estimation, and we confirm the bias of leave-one-out cross-validation on low-dimensional data. As a case study on ROC analysis, we also evaluate how reliably sensitivity and specificity can be estimated from TLPO ROC curves.
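
The tournament idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that every pair of data points is held out in turn, that the point with the higher predicted score wins the comparison, and that a point's tournament score is its number of wins (ties count as half a win). The learner `fit_predict` is a hypothetical stand-in (ordinary least squares on the labels), and all function names are chosen here for illustration only.

```python
import itertools
import numpy as np

def fit_predict(X_train, y_train, X_test):
    # Hypothetical stand-in learner: least-squares fit of the labels
    # on the features plus an intercept column.
    A = np.hstack([X_train, np.ones((len(X_train), 1))])
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.hstack([X_test, np.ones((len(X_test), 1))]) @ w

def tournament_scores(X, y):
    # Assumed round-robin: hold out each pair (i, j), train on the rest,
    # and credit a win to whichever held-out point scores higher.
    n = len(y)
    wins = np.zeros(n)
    for i, j in itertools.combinations(range(n), 2):
        train = np.ones(n, dtype=bool)
        train[[i, j]] = False
        s_i, s_j = fit_predict(X[train], y[train], X[[i, j]])
        if s_i > s_j:
            wins[i] += 1.0
        elif s_j > s_i:
            wins[j] += 1.0
        else:
            wins[i] += 0.5
            wins[j] += 0.5
    return wins

def auc_from_scores(scores, y):
    # Fraction of positive-negative pairs ranked correctly (ties count 0.5).
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

def roc_points(scores, y):
    # Sweep a threshold over the distinct tournament scores, high to low.
    thresholds = np.unique(scores)[::-1]
    tpr = [float((scores[y == 1] >= t).mean()) for t in thresholds]
    fpr = [float((scores[y == 0] >= t).mean()) for t in thresholds]
    return np.array([0.0] + fpr), np.array([0.0] + tpr)

# Toy, made-up data: one feature, two well-separated classes.
X_demo = np.array([0, 1, 2, 3, 10, 11, 12, 13], dtype=float).reshape(-1, 1)
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
wins_demo = tournament_scores(X_demo, y_demo)
fpr_demo, tpr_demo = roc_points(wins_demo, y_demo)
print(wins_demo, auc_from_scores(wins_demo, y_demo))
```

Because each data point receives a single tournament score, the scores define one ranking of the whole data set, which is what thresholding for sensitivity and specificity requires. Note that the sketch retrains the learner once per pair, so its cost grows quadratically with the number of data points.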