A4 Refereed article in a conference publication
A comparison of AUC estimators in small-sample studies
Authors: Airola A, Pahikkala T, Waegeman W, De Baets B, Salakoski T
Editors: Dzeroski Saso, Geurts Pierre, Rousu Juho
Publication year: 2010
Journal: JMLR workshop and conference proceedings
Book title: Proceedings of the third International Workshop on Machine Learning in Systems Biology
Journal name in source: PROCEEDINGS OF THE THIRD INTERNATIONAL WORKSHOP ON MACHINE LEARNING IN SYSTEMS BIOLOGY
Journal acronym: JMLR WORKSH CONF PRO
Series title: Proceedings of Machine Learning Research
Volume: 8
First page: 3
Last page: 13
Number of pages: 11
ISSN: 1938-7288
Web address: http://jmlr.csail.mit.edu/proceedings/papers/v8/airola10a/airola10a.pdf
Abstract
Reliable estimation of the classification performance of learned predictive models is difficult in the small-sample setting. With biological data, separate test data often cannot be afforded, and cross-validation is then a typical strategy for estimating performance. Recent results, further supported by experimental evidence presented in this article, show that many standard approaches to cross-validation suffer from substantial bias or variance when the area under the ROC curve (AUC) is used as the performance measure. We advocate the use of leave-pair-out cross-validation (LPOCV) for performance estimation, as it avoids many of these problems. A method we proposed previously can be used to compute this estimate efficiently for regularized least-squares (RLS) based learners.
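The leave-pair-out estimate advocated in the abstract can be illustrated with a naive sketch. It is not the authors' efficient RLS method: it simply retrains a model once for every positive-negative pair. For each pair, it trains on the remaining examples, scores both held-out examples, and counts 1 if the positive is ranked higher, 0.5 on a tie, and 0 otherwise. The AUC estimate is the mean over all pairs. The learner `centroid_learner` is a hypothetical stand-in (a nearest-centroid scorer) chosen only to keep the example dependency-free. The names `lpocv_auc` and `centroid_learner` are illustrative assumptions, not from the paper.

```python
from itertools import product
from typing import Callable, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def centroid_learner(X: List[Vector], y: List[int]) -> Scorer:
    """Hypothetical stand-in learner (NOT regularized least-squares).

    Returns a scoring function: squared distance to the negative-class centroid
    minus squared distance to the positive-class centroid, so higher scores
    mean "more positive".
    """
    dim = len(X[0])

    def centroid(label: int) -> List[float]:
        pts = [x for x, t in zip(X, y) if t == label]
        return [sum(p[k] for p in pts) / len(pts) for k in range(dim)]

    c_pos, c_neg = centroid(1), centroid(0)

    def sqdist(a: Vector, b: Vector) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    return lambda x: sqdist(x, c_neg) - sqdist(x, c_pos)


def lpocv_auc(X: List[Vector], y: List[int],
              learner: Callable[[List[Vector], List[int]], Scorer] = centroid_learner) -> float:
    """Naive leave-pair-out cross-validation estimate of AUC (labels 1 = positive, 0 = negative)."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [j for j, t in enumerate(y) if t == 0]
    # Every training set must still contain both classes after a pair is held out.
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two examples of each class")
    total = 0.0
    for i, j in product(pos, neg):
        # Hold out one positive (i) and one negative (j); train on the rest.
        keep = [k for k in range(len(y)) if k != i and k != j]
        f = learner([X[k] for k in keep], [y[k] for k in keep])
        s_pos, s_neg = f(X[i]), f(X[j])
        total += 1.0 if s_pos > s_neg else (0.5 if s_pos == s_neg else 0.0)
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    X = [[0.0], [0.2], [1.0], [1.2]]
    y = [0, 0, 1, 1]
    print(lpocv_auc(X, y))  # prints 1.0 for this perfectly separable toy data
```

This brute-force version needs one retraining per positive-negative pair, so its cost grows with the product of the two class sizes. That cost is why an efficient closed-form computation for a specific learner family such as RLS is valuable.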