Covariance matrix estimation for left-censored data

Authors: Maiju Pesonen, Henri Pesonen, Jaakko Nevalainen

Publisher: ELSEVIER SCIENCE BV

Year: 2015

Journal: Computational Statistics and Data Analysis

Journal abbreviation: COMPUT STAT DATA AN

Volume: 92

Start page: 13

End page: 25

Number of pages: 13

ISSN: 0167-9473

DOI: https://doi.org/10.1016/j.csda.2015.06.005

Multivariate methods often rely on a sample covariance matrix. The conventional estimators of a covariance matrix require complete data vectors on all subjects, an assumption that frequently cannot be met. For example, in many fields of the life sciences that use modern measurement technology, such as mass spectrometry, left-censored values caused by denoising the data are a common phenomenon. Left-censored values are low-level concentrations that are considered too imprecise to be reported as a single number but are known to lie somewhere between zero and the laboratory's lower limit of detection. Maximum likelihood-based covariance matrix estimators that accommodate left-censored values, without substituting them with a constant or ignoring them completely, are considered. The presented estimators efficiently use all the available information and thus, based on simulation studies, produce the least biased estimates compared with commonly used competing estimators. As the genuine maximum likelihood estimate can be computed quickly only in low dimensions, it is suggested to estimate the covariance matrix element-wise and then adjust the resulting matrix to achieve positive semi-definiteness. It is shown that the new approach decreases computation times substantially while still producing accurate estimates. Finally, as an example, a left-censored data set of toxic chemicals is explored.
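To illustrate how a likelihood that models the censoring differs from substituting a constant or ignoring the censored values, here is a minimal univariate sketch in Python. It assumes normally distributed values censored below a known detection limit `lod` (the lower bound of zero is ignored, as it would be on a log scale) and maximizes the censored-data likelihood with an EM iteration. The names `censored_normal_mle` and `lod` are hypothetical, and this is not the authors' multivariate estimator.

```python
import math
import random

def _phi(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _Phi(z):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_normal_mle(observed, n_censored, lod, tol=1e-10, max_iter=10_000):
    """EM estimate of (mu, sigma) for a normal sample in which n_censored
    values are only known to lie below the detection limit lod."""
    n = len(observed) + n_censored
    s_obs = sum(observed)
    # Start from the (biased) mean and SD of the detected values only.
    mu = s_obs / len(observed)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in observed) / len(observed))
    for _ in range(max_iter):
        # E-step: mean and variance of N(mu, sigma^2) truncated to x < lod.
        a = (lod - mu) / sigma
        lam = _phi(a) / _Phi(a)
        e_x = mu - sigma * lam
        var_x = sigma ** 2 * (1.0 - a * lam - lam ** 2)
        # M-step: complete-data estimates with each censored value represented
        # by its conditional moments rather than by a constant.
        mu_new = (s_obs + n_censored * e_x) / n
        ss = sum((x - mu_new) ** 2 for x in observed)
        ss += n_censored * (var_x + (e_x - mu_new) ** 2)
        sigma_new = math.sqrt(ss / n)
        done = abs(mu_new - mu) < tol and abs(sigma_new - sigma) < tol
        mu, sigma = mu_new, sigma_new
        if done:
            break
    return mu, sigma

# Simulated example: N(0.5, 1) with every value below lod = 0 censored.
random.seed(1)
lod = 0.0
sample = [random.gauss(0.5, 1.0) for _ in range(5000)]
observed = [x for x in sample if x >= lod]
n_cens = len(sample) - len(observed)

mu_hat, sd_hat = censored_normal_mle(observed, n_cens, lod)
naive_mean = sum(observed) / len(observed)                     # ignores censored values
subst_mean = (sum(observed) + n_cens * lod / 2) / len(sample)  # LOD/2 substitution
print(mu_hat, sd_hat, naive_mean, subst_mean)
```

The E-step uses the mean and variance of a normal distribution truncated above at `lod`; averaging only the detected values is biased upward because every low value has been dropped.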
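Because each entry of an element-wise estimate is computed separately, the assembled matrix need not be positive semi-definite. The abstract does not say which adjustment the authors apply; one standard choice, assumed here, is to clip negative eigenvalues to zero, which for a symmetric matrix gives the nearest positive semi-definite matrix in Frobenius norm. The function name `nearest_psd` and the example matrix are hypothetical.

```python
import numpy as np

def nearest_psd(S, eps=0.0):
    """Clip the eigenvalues of symmetric S at eps. With eps=0 the result is
    the nearest positive semi-definite matrix to S in Frobenius norm."""
    S = (S + S.T) / 2.0                 # guard against tiny asymmetries
    w, V = np.linalg.eigh(S)            # S = V diag(w) V^T
    w_clipped = np.clip(w, eps, None)   # replace negative eigenvalues
    return (V * w_clipped) @ V.T        # rebuild V diag(w_clipped) V^T

# Hypothetical element-wise estimates: each entry is plausible on its own,
# but together they do not form a valid covariance matrix.
S_pairwise = np.array([
    [1.0, 0.9, -0.9],
    [0.9, 1.0, 0.9],
    [-0.9, 0.9, 1.0],
])
S_adjusted = nearest_psd(S_pairwise)
print(np.linalg.eigvalsh(S_pairwise).min())  # negative
print(np.linalg.eigvalsh(S_adjusted).min())  # non-negative up to rounding
```

Eigenvalue clipping can also shift the diagonal variances slightly; an adjustment that keeps the diagonal fixed would need a different, typically iterative, method.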