Covariance matrix estimation for left-censored data




Maiju Pesonen, Henri Pesonen, Jaakko Nevalainen

Publisher: ELSEVIER SCIENCE BV

2015

Computational Statistics and Data Analysis

COMPUTATIONAL STATISTICS & DATA ANALYSIS

COMPUT STAT DATA AN

92

13

25

13

0167-9473

DOI: https://doi.org/10.1016/j.csda.2015.06.005



Multivariate methods often rely on a sample covariance matrix. The conventional estimators of a covariance matrix require complete data vectors on all subjects, an assumption that frequently cannot be met. For example, in many fields of the life sciences that use modern measuring technology, such as mass spectrometry, left-censored values caused by denoising the data are a commonplace phenomenon. Left-censored values are low-level concentrations that are considered too imprecise to be reported as a single number but are known to lie somewhere between zero and the laboratory's lower limit of detection. Maximum likelihood-based covariance matrix estimators that allow for the presence of left-censored values, without substituting them with a constant or ignoring them completely, are considered. The presented estimators use all the available information efficiently and thus, based on simulation studies, produce the least biased estimates compared with commonly used competing estimators. As the genuine maximum likelihood estimate can be computed quickly only in low dimensions, it is suggested to estimate the covariance matrix element-wise and then adjust the resulting matrix to achieve positive semi-definiteness. It is shown that the new approach substantially decreases computation times while still producing accurate estimates. Finally, as an example, a left-censored data set of toxic chemicals is explored.
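The abstract does not spell out the likelihood, so the following is only a minimal univariate sketch of the general idea of keeping censored values in the likelihood instead of substituting or dropping them. It assumes the data are normally distributed on the analysis scale (for example after a log transform), that each observed value contributes its density, and that each left-censored value contributes the probability of falling at or below the detection limit. The function names (`censored_loglik`, `fit_censored_normal`) and the crude grid search are illustrative assumptions, not the authors' estimators, which are multivariate.

```python
import math


def norm_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(x, mu, sigma):
    """Log of P(X <= x) for X ~ N(mu, sigma^2)."""
    z = (x - mu) / sigma
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(p) if p > 0.0 else -math.inf


def censored_loglik(observed, n_censored, lod, mu, sigma):
    """Log-likelihood with n_censored values known only to be <= lod."""
    ll = sum(norm_logpdf(x, mu, sigma) for x in observed)
    if n_censored > 0:
        ll += n_censored * norm_logcdf(lod, mu, sigma)
    return ll


def fit_censored_normal(observed, n_censored, lod, steps=200):
    """Crude grid search for (mu, sigma); an illustration, not an optimiser."""
    n = len(observed)
    mean = sum(observed) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in observed) / n)
    best_ll, best_mu, best_sigma = -math.inf, mean, sd
    for i in range(steps + 1):
        # Search mostly below the observed mean, where censoring pulls mu.
        mu = mean - 3.0 * sd + 4.0 * sd * i / steps
        for j in range(steps + 1):
            sigma = sd * (0.5 + 2.5 * j / steps)
            ll = censored_loglik(observed, n_censored, lod, mu, sigma)
            if ll > best_ll:
                best_ll, best_mu, best_sigma = ll, mu, sigma
    return best_mu, best_sigma


# Hypothetical data: five quantified values, three below a detection limit of 1.0.
observed = [1.2, 1.5, 2.0, 2.4, 3.1]
mu_hat, sigma_hat = fit_censored_normal(observed, n_censored=3, lod=1.0)
```

Because the censored term penalises parameter values that make small observations unlikely, the fitted mean lands below the mean of the quantified values alone. An element-wise covariance estimate in the spirit of the abstract would need the bivariate analogue of this likelihood for each pair of variables, followed by an adjustment to positive semi-definiteness; neither step is shown here.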




Last updated on 2024-26-11 at 10:48