Variable screening based on Gaussian Centered L-moments
Authors: An Hyowon, Zhang Kai, Oja Hannu, Marron J.S.
Publisher: Elsevier
Publication year: 2023
Journal: Computational Statistics and Data Analysis
Journal name in source: COMPUTATIONAL STATISTICS & DATA ANALYSIS
Journal acronym: COMPUT STAT DATA AN
Article number: 107632
Volume: 179
Number of pages: 15
ISSN: 0167-9473
eISSN: 1872-7352
DOI: https://doi.org/10.1016/j.csda.2022.107632
An important challenge in big data is the identification of important variables. For this purpose, methods for discovering variables with non-standard univariate marginal distributions are proposed. Conventional moment-based summary statistics can be adopted, but their sensitivity to outliers can lead to selection driven by a few outliers rather than by distributional shape, such as bimodality. To address this type of non-robustness, the L-moments are considered. Using these in practice, however, has a limitation: they do not take zero values at the Gaussian distributions, to which the shape of a marginal distribution is most naturally compared. As a remedy, Gaussian Centered L-moments are proposed, which share the advantages of the L-moments but are zero at the Gaussian distributions. The strength of Gaussian Centered L-moments over conventional moments is shown in both theoretical and practical aspects, including their performance in screening important genes in cancer genetics data. © 2022 Elsevier B.V. All rights reserved.
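The limitation described in the abstract can be illustrated with a minimal sketch. It computes the first four sample L-moments with the standard estimator based on unbiased probability-weighted moments, and shows that the L-kurtosis of a Gaussian sample is close to 0.1226 rather than 0. The helper `centered_l_kurtosis` is hypothetical: it simply subtracts the Gaussian value, which is an assumption made for illustration only and is not the paper's definition of Gaussian Centered L-moments. The function names, sample sizes, and mixture parameters are likewise assumptions.

```python
import math
import random

# Theoretical L-kurtosis (tau_4) of any Gaussian distribution, about 0.1226.
GAUSSIAN_L_KURTOSIS = 30.0 / math.pi * math.atan(math.sqrt(2.0)) - 9.0


def sample_l_moments(values):
    """Return the first four sample L-moments (l1, l2, l3, l4).

    Uses the unbiased probability-weighted moments b_0..b_3 of the
    sorted sample.
    """
    x = sorted(values)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b = [0.0, 0.0, 0.0, 0.0]
    for i, xi in enumerate(x):  # i is the 0-based rank of xi
        w = 1.0
        for r in range(4):
            if r > 0:
                # Builds the weight i(i-1)...(i-r+1) / ((n-1)...(n-r)).
                w *= (i - (r - 1)) / (n - r)
            b[r] += w * xi
    b0, b1, b2, b3 = (v / n for v in b)
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3, l4


def centered_l_kurtosis(values):
    """Hypothetical helper: sample L-kurtosis minus its Gaussian value.

    Illustration only; NOT the paper's Gaussian Centered L-moment.
    """
    _, l2, _, l4 = sample_l_moments(values)
    if l2 == 0:
        raise ValueError("sample has zero L-scale")
    return l4 / l2 - GAUSSIAN_L_KURTOSIS


rng = random.Random(0)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
# Equal mixture of N(-3, 1) and N(3, 1): a clearly bimodal marginal.
bimodal = [rng.gauss(rng.choice((-3.0, 3.0)), 1.0) for _ in range(20000)]

print("Gaussian, centered L-kurtosis:", centered_l_kurtosis(gaussian))
print("Bimodal,  centered L-kurtosis:", centered_l_kurtosis(bimodal))
```

In this sketch the Gaussian sample gives a value near zero, while the bimodal sample gives a clearly negative value, so a statistic that is zero at the Gaussian makes departures in shape directly readable from its sign and size.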