A1 Peer-reviewed original article in a scientific journal
Variable screening based on Gaussian Centered L-moments
Authors: An Hyowon, Zhang Kai, Oja Hannu, Marron J.S.
Publisher: Elsevier
Publication year: 2023
Journal: Computational Statistics and Data Analysis
Journal name in source database: COMPUTATIONAL STATISTICS & DATA ANALYSIS
Journal acronym: COMPUT STAT DATA AN
Article number: 107632
Volume: 179
Number of pages: 15
ISSN: 0167-9473
eISSN: 1872-7352
DOI: https://doi.org/10.1016/j.csda.2022.107632
URL: https://doi.org/10.1016/j.csda.2022.107632
Abstract
An important challenge in big data is the identification of important variables. For this purpose, methods for discovering variables with non-standard univariate marginal distributions are proposed. Summary statistics based on conventional moments can be readily adopted, but their sensitivity to outliers can lead to selection driven by a few outliers rather than by distributional shape, such as bimodality. To address this type of non-robustness, L-moments are considered. Using them in practice, however, has a limitation: they do not take zero values at the Gaussian distributions, to which the shape of a marginal distribution is most naturally compared. As a remedy, Gaussian Centered L-moments are proposed, which share the advantages of L-moments but are zero at the Gaussian distributions. The strength of Gaussian Centered L-moments over other conventional moments is shown in both theoretical and practical aspects, such as their performance in screening important genes in cancer genetics data. (c) 2022 Elsevier B.V. All rights reserved.
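The limitation mentioned in the abstract can be seen with the standard sample L-moments. By symmetry, the Gaussian L-skewness is 0. Its L-kurtosis, however, is 30/π·arctan(√2) − 9 ≈ 0.1226, so it is not zero. The sketch below computes sample L-moments using Hosking's unbiased probability-weighted moments. It then applies a deliberately simple, hypothetical "centering" that subtracts the Gaussian L-kurtosis value. That subtraction is only an illustrative assumption. It is not the authors' definition of Gaussian Centered L-moments, which may be constructed differently. The names `sample_l_moments`, `centered_l_kurtosis` and `TAU4_GAUSS` are introduced here for illustration only.

```python
import math
import random

# Population L-kurtosis of any Gaussian distribution: 30/pi * arctan(sqrt(2)) - 9 ~ 0.1226.
TAU4_GAUSS = 30.0 / math.pi * math.atan(math.sqrt(2.0)) - 9.0


def sample_l_moments(x):
    """Return (l1, l2, tau3, tau4) from Hosking's unbiased probability-weighted moments."""
    xs = sorted(x)
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = b1 = b2 = b3 = 0.0
    for j, v in enumerate(xs):  # j = rank - 1, running over 0..n-1
        b0 += v
        b1 += v * j / (n - 1)
        b2 += v * j * (j - 1) / ((n - 1) * (n - 2))
        b3 += v * j * (j - 1) * (j - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0                              # L-location (the mean)
    l2 = 2 * b1 - b0                     # L-scale
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2      # tau3 = L-skewness, tau4 = L-kurtosis


def centered_l_kurtosis(x):
    """Hypothetical illustration only: sample L-kurtosis minus its Gaussian value."""
    return sample_l_moments(x)[3] - TAU4_GAUSS


if __name__ == "__main__":
    random.seed(0)
    gauss = [random.gauss(0.0, 1.0) for _ in range(20000)]
    print(sample_l_moments(gauss)[3])    # close to 0.1226, not 0
    print(centered_l_kurtosis(gauss))    # close to 0 after the illustrative shift
```

In a screening setting, one could compute such a statistic for each variable and rank the variables by its absolute value. That ranking rule is also an assumption made for illustration, not the procedure from the paper.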