A1 Peer-reviewed original article in a scientific journal
Reproducibility-optimized test statistic for ranking genes in microarray studies
Authors: Elo LL, Filen S, Lahesmaa R, Aittokallio T
Publisher: IEEE COMPUTER SOC
Publication year: 2008
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Journal name in database: IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
Journal acronym: IEEE ACM T COMPUT BI
Volume: 5
Issue: 3
Start page: 423
End page: 431
Number of pages: 9
ISSN: 1545-5963
DOI: https://doi.org/10.1109/TCBB.2007.1078
Abstract
A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. Because previous studies on simulated or spike-in data sets do not provide practical guidance on how to choose the best method for a given real data set, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene-ranking statistic directly from the data. In comparison with existing ranking methods, the reproducibility-optimized statistic shows consistently good performance under various simulated conditions and on an Affymetrix spike-in data set. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given data set without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibility-optimization procedure is not limited to detecting differential expression but could be extended to a wide range of other applications as well.