doi:10.1016/j.csda.2008.05.010
Copyright © 2008 Elsevier B.V. All rights reserved.
Assumption adequacy averaging as a concept for developing more robust methods for differential gene expression analysis
Stan Pounds (a) and Shesh N. Rai (b)
(a) Department of Biostatistics, St. Jude Children’s Research Hospital, 332 N. Lauderdale Street, Memphis, TN 38105, USA
(b) Department of Bioinformatics & Biostatistics, University of Louisville, Louisville, KY 40202, USA
Available online 23 May 2008.
Abstract
The concept of assumption adequacy averaging is introduced as a technique for developing more robust methods that incorporate assessments of assumption adequacy into the analysis. The concept is illustrated by using it to develop a method that averages results from the t-test and the nonparametric rank-sum test, with weights obtained by using the Shapiro–Wilk test to test the assumption of normality. Through this averaging process, the proposed method is able to rely more heavily on whichever statistical test the data suggest is superior for each individual gene. In a series of traditional and bootstrap-based simulation studies, the method developed by assumption adequacy averaging outperformed its two component methods (the t-test and the rank-sum test). The proposed method also showed greater concordance in gene selection across two studies of gene expression in acute myeloid leukemia than did the t-test or the rank-sum test. An R routine for implementing the method is available from www.stjuderesearch.org/depts/biostats.
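The averaging step described in the abstract can be sketched as follows. This is a minimal illustration in Python, not the authors' R routine. It assumes that each gene already has one result from the t-test, one result from the rank-sum test, and a weight in [0, 1] expressing how adequate the normality assumption appears for that gene (for example, a quantity derived from the Shapiro–Wilk test). The function name `aaa_average`, the convex-combination form, and the way the weights are supplied are assumptions made for illustration; the abstract does not specify them.

```python
from typing import List, Sequence


def aaa_average(t_results: Sequence[float],
                ranksum_results: Sequence[float],
                normality_weights: Sequence[float]) -> List[float]:
    """Per-gene weighted average of t-test and rank-sum results.

    normality_weights[g] is a hypothetical weight in [0, 1]: values near 1
    favor the t-test result, values near 0 favor the rank-sum result.
    How the weight is derived from the Shapiro-Wilk test is NOT specified
    here; it is taken as an input (assumption).
    """
    if not (len(t_results) == len(ranksum_results) == len(normality_weights)):
        raise ValueError("all inputs must have one entry per gene")

    averaged = []
    for t_res, rs_res, w in zip(t_results, ranksum_results, normality_weights):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
        # Convex combination: lean on the test whose assumptions look adequate.
        averaged.append(w * t_res + (1.0 - w) * rs_res)
    return averaged
```

With a weight of 1 the sketch returns the t-test result unchanged, and with a weight of 0 it returns the rank-sum result, which mirrors the abstract's description of relying more heavily on the test that the data favor for each gene.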
Fig. 1. Trend in average power as a function of η. The above plot gives the average power for the t-test (dashed line), the proposed method (solid line), and the rank-sum test (dotted line) observed in the set of simulations with 50% of genes differentially expressed with effect size 1.0 and sample size of 25.
Fig. 2. Trend in average power as a function of sample size. The above plot gives the average power for the t-test (dashed line), the proposed method (solid line), and the rank-sum test (dotted line) observed in the set of simulations with 50% of genes differentially expressed with effect size 1.0 and 10% of genes normally distributed. In this case, the rank-sum test is preferred because few genes satisfy the normality assumption of the t-test. The average power of the proposed assumption adequacy averaging (AAA) method approaches that of the rank-sum test as the sample size increases, consistent with the property described in Section 3.3.
Table 1. Results of bootstrap-based simulations

Table 2. Concordance of results by study and method(s) of analysis

The first item of each entry gives the number of probe sets that are overexpressed in t(8; 21) relative to MLL and are significant at EBP(g is DE) ≥ 0.90 in both studies when the indicated methods are applied. The second item of each entry gives the number of probe sets that are underexpressed in t(8; 21) relative to MLL and are significant at EBP(g is DE) ≥ 0.90 in both studies when the indicated methods are applied. For example, when the St. Jude data are analyzed with the t-test and the Stanford data are analyzed with the rank-sum test, there are 32 probe sets that are significant with EBP(g is DE) ≥ 0.90 and overexpressed in t(8; 21) in both studies, and 41 probe sets that are significant with EBP(g is DE) ≥ 0.90 and underexpressed in t(8; 21) in both studies.
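The counting rule in this note can be illustrated with a short sketch. It assumes that each study's results are available as a mapping from probe-set ID to a pair (EBP, effect), where a positive effect means overexpressed and a negative effect means underexpressed in t(8; 21) relative to MLL. The function name `concordance_counts`, the data layout, and the sign convention are hypothetical and chosen only for illustration; they are not taken from the article.

```python
from typing import Mapping, Tuple


def concordance_counts(study_a: Mapping[str, Tuple[float, float]],
                       study_b: Mapping[str, Tuple[float, float]],
                       threshold: float = 0.90) -> Tuple[int, int]:
    """Count probe sets significant in both studies with matching direction.

    Each mapping takes a probe-set ID to (ebp, effect). The layout and the
    sign convention (effect > 0 = overexpressed) are assumptions.
    Returns (n_overexpressed_in_both, n_underexpressed_in_both).
    """
    over = 0
    under = 0
    # Only probe sets measured in both studies can be concordant.
    for probe in study_a.keys() & study_b.keys():
        ebp_a, effect_a = study_a[probe]
        ebp_b, effect_b = study_b[probe]
        if ebp_a >= threshold and ebp_b >= threshold:
            if effect_a > 0 and effect_b > 0:
                over += 1
            elif effect_a < 0 and effect_b < 0:
                under += 1
    return over, under
```

Each table entry would then correspond to one call of this function, with `study_a` and `study_b` holding the results of the indicated method for each study.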

Corresponding author. Tel.: +1 901 495 5052; fax: +1 901 544 8843.