Abstract
Given the complexity of microarray-based gene expression studies, guidelines encourage transparent design and public data availability. Several journals require public data deposition and several public databases exist. However, not all data are publicly available, and even when available, it is unknown whether the published results are reproducible by independent scientists. Here we evaluated the replication of data analyses in 18 articles on microarray-based gene expression profiling published in Nature Genetics in 2005–2006. One table or figure from each article was independently evaluated by two teams of analysts. We reproduced two analyses in principle and six partially or with some discrepancies; ten could not be reproduced. The main reason for failure to reproduce was data unavailability, and discrepancies were mostly due to incomplete data annotation or specification of data processing and analysis. Repeatability of published microarray studies is apparently limited. Stricter publication rules enforcing public data availability and explicit description of data processing and analysis should be considered.
Contributions
The protocol was designed through discussion among all authors. All authors except V.v.N. participated in evaluations of the eligible articles and their analyses. V.v.N. collected all the evaluations and checked for discrepancies between the teams. J.P.A.I. wrote the manuscript, which was critically revised by all other coauthors. After the first author, the author order is alphabetical.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1 and 2, Supplementary Table 1 (PDF 323 kb)
Cite this article
Ioannidis, J., Allison, D., Ball, C. et al. Repeatability of published microarray gene expression analyses. Nat Genet 41, 149–155 (2009). https://doi.org/10.1038/ng.295