Yearb Med Inform 2006; 15(01): 91-103
DOI: 10.1055/s-0038-1638484
Survey
Georg Thieme Verlag KG Stuttgart

Section 7: Bioinformatics: Computational Approaches to Analysis of DNA Microarray Data

J. Quackenbush
1   Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA

Publication History

Publication Date:
07 March 2018 (online)

Summary

Objectives

To review the current state of the art in computational methods for the analysis of DNA microarray data.

Methods

The review considers methods of microarray data collection, transformation and representation, comparisons and predictions of gene expression from the data, their mechanistic analysis, related systems biology, and the application of clustering techniques.

Results

Functional genomics approaches have greatly increased the rate at which data on biological systems are generated, creating corresponding challenges in analyzing those data with advanced computational techniques. The paper compares and contrasts the application of computational clustering for discovery, comparison, and prediction of gene expression classes, together with their evaluation and relation to mechanistic analyses of biological systems.

Conclusion

Methods for assaying gene expression levels by DNA microarray experiments produce considerably more data than other techniques and require a wide variety of computational techniques for identifying patterns of expression that may be biologically significant. These patterns will have to be verified and validated by comparison with results from other methods, integrated with other systems data, and used as feedback for further experiments that test mechanistic or other biological hypotheses.

 