A Computational Approach to Gene Expression Data Extraction and Analysis

Journal of VLSI signal processing systems for signal, image and video technology

Abstract

The rapid advancement of DNA microarray technology has revolutionized genetic research in bioscience. Because this technology generates an enormous amount of gene expression data, computer processing and analysis of the data have become indispensable. In this paper, we present a computational framework for the extraction, analysis and visualization of gene expression data from microarray experiments. We propose a novel, fully automated spot segmentation algorithm for DNA microarray images. It uses adaptive thresholding, morphological processing and statistical intensity modeling to (i) segment the blocks of spots, (ii) generate the grid structure, and (iii) segment the spot within each subregion. For data analysis, we propose a binary hierarchical clustering (BHC) framework for the clustering of gene expression data. The BHC algorithm involves two major steps. First, the fuzzy C-means algorithm and the average linkage hierarchical clustering algorithm are used to split the data into two classes. Second, Fisher linear discriminant analysis is applied to the two classes to assess whether the split is acceptable. The BHC algorithm is applied to the sub-classes recursively and terminates when no cluster can be split any further. BHC does not require the number of clusters to be known in advance, and it makes no assumption about the number of samples in each cluster or about the class distribution. The hierarchical framework naturally leads to a tree-structured representation for effective visualization of gene expression data.
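The recursive split-and-test structure of BHC can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the acceptance criterion, so the separation score, the `threshold` and `min_size` parameters, and all function names below are assumptions made for illustration. For brevity the sketch also uses only fuzzy C-means for the two-way split and omits the average linkage step.

```python
import numpy as np


def fuzzy_cmeans_2(X, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Two-cluster fuzzy C-means; returns hard labels (0 or 1) per row of X."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], 2))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared distance of every sample to both centers (epsilon avoids 0/0).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + 1e-12
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U.argmax(axis=1)


def fisher_separation(A, B):
    """Project both classes onto the Fisher direction and score their separation.

    The score (gap between projected means over summed projected standard
    deviations) is an assumed criterion, not one taken from the paper.
    """
    mean_diff = A.mean(axis=0) - B.mean(axis=0)
    # Within-class scatter matrix, lightly regularized so it stays invertible.
    Sw = np.cov(A, rowvar=False) * (len(A) - 1) + np.cov(B, rowvar=False) * (len(B) - 1)
    Sw = np.atleast_2d(Sw) + 1e-6 * np.eye(A.shape[1])
    w = np.linalg.solve(Sw, mean_diff)
    a, b = A @ w, B @ w
    return abs(a.mean() - b.mean()) / (a.std() + b.std() + 1e-12)


def bhc(X, indices=None, threshold=3.0, min_size=4):
    """Recursively split X. Internal nodes are 2-tuples, leaves are index arrays."""
    if indices is None:
        indices = np.arange(len(X))
    if len(indices) < 2 * min_size:
        return indices
    labels = fuzzy_cmeans_2(X[indices])
    left, right = indices[labels == 0], indices[labels == 1]
    if min(len(left), len(right)) < min_size:
        return indices
    if fisher_separation(X[left], X[right]) < threshold:
        return indices  # split rejected: keep this group as a single cluster
    return (bhc(X, left, threshold, min_size), bhc(X, right, threshold, min_size))


def leaves(tree):
    """Flatten the binary tree into a list of clusters (arrays of row indices)."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-in for expression profiles: two well-separated groups.
    data = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(10.0, 0.5, (30, 2))])
    print("clusters found:", len(leaves(bhc(data))))
```

Because every accepted split becomes an internal node, the nested tuples returned by `bhc` already form the binary tree that the abstract describes as the basis for visualization. The number of clusters is never passed in; it follows from where the separation test stops accepting splits.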

About this article

Cite this article

Liew, A.WC., Szeto, L.K., Tang, SS. et al. A Computational Approach to Gene Expression Data Extraction and Analysis. The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology 38, 237–258 (2004). https://doi.org/10.1023/B:VLSI.0000042490.35986.84

  • DOI: https://doi.org/10.1023/B:VLSI.0000042490.35986.84
