Clustering of Multiple Microarray Experiments Using Information Integration

  • Conference paper
Information Technology in Bio- and Medical Informatics (ITBAM 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6865)

Abstract

In this article, we study two microarray data integration techniques and describe how they can be applied and validated on a set of independent, but biologically related, microarray data sets in order to derive consistent and relevant clustering results. First, we present a cluster integration approach, which combines the information contained in multiple data sets at the level of expression or similarity matrices, and then applies a clustering algorithm to the combined matrix for subsequent analysis. Second, we propose a technique for the integration of multiple partitioning results. The performance of the proposed cluster integration algorithms is evaluated on time series expression data using two clustering algorithms and three cluster validation measures. We also propose a modified version of the Figure of Merit (FOM) algorithm, which is suitable for estimating the predictive power of clustering algorithms when they are applied to multiple expression data sets. In addition, an improved version of the well-known connectivity measure is introduced to achieve a more objective evaluation of the connectivity performance of clustering algorithms.
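The two integration levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the aggregation rule (an element-wise mean of distance matrices), the co-association matrix used to combine partitions, the average-linkage clustering step, and all function names and toy values are assumptions chosen only to make the idea concrete.

```python
from itertools import combinations
from math import sqrt


def euclidean_distance_matrix(profiles):
    """Pairwise Euclidean distances between expression profiles (one row per gene)."""
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = sqrt(sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
        dist[i][j] = dist[j][i] = d
    return dist


def average_matrices(matrices):
    """Element-wise mean of equally sized square matrices (assumed aggregation rule)."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices) for j in range(n)]
            for i in range(n)]


def co_association(partitions):
    """Fraction of partitions in which each pair of genes shares a cluster."""
    n = len(partitions[0])
    return [[sum(p[i] == p[j] for p in partitions) / len(partitions) for j in range(n)]
            for i in range(n)]


def agglomerative(dist, k):
    """Naive average-linkage clustering of a distance matrix into k clusters."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            link = (sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                    / (len(clusters[a]) * len(clusters[b])))
            if best is None or link < best[0]:
                best = (link, a, b)
        _, a, b = best
        clusters[a] += clusters[b]   # a < b, so deleting b keeps index a valid
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical toy data: the same 4 genes measured in two experiments
# with different numbers of time points.
set_a = [[0.0, 1.0, 2.0], [0.1, 1.1, 2.1], [5.0, 4.0, 3.0], [5.2, 4.1, 3.0]]
set_b = [[1.0, 2.0], [1.0, 2.2], [4.0, 0.0], [4.1, 0.2]]

# Level 1: integrate at the (dis)similarity-matrix level, then cluster once.
combined = average_matrices([euclidean_distance_matrix(set_a),
                             euclidean_distance_matrix(set_b)])
matrix_level_labels = agglomerative(combined, 2)

# Level 2: cluster each data set separately, then integrate the partitions.
partitions = [agglomerative(euclidean_distance_matrix(s), 2) for s in (set_a, set_b)]
co = co_association(partitions)
consensus_distance = [[1.0 - v for v in row] for row in co]
partition_level_labels = agglomerative(consensus_distance, 2)

print(matrix_level_labels, partition_level_labels)
```

Working on distance matrices rather than raw expression values lets the sketch combine experiments with different numbers of time points, since only the gene set has to match across data sets.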

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kostadinova, E., Boeva, V., Lavesson, N. (2011). Clustering of Multiple Microarray Experiments Using Information Integration. In: Böhm, C., Khuri, S., Lhotská, L., Pisanti, N. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2011. Lecture Notes in Computer Science, vol 6865. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23208-4_12

  • DOI: https://doi.org/10.1007/978-3-642-23208-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23207-7

  • Online ISBN: 978-3-642-23208-4

  • eBook Packages: Computer Science, Computer Science (R0)
