Clustering of Multiple Microarray Experiments Using Information Integration

Kostadinova, Elena; Boeva, Veselka; Lavesson, Niklas

doi:10.1007/978-3-642-23208-4_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6865)

Included in the following conference series:

International Conference on Information Technology in Bio- and Medical Informatics

Abstract

In this article, we study two microarray data integration techniques and describe how they can be applied and validated on a set of independent, but biologically related, microarray data sets in order to derive consistent and relevant clustering results. First, we present a cluster integration approach, which combines the information contained in multiple data sets at the level of expression or similarity matrices and then applies a clustering algorithm to the combined matrix for subsequent analysis. Second, we propose a technique for the integration of multiple partitioning results. The performance of the proposed cluster integration algorithms is evaluated on time series expression data using two clustering algorithms and three cluster validation measures. We also propose a modified version of the Figure of Merit (FOM) algorithm, which is suitable for estimating the predictive power of clustering algorithms when they are applied to multiple expression data sets. In addition, an improved version of the well-known connectivity measure is introduced to achieve a more objective evaluation of the connectivity performance of clustering algorithms.
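
The two integration levels named in the abstract can be illustrated with a small sketch. The abstract does not state the combination rule, the distance measure, or the clustering algorithms used, so everything below is an assumption made for illustration only: distance matrices are combined by a plain average after rescaling, partitions are combined through a co-association matrix, and the final grouping is a simple distance-threshold cut. The function names and the toy data are hypothetical and are not taken from the chapter.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows (genes) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def combine_distance_matrices(datasets):
    """Matrix-level integration (assumed rule: unweighted average)."""
    mats = []
    for X in datasets:
        D = pairwise_distances(X)
        m = D.max()
        # Rescale to [0, 1] so data sets with different ranges are comparable.
        mats.append(D / m if m > 0 else D)
    return np.mean(mats, axis=0)

def co_association(partitions):
    """Partition-level integration: fraction of partitions in which
    each pair of genes is assigned to the same cluster."""
    P = np.asarray(partitions)
    same = P[:, :, None] == P[:, None, :]
    return same.mean(axis=0)

def threshold_clusters(D, threshold):
    """Group genes into connected components of the graph that links
    every pair with distance <= threshold (a stand-in clustering step)."""
    n = D.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.nonzero((D[i] <= threshold) & (labels == -1))[0]:
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

# Toy data: the same 4 genes measured in two experiments with
# different numbers of time points (values are made up).
A = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
B = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 2.0],
              [8.0, 8.0, 8.0], [8.0, 9.0, 8.0]])

# Strategy 1: combine at the matrix level, then cluster once.
matrix_level = threshold_clusters(combine_distance_matrices([A, B]), 0.5)

# Strategy 2: cluster separately (labels given here), then combine partitions.
parts = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
C = co_association(parts)
partition_level = threshold_clusters(1.0 - C, 0.5)

print(matrix_level, partition_level)
```

In this toy case both strategies place genes 0 and 1 in one cluster and genes 2 and 3 in another. The co-association step compares pairs of genes rather than label values, so it does not depend on how each individual partition numbers its clusters.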

Author information

Authors and Affiliations

Branch Plovdiv, Computer Systems and Technologies Department, Technical University of Sofia, 4400, Plovdiv, Bulgaria
Elena Kostadinova & Veselka Boeva
School of Computing, Blekinge Institute of Technology, SE-371 79, Karlskrona, Sweden
Niklas Lavesson

Editor information

Editors and Affiliations

Department of Computer Science, Ludwig-Maximilians-Universität, Oettingenstrasse 67, 80538, München, Germany
Christian Böhm
Department of Computer Science, San José State University, One Washington Square, 95192-0249, San José, CA, U.S.A.
Sami Khuri
Faculty of Electrical Engineering, Department of Cybernetics, Czech Technical University, Technicka 2, 166 27, Prague 6, Czech Republic
Lenka Lhotská
Dipartimento di Informatica, Università di Pisa, Italy
Nadia Pisanti

About this paper

Cite this paper

Kostadinova, E., Boeva, V., Lavesson, N. (2011). Clustering of Multiple Microarray Experiments Using Information Integration. In: Böhm, C., Khuri, S., Lhotská, L., Pisanti, N. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2011. Lecture Notes in Computer Science, vol 6865. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23208-4_12

DOI: https://doi.org/10.1007/978-3-642-23208-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23207-7
Online ISBN: 978-3-642-23208-4
eBook Packages: Computer Science, Computer Science (R0)
