Abstract
In this article, we study two microarray data integration techniques and describe how they can be applied and validated on a set of independent, but biologically related, microarray data sets in order to derive consistent and relevant clustering results. First, we present a cluster integration approach that combines the information contained in multiple data sets at the level of expression or similarity matrices and then applies a clustering algorithm to the combined matrix for subsequent analysis. Second, we propose a technique for the integration of multiple partitioning results. The performance of the proposed cluster integration algorithms is evaluated on time series expression data using two clustering algorithms and three cluster validation measures. We also propose a modified version of the Figure of Merit (FOM) algorithm, which is suitable for estimating the predictive power of clustering algorithms when they are applied to multiple expression data sets. In addition, an improved version of the well-known connectivity measure is introduced to achieve a more objective evaluation of the connectivity performance of clustering algorithms.
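The two integration levels named in the abstract can be illustrated with a small sketch. The abstract does not specify how the matrices or partitions are combined, so everything below is an assumption made for illustration: distance matrices are combined by plain averaging, partitions are combined through a co-association matrix, and a simple single-linkage routine stands in for the clustering algorithms used in the paper. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def distance_matrix(profiles):
    """Pairwise Euclidean distances between gene expression profiles."""
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = sqrt(sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
        dist[i][j] = dist[j][i] = d
    return dist


def average_matrices(matrices):
    """Assumed combination rule: element-wise mean of the matrices."""
    n = len(matrices[0])
    m = len(matrices)
    return [[sum(mat[i][j] for mat in matrices) / m for j in range(n)]
            for i in range(n)]


def co_association(partitions):
    """Fraction of partitions in which genes i and j share a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
            for i in range(n)]


def single_linkage(dist, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(dist[i][j]
                               for i in clusters[ab[0]]
                               for j in clusters[ab[1]]),
        )
        clusters[a] |= clusters[b]  # a < b, so deleting b keeps a valid
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical toy data: the same 4 genes measured in two experiments
# with different numbers of time points.
ds1 = [[0.0, 1.0, 2.0], [0.1, 1.1, 2.1], [2.0, 1.0, 0.0], [2.1, 0.9, 0.1]]
ds2 = [[0.0, 2.0], [0.2, 1.9], [2.0, 0.0], [1.8, 0.1]]

# Strategy 1: integrate at the similarity-matrix level, then cluster once.
combined = average_matrices([distance_matrix(ds1), distance_matrix(ds2)])
labels_matrix = single_linkage(combined, k=2)

# Strategy 2: cluster each data set separately, then integrate partitions.
partitions = [single_linkage(distance_matrix(ds), k=2) for ds in (ds1, ds2)]
co = co_association(partitions)
consensus_dist = [[1.0 - v for v in row] for row in co]
labels_partition = single_linkage(consensus_dist, k=2)

print(labels_matrix, labels_partition)
```

Working with distance matrices rather than raw expression values lets the sketch combine experiments with different numbers of time points, since each matrix is gene-by-gene regardless of the original profile length.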
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kostadinova, E., Boeva, V., Lavesson, N. (2011). Clustering of Multiple Microarray Experiments Using Information Integration. In: Böhm, C., Khuri, S., Lhotská, L., Pisanti, N. (eds) Information Technology in Bio- and Medical Informatics. ITBAM 2011. Lecture Notes in Computer Science, vol 6865. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23208-4_12
DOI: https://doi.org/10.1007/978-3-642-23208-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23207-7
Online ISBN: 978-3-642-23208-4
eBook Packages: Computer Science, Computer Science (R0)