Bi-clustering Gene Expression Data Using Co-similarity

Hussain, Syed Fawad

doi:10.1007/978-3-642-25853-4_15

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7120)

Abstract

We propose a new framework for bi-clustering gene expression data, based on the notion of co-similarity between genes and samples. The framework iteratively learns the similarity between the rows of a matrix from the similarity between its columns, and vice versa. The underlying task, usually referred to as bi-clustering in bioinformatics, is to find groupings of the feature set that exhibit similar behavior across subsets of samples. The algorithm has previously been shown to work well for document clustering, where the data is represented as a sparse matrix. We propose a variation of the method suited to data that is represented as a dense matrix and is non-homogeneous, as is the case for gene expression. Our experiments on real-world cancer datasets show that, with the proposed variations, the method is well suited to finding bi-clusters with a high degree of homogeneity.
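The alternating idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the update rule (row similarity computed as M · SC · Mᵀ, column similarity as Mᵀ · SR · M), the cosine-style normalization that rescales each diagonal entry to 1, and all function names are assumptions made for illustration, and the dense-matrix variation proposed in the paper is not reproduced here.

```python
import math

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Plain triple-loop matrix product on lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def normalize(s):
    # Assumed normalization: divide s[i][j] by sqrt(s[i][i] * s[j][j])
    # so that every non-degenerate diagonal entry becomes 1.
    n = len(s)
    d = [math.sqrt(s[i][i]) if s[i][i] > 0 else 0.0 for i in range(n)]
    return [[s[i][j] / (d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]

def co_similarity(m, iterations=2):
    """Return (row_similarity, column_similarity) for matrix m."""
    mt = transpose(m)
    sr = identity(len(m))    # row similarities start as the identity
    sc = identity(len(mt))   # column similarities start as the identity
    for _ in range(iterations):
        # Rows are compared through the current column similarities,
        # and columns through the current row similarities.
        new_sr = normalize(matmul(matmul(m, sc), mt))
        new_sc = normalize(matmul(matmul(mt, sr), m))
        sr, sc = new_sr, new_sc
    return sr, sc

# Toy matrix: rows 0 and 2 have no non-zero column in common.
toy = [[1.0, 1.0, 0.0],
       [0.0, 1.0, 1.0],
       [0.0, 0.0, 1.0]]
sr1, _ = co_similarity(toy, iterations=1)
sr2, _ = co_similarity(toy, iterations=2)
print(sr1[0][2], sr2[0][2])
```

In the toy matrix, rows 0 and 2 share no non-zero column, so their similarity after one pass is zero. After a second pass it becomes positive, because the columns they occupy have themselves been found similar through row 1. Grouping rows and columns from the resulting similarity matrices (for example with a standard clustering algorithm) is a separate step not shown here.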

Author information

Authors and Affiliations

Ghulam Ishaq Khan Institute of Engineering Sciences and Technology, Pakistan
Syed Fawad Hussain

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Jie Tang & Jianyong Wang
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, SAR, China
Irwin King
Faculty of Engineering and Information Technology, University of Technology, 2007, Sydney, NSW, Australia
Ling Chen

About this paper

Cite this paper

Hussain, S.F. (2011). Bi-clustering Gene Expression Data Using Co-similarity. In: Tang, J., King, I., Chen, L., Wang, J. (eds) Advanced Data Mining and Applications. ADMA 2011. Lecture Notes in Computer Science, vol 7120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25853-4_15

DOI: https://doi.org/10.1007/978-3-642-25853-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25852-7
Online ISBN: 978-3-642-25853-4
eBook Packages: Computer Science, Computer Science (R0)