Iterative Multi-mode Discretization: Applications to Co-clustering

Fanaee-T, Hadi; Thoresen, Magne

doi:10.1007/978-3-030-61527-7_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12323)

Included in the following conference series:

International Conference on Discovery Science

Abstract

We introduce Iterative Multi-Mode Discretization (IMMD), a new type of efficient data sparsification that can scale up many data mining tasks. In this paper we demonstrate the application of IMMD to co-clustering, i.e., the simultaneous clustering of the rows and columns of a matrix. We propose IMMD-CC, a novel co-clustering algorithm based on IMMD. IMMD-CC has attractive properties. First, its time complexity is linear, so it can be applied to large-scale problems. In addition, IMMD-CC estimates the number of co-clusters automatically and is more accurate than state-of-the-art methods. We compare the performance of IMMD-CC with several state-of-the-art methods on 100 data sets from a benchmark cohort, as well as on 35 real-world data sets. The results show the promising potential of the proposed method.
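
To make the notion of co-clustering concrete, the following is a minimal toy sketch in Python. It is not the IMMD-CC algorithm, whose details the abstract does not give. It only shows what grouping rows and columns at the same time on a discretized matrix can look like. The function names, the single fixed threshold, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def binarize(matrix, threshold):
    """Discretize every entry to 0 or 1 (a hypothetical one-pass stand-in)."""
    return [[1 if value > threshold else 0 for value in row] for row in matrix]


def group_identical(vectors):
    """Group the indices of vectors that share exactly the same 0/1 pattern."""
    groups = defaultdict(list)
    for index, vector in enumerate(vectors):
        groups[tuple(vector)].append(index)
    return list(groups.values())


def toy_cocluster(matrix, threshold):
    """Return (row_groups, col_groups) computed on the binarized matrix."""
    binary = binarize(matrix, threshold)
    columns = [list(column) for column in zip(*binary)]  # transpose
    return group_identical(binary), group_identical(columns)


# Illustrative data with two planted blocks (made up for this sketch).
data = [
    [9.0, 8.5, 0.1, 0.2],
    [8.7, 9.1, 0.3, 0.0],
    [0.2, 0.1, 7.9, 8.8],
    [0.0, 0.4, 8.2, 9.3],
]
row_groups, col_groups = toy_cocluster(data, threshold=5.0)
print(row_groups)  # [[0, 1], [2, 3]]
print(col_groups)  # [[0, 1], [2, 3]]
```

In this toy the number of groups falls out of the data rather than being set in advance. Exact pattern matching only works on clean data like the sample above; it is a simplification for illustration, not a claim about how IMMD-CC estimates the number of co-clusters.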

Author information

Authors and Affiliations

Center for Applied Intelligent Systems Research, Halmstad University, Halmstad, Sweden
Hadi Fanaee-T
Department of Biostatistics, University of Oslo, Oslo, Norway
Magne Thoresen

Corresponding author

Correspondence to Hadi Fanaee-T.

Editor information

Editors and Affiliations

University of Bari Aldo Moro, Bari, Italy
Annalisa Appice
Aristotle University of Thessaloniki, Thessaloniki, Greece
Grigorios Tsoumakas
Open University of Cyprus, Nicosia, Cyprus
Yannis Manolopoulos
Dalhousie University, Halifax, NS, Canada
Stan Matwin

About this paper

Cite this paper

Fanaee-T, H., Thoresen, M. (2020). Iterative Multi-mode Discretization: Applications to Co-clustering. In: Appice, A., Tsoumakas, G., Manolopoulos, Y., Matwin, S. (eds) Discovery Science. DS 2020. Lecture Notes in Computer Science, vol 12323. Springer, Cham. https://doi.org/10.1007/978-3-030-61527-7_7

DOI: https://doi.org/10.1007/978-3-030-61527-7_7
Published: 15 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61526-0
Online ISBN: 978-3-030-61527-7
eBook Packages: Computer Science, Computer Science (R0)
