Abstract
We introduce Iterative Multi-Mode Discretization (IMMD), an efficient form of data sparsification that can scale up many data mining tasks. In this paper we demonstrate the application of IMMD to co-clustering, i.e. the simultaneous clustering of the rows and columns of a matrix. We propose IMMD-CC, a novel co-clustering algorithm built on IMMD. IMMD-CC has several attractive properties. First, its time complexity is linear, so it can be applied to large-scale problems. In addition, IMMD-CC estimates the number of co-clusters automatically and is more accurate than state-of-the-art methods. We compare IMMD-CC with several state-of-the-art methods on 100 datasets from a benchmark cohort, as well as on 35 real-world datasets. The results show the promising potential of the proposed method.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Fanaee-T, H., Thoresen, M. (2020). Iterative Multi-mode Discretization: Applications to Co-clustering. In: Appice, A., Tsoumakas, G., Manolopoulos, Y., Matwin, S. (eds) Discovery Science. DS 2020. Lecture Notes in Computer Science, vol 12323. Springer, Cham. https://doi.org/10.1007/978-3-030-61527-7_7
DOI: https://doi.org/10.1007/978-3-030-61527-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61526-0
Online ISBN: 978-3-030-61527-7
eBook Packages: Computer Science (R0)