ABSTRACT
Clustering multi-type relational data has attracted increasing attention in recent years because of its impact on important applications such as Web mining, e-commerce, and bioinformatics. However, research on general multi-type relational data clustering remains limited and preliminary. The contribution of this paper is three-fold. First, we propose a general model, collective factorization on related matrices, for multi-type relational data clustering. The model is applicable to relational data with various structures. Second, under this model, we derive a novel algorithm, spectral relational clustering, which clusters multiple types of interrelated data objects simultaneously. The algorithm iteratively embeds each type of data object into a low-dimensional space and benefits from the interactions among the hidden structures of the different object types. Extensive experiments demonstrate the promise and effectiveness of the proposed algorithm. Third, we show that existing spectral clustering algorithms can be viewed as special cases of the proposed model and algorithm, which demonstrates their theoretical generality.
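To make the embedding idea concrete, the following is a minimal sketch of the simplest special case the abstract alludes to: a single bipartite relation (e.g., documents × words), where embedding both object types into a shared low-dimensional space reduces to a truncated SVD of the relation matrix. The function name `spectral_embed` and the toy matrix are illustrative assumptions, not the paper's actual implementation, which iterates such embeddings over several related matrices.

```python
import numpy as np

def spectral_embed(R, k):
    """Embed both object types of a bipartite relation matrix R into a
    k-dimensional space via truncated SVD. This is only the one-relation
    special case; the full algorithm alternates such embeddings across
    all related matrices until they stabilize."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # Rows of U[:, :k] embed the row objects; rows of Vt[:k, :].T embed
    # the column objects. Running k-means on either embedding yields the
    # clusters for that object type.
    return U[:, :k], Vt[:k, :].T

# Toy relation with two obvious co-clusters: rows {0,1} link only to
# columns {0,1}, rows {2,3} only to columns {2,3}.
R = np.array([
    [2.0, 2.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])

U2, V2 = spectral_embed(R, 2)
# Rows 0 and 1 receive identical embeddings, as do rows 2 and 3,
# while the two groups land in different directions.
```

In this degenerate example the block structure is exact, so the embeddings of objects in the same cluster coincide; on real data they are merely close, and a post-hoc k-means step recovers the clusters.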
Index Terms
- Spectral clustering for multi-type relational data