Copyright © 2006 The Institute of Electronics, Information and Communication Engineers
Regular Section -- Papers -- Data Mining
Mining Communities on the Web Using a Max-Flow and a Site-Oriented Framework
1. The author is with the Department of Information Sciences, Faculty of Science and Engineering, Tokyo Denki University, Saitama-ken, 350-0394 Japan. E-mail: asano{at}y.dendai.ac.jp
2. The author is with the Graduate School of Information Sciences, Tohoku University, Sendai-shi, 980-8579 Japan.
3. The authors are with the Institute of Industrial Science, The University of Tokyo, Tokyo, 153-8505 Japan.
There are several methods for mining communities on the Web using hyperlinks. One well-known example is the max-flow-based method proposed by Flake et al. Like other methods such as HITS and trawling, it adopts a page-oriented framework; that is, it uses a single Web page as the unit of information. Recently, Asano et al. built a site-oriented framework that uses a site as the unit of information, and they showed experimentally that trawling on the site-oriented framework often outputs significantly better communities than trawling on the page-oriented framework. However, it has not been known whether the site-oriented framework is also effective for mining communities with the max-flow-based method. In this paper, we first point out several problems of the max-flow-based method, arising mainly from the page-oriented framework, and then propose solutions to these problems that exploit several advantages of the site-oriented framework. Computational experiments reveal that our max-flow-based method on the site-oriented framework is very effective in mining communities related to the topics of given pages, compared with the original max-flow-based method on the page-oriented framework.
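To make the two ideas in the abstract concrete, the following is a minimal sketch of a max-flow community extraction run on a site graph rather than a page graph. It is not the authors' method and makes several loudly stated assumptions. A "site" is approximated by the URL host name (the hypothetical helper `site_of`); Asano et al.'s actual site identification is not reproduced here. Links are treated as undirected with capacity k (the number of seeds). Every non-seed vertex gets a unit-capacity edge to a virtual sink, a simplification of the setup of Flake et al. The community is the set of vertices still reachable from the virtual source in the residual graph after an Edmonds-Karp max flow, i.e., the source side of a minimum cut. The names `to_site_graph`, `max_flow_community`, and the example URLs are all hypothetical.

```python
from collections import defaultdict, deque
from urllib.parse import urlparse

SRC, SNK = "__source__", "__sink__"  # virtual source/sink (assumes no site uses these names)


def site_of(url):
    # Hypothetical simplification: a "site" is just the host name of the URL.
    return urlparse(url).netloc.lower()


def to_site_graph(page_links):
    """Collapse page->page links into site->site links, dropping intra-site links."""
    edges = set()
    for src, dst in page_links:
        s, d = site_of(src), site_of(dst)
        if s != d:
            edges.add((s, d))
    return edges


def max_flow_community(edges, seeds):
    """Source side of a minimum s-t cut in a simplified Flake-style flow network."""
    k = len(seeds)
    inf = float("inf")
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities cap[u][v]
    nodes = set(seeds)
    for u, v in edges:
        # Assumption: each link is undirected with capacity k in both directions.
        cap[u][v] += k
        cap[v][u] += k
        nodes.update((u, v))
    for s in seeds:
        cap[SRC][s] = inf  # source feeds every seed without limit
    for v in nodes:
        if v not in seeds:
            cap[v][SNK] += 1  # assumption: unit edge from each non-seed vertex to the sink

    # Edmonds-Karp: repeatedly augment along shortest residual paths.
    while True:
        parent = {SRC: None}
        queue = deque([SRC])
        while queue and SNK not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if SNK not in parent:
            break  # no augmenting path left: the flow is maximum
        bottleneck, v = inf, SNK
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = SNK
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Vertices reachable from the source in the residual graph form the community.
    seen, queue = {SRC}, deque([SRC])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    seen.discard(SRC)
    return seen


# Hypothetical toy crawl: a, b, c are tightly linked; d mostly links elsewhere.
LINKS = [
    ("http://a.example/1", "http://b.example/x"),
    ("http://b.example/x", "http://a.example/2"),
    ("http://a.example/1", "http://c.example/"),
    ("http://b.example/y", "http://c.example/z"),
    ("http://c.example/z", "http://d.example/"),
    ("http://d.example/", "http://e.example/"),
    ("http://d.example/", "http://f.example/"),
    ("http://d.example/", "http://g.example/"),
    ("http://a.example/1", "http://a.example/2"),  # intra-site link, dropped
]
community = max_flow_community(to_site_graph(LINKS), {"a.example", "b.example"})
print(sorted(community))  # ['a.example', 'b.example', 'c.example']
```

In this toy graph the minimum cut severs c from the sink and from d (capacity 1 + 2 = 3), which is cheaper than any cut that pulls d and its outward-linking neighbours into the community. Working on sites also means that many links between pages of one host collapse into a single site-level edge and intra-site links disappear, which is one plausible way a site-oriented framework can change what the max-flow cut sees.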
Key Words: Web, data mining, site, max-flow, site-oriented framework
Manuscript received August 8, 2005. Manuscript revised February 26, 2006.