ABSTRACT
Hierarchical models are commonly used to organize a Website's content. A Website's content structure can be represented by a topic hierarchy: a directed tree rooted at the Website's homepage, in which the vertices correspond to Web pages and the edges to hyperlinks. In this work, we propose an algorithm for extracting a Website's topic hierarchy from its link structure. The proposed algorithm consists of a construction stage and a refining stage, in which we analyze the semantic relationships between Web pages based on link structure, Web page content, and directory structure. We conducted extensive experiments on different Websites and obtained promising results.
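To make the topic-hierarchy representation concrete, the sketch below turns a small link graph into a directed tree rooted at the homepage. It is only an illustration: the abstract does not specify how the construction or refining stages work, so a plain breadth-first search (each page is attached to the first page that links to it) is used here as a stand-in, not as the proposed algorithm. The names `bfs_topic_tree` and `children_of` and the example URLs are hypothetical.

```python
from collections import deque


def bfs_topic_tree(links, homepage):
    """Return a child -> parent map for a BFS spanning tree rooted at homepage.

    `links` maps each page to the list of pages it links to.
    Assumption: BFS order stands in for the paper's construction stage,
    which the abstract does not describe in detail.
    """
    parent = {homepage: None}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # keep only the first link that reaches a page
                parent[target] = page
                queue.append(target)
    return parent


def children_of(parent):
    """Invert a child -> parent map into a parent -> children map."""
    tree = {page: [] for page in parent}
    for child, par in parent.items():
        if par is not None:
            tree[par].append(child)
    return tree


# Hypothetical link graph: navigation links back to "/" create cycles,
# which the tree extraction has to discard.
links = {
    "/": ["/products", "/about"],
    "/products": ["/products/a", "/", "/about"],
    "/products/a": ["/", "/products"],
    "/about": ["/"],
}

parent = bfs_topic_tree(links, "/")
tree = children_of(parent)
```

In this toy graph, the back-links to the homepage and the cross-link from `/products` to `/about` are dropped, leaving one parent per page. A refining stage in the sense of the abstract would presumably revisit such parent choices using page content and directory structure, which this sketch does not attempt.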
Index Terms
- Extracting a website's content structure from its link structure