Computer Networks
Volume 31, Issues 11-16, 17 May 1999, Pages 1495-1507
 
doi:10.1016/S1389-1286(99)00048-1    

Copyright © 1999 Published by Elsevier Science B.V. All rights reserved.

KPS: a Web information mining algorithm


Tao Guan a,* and Kam-Fai Wong b,1

a Department of Computer Science, University of Regina, Regina, Sask, Canada S4S 0A2

b Department of System Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China


Available online 3 May 2000.

Abstract

The Web mostly contains semi-structured information. It is, however, not easy to search for and extract the structured data hidden in a Web page. Current practices address this problem by (1) syntax analysis (i.e., analysis of HTML tags); or (2) wrappers or user-defined declarative languages. The former is suitable only for highly structured Web sites, and the latter is time-consuming and scales poorly: wrappers could handle tens, but certainly not thousands, of information sources. In this paper, we present a novel information mining algorithm, namely KPS, for semi-structured information on the Web. KPS employs keywords, patterns and/or samples to mine the desired information. Experimental results show that KPS is more efficient than existing Web extraction methods.
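
To make the idea of mining by keywords and by patterns concrete, the following is a minimal illustrative sketch in Python. It is not the KPS algorithm, whose details are given only in the full text; the helper names (`page_text_chunks`, `mine_by_keyword`, `mine_by_pattern`), the rule "take the text node that follows the keyword", the use of regular expressions as patterns, and the sample page are all assumptions made for illustration. Sample-based mining is not sketched, because the abstract does not describe it.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the non-empty text nodes of an HTML page, in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def page_text_chunks(html):
    """Hypothetical helper: flatten a page into a list of text chunks."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def mine_by_keyword(chunks, keyword):
    """Assumed rule: the value is the chunk right after one containing the keyword."""
    hits = []
    for i, chunk in enumerate(chunks[:-1]):
        if keyword.lower() in chunk.lower():
            hits.append(chunks[i + 1])
    return hits


def mine_by_pattern(chunks, pattern):
    """Assumed rule: a pattern is a regular expression matched inside each chunk."""
    regex = re.compile(pattern)
    return [m.group(0) for chunk in chunks for m in regex.finditer(chunk)]


# Hypothetical sample page built from details that appear on this abstract page.
SAMPLE_PAGE = """
<html><body>
<table>
<tr><td><b>Title</b></td><td>KPS: a Web information mining algorithm</td></tr>
<tr><td><b>Journal</b></td><td>Computer Networks</td></tr>
</table>
<p>Contact: guan@cs.uregina.ca</p>
</body></html>
"""

chunks = page_text_chunks(SAMPLE_PAGE)
title_hits = mine_by_keyword(chunks, "title")
journal_hits = mine_by_keyword(chunks, "journal")
email_hits = mine_by_pattern(chunks, r"[\w.]+@[\w.]+\w")

print(title_hits)
print(journal_hits)
print(email_hits)
```

The sketch ignores the tag structure entirely and works on the flattened text, which is one simple way to avoid depending on a particular page layout; whether KPS does the same cannot be determined from the abstract.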

Author Keywords: Information extraction; Information retrieval; Web query; Web databases

Article Outline

1. Introduction
1.1. Related work
2. Keyword-based mining
3. Pattern-based mining
4. Sample-based mining
4.1. The formal model
4.2. Pattern similarity
4.3. Style similarity
4.4. The algorithm
5. Experimental results
6. Conclusion and further works
Acknowledgements
Appendix
References
Vitae


* E-mail: guan@cs.uregina.ca

1 E-mail: kfwong@se.cuhk.edu.hk

