Abstract
Lixto is a system and method for the visual and interactive generation of wrappers for Web pages under the supervision of a human developer, for automatically extracting information from Web pages using such wrappers, and for translating the extracted content into XML. This paper describes some advanced features of Lixto, such as disjunctive pattern definitions, specialization rules, and Lixto’s capability of collecting and aggregating information from several linked Web pages.
All new methods and algorithms of the Lixto system are covered by a pending patent. Future developments of Lixto will be reported at www.lixto.com.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baumgartner, R., Flesca, S., Gottlob, G. (2001). Declarative Information Extraction, Web Crawling, and Recursive Wrapping with Lixto. In: Eiter, T., Faber, W., Truszczyński, M. (eds) Logic Programming and Nonmonotonic Reasoning. LPNMR 2001. Lecture Notes in Computer Science, vol 2173. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45402-0_2
DOI: https://doi.org/10.1007/3-540-45402-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42593-9
Online ISBN: 978-3-540-45402-1
eBook Packages: Springer Book Archive