1.
Visual Segmentation-Based Data Record Extraction from Web Documents
Longzhuang Li; Yonghuai Liu; Obregon, A.;
Information Reuse and Integration, 2007. IRI 2007. IEEE International Conference on
13-15 Aug. 2007
Page(s): 502-507
Abstract:
Semi-structured data records contained in Web pages provide useful information for shopping agents and metasearch engines. In this paper, we present a visual segmentation-based data record extraction (VSDR) method to extract data records from those Web pages. The VSDR method first segments a Web page into semantic blocks using the spatial closeness and visual resemblance of data records; neighboring and non-neighboring data records are then extracted using a compress-and-collapse technique. Experimental results show that, unlike existing methods, which generate good results only on their own test domains, VSDR is a general data record extraction method that produces stable and good results on a wide range of Web pages.