|
1. |
Web Document Parsing: A New Approach to Modeling Layout-Language Relations
Yoshida, M.; Nakagawa, H.;
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Volume 1,
23-26 Sept. 2007
Page(s): 203-207
Abstract:
We propose a novel approach for extracting semantic structures from Web documents. Our task is to extract trees that describe the hierarchical relations in documents. We developed an algorithm for this task by using the stochastic context free grammar (SCFG) framework. Experiments showed that our approach effectively worked showing performance improvement through the parameter estimation.
|