Rule-based versus training-based extraction of index terms from business documents: how to combine the results
4 February 2013
Daniel Schuster, Marcel Hanke, Klemens Muthmann, Daniel Esser
Proceedings Volume 8658, Document Recognition and Retrieval XX; 865813 (2013) https://doi.org/10.1117/12.2002509
Event: IS&T/SPIE Electronic Imaging, 2013, Burlingame, California, United States
Abstract
Current systems for the automatic extraction of index terms from business documents take either a rule-based or a training-based approach. Because each approach has its own advantages and disadvantages, it seems natural to combine them to get the best of both worlds. We present a combination method consisting of three steps (selection, normalization, and combination) that relies on comparable scores produced during extraction. Furthermore, we develop novel evaluation metrics to support the assessment of each step in an existing extraction system. Our methods were evaluated on an example extraction system with three individual extractors and a corpus of 12,000 scanned business documents.
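The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not specify how each step is implemented, so everything below is an assumption made for illustration only: the extractor names, the per-extractor score ranges, the rule that selection drops extractors returning no candidates, the linear rescaling used for normalization, and the summing of scores used for combination. It is not the authors' method.

```python
from collections import defaultdict

# Hypothetical score ranges: a rule-based extractor scoring 0..100 and two
# trained extractors scoring 0..1. These names and ranges are assumptions.
SCORE_RANGE = {
    "rules": (0.0, 100.0),
    "trained_a": (0.0, 1.0),
    "trained_b": (0.0, 1.0),
}


def select(results):
    """Selection (assumed rule): keep only extractors that returned candidates."""
    return {name: cands for name, cands in results.items() if cands}


def normalize(name, cands):
    """Normalization (assumed rule): rescale raw scores linearly to [0, 1]."""
    lo, hi = SCORE_RANGE[name]
    return [(value, (score - lo) / (hi - lo)) for value, score in cands]


def combine_results(results):
    """Combination (assumed rule): sum normalized scores per candidate value
    and return the (value, total_score) pair with the highest total."""
    totals = defaultdict(float)
    for name, cands in select(results).items():
        for value, score in normalize(name, cands):
            totals[value] += score
    if not totals:
        return None
    return max(totals.items(), key=lambda item: item[1])


# Illustrative candidates for one index field (e.g. an invoice number).
results = {
    "rules": [("INV-1001", 80.0)],
    "trained_a": [("INV-1001", 0.6), ("INV-1007", 0.7)],
    "trained_b": [],
}
best = combine_results(results)
```

In this made-up example, "INV-1007" has the single highest normalized score, but "INV-1001" wins after combination because two extractors agree on it. Rescaling scores to a common range first is what makes them comparable across extractors.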
© (2013) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Daniel Schuster, Marcel Hanke, Klemens Muthmann, and Daniel Esser "Rule-based versus training-based extraction of index terms from business documents: how to combine the results", Proc. SPIE 8658, Document Recognition and Retrieval XX, 865813 (4 February 2013); https://doi.org/10.1117/12.2002509
CITATIONS
Cited by 2 scholarly publications.
KEYWORDS
Receivers
Feature extraction
Machine learning
Document management
Data modeling
Rule based systems
Chemical elements