Rule-based versus training-based extraction of index terms from business documents: how to combine the results

Daniel Schuster; Marcel Hanke; Klemens Muthmann; Daniel Esser

doi:10.1117/12.2002509

4 February 2013

Proceedings Volume 8658, Document Recognition and Retrieval XX; 865813 (2013) https://doi.org/10.1117/12.2002509
Event: IS&T/SPIE Electronic Imaging, 2013, Burlingame, California, United States

Abstract

Current systems for the automatic extraction of index terms from business documents take either a rule-based or a training-based approach. Since each approach has its own advantages and disadvantages, it seems natural to combine them to get the best of both worlds. We present a combination method consisting of three steps (selection, normalization, and combination) that relies on comparable scores produced during extraction. Furthermore, we develop novel evaluation metrics to support the assessment of each step in an existing extraction system. We evaluated our methods on an example extraction system with three individual extractors and a corpus of 12,000 scanned business documents.

Citation

Daniel Schuster, Marcel Hanke, Klemens Muthmann, and Daniel Esser "Rule-based versus training-based extraction of index terms from business documents: how to combine the results", Proc. SPIE 8658, Document Recognition and Retrieval XX, 865813 (4 February 2013); https://doi.org/10.1117/12.2002509
