Authors: Costin-Gabriel Chiru ; Andrei Hanganu ; Traian Rebedea and Stefan Trausan-Matu

Affiliation: “Politehnica” University of Bucharest, Romania

Keyword(s): Text Recovery, OCR, Natural Language Processing, Probabilistic Parsing, N-grams.

Related Ontology Subjects/Areas/Topics: Applications ; Artificial Intelligence ; Computational Intelligence ; Evolutionary Computing ; Knowledge Discovery and Information Retrieval ; Knowledge Engineering and Ontology Development ; Knowledge-Based Systems ; Machine Learning ; Natural Language Processing ; Pattern Recognition ; Soft Computing ; Symbolic Systems

Abstract: This paper presents a text recovery method based on probabilistic post-recognition processing of the output of an Optical Character Recognition (OCR) system. The method attempts to fill in the gaps of missing text that result from recognizing degraded documents. For this task, it uses a corpus of n-grams of up to length 5 provided by Google. After presenting the general problem and alternative solutions, the paper describes several heuristics for applying this corpus to the task. These heuristics were validated in a set of experiments, which are discussed together with the results obtained.
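
The abstract does not spell out the heuristics, so the following is only a minimal sketch of the general idea of n-gram based gap filling, not the authors' method. It assumes a single missing word inside a fixed-size window and picks the candidate with the highest count among n-grams that agree with the recognized words. The names `TOY_COUNTS`, `GAP` and `fill_gap` are hypothetical, and the counts are invented toy data rather than values from the Google corpus.

```python
from collections import Counter

# Toy stand-in for an n-gram count table (invented data, NOT the Google corpus).
TOY_COUNTS = Counter({
    ("the", "quick", "brown", "fox", "jumps"): 40,
    ("the", "quick", "red", "fox", "jumps"): 5,
    ("a", "quick", "brown", "dog", "barks"): 7,
})

GAP = None  # hypothetical marker for a word the OCR system could not recognize


def fill_gap(window, counts):
    """Return the most frequent word for the single GAP in `window`.

    `window` is a tuple of recognized words with exactly one GAP entry.
    Returns None when no n-gram of the same length matches the context.
    """
    gap_positions = [i for i, word in enumerate(window) if word is GAP]
    if len(gap_positions) != 1:
        raise ValueError("window must contain exactly one gap")
    pos = gap_positions[0]

    # Sum the counts of every n-gram that agrees with all recognized words,
    # grouped by the word it places at the gap position.
    candidates = Counter()
    for ngram, count in counts.items():
        if len(ngram) != len(window):
            continue
        if all(w is GAP or w == g for w, g in zip(window, ngram)):
            candidates[ngram[pos]] += count

    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


best = fill_gap(("the", "quick", GAP, "fox", "jumps"), TOY_COUNTS)
```

With the toy table above, both "brown" and "red" match the context, and the higher count decides between them. A practical system would also need to handle several adjacent gaps and back off to shorter n-grams when no 5-gram matches; those cases are left out of this sketch.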

CC BY-NC-ND 4.0

Paper citation in several formats:
Chiru, C.; Hanganu, A.; Rebedea, T. and Trausan-Matu, S. (2010). FILLING THE GAPS USING GOOGLE 5-GRAMS CORPUS. In Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT; ISBN 978-989-8425-23-2; ISSN 2184-2833, SciTePress, pages 438-443. DOI: 10.5220/0002932204380443

@conference{icsoft10,
author={Costin{-}Gabriel Chiru and Andrei Hanganu and Traian Rebedea and Stefan Trausan{-}Matu},
title={FILLING THE GAPS USING GOOGLE 5-GRAMS CORPUS},
booktitle={Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT},
year={2010},
pages={438--443},
publisher={SciTePress},
organization={INSTICC},
doi={10.5220/0002932204380443},
isbn={978-989-8425-23-2},
issn={2184-2833},
}

TY - CONF

JO - Proceedings of the 5th International Conference on Software and Data Technologies - Volume 2: ICSOFT
TI - FILLING THE GAPS USING GOOGLE 5-GRAMS CORPUS
SN - 978-989-8425-23-2
IS - 2184-2833
AU - Chiru, C.
AU - Hanganu, A.
AU - Rebedea, T.
AU - Trausan-Matu, S.
PY - 2010
SP - 438
EP - 443
DO - 10.5220/0002932204380443
PB - SciTePress