Print Processing in Contentus: Restoration of Digitized Print Media

  • Chapter
Towards the Internet of Services: The THESEUS Research Program

Part of the book series: Cognitive Technologies (COGTECH)

Abstract

One of the main goals of the Contentus use case was to manage and improve the technical quality of large digital multimedia collections in cultural heritage organizations. Generally, there are two causes of quality impairment in digitized multimedia items: errors during the digitization process and the poor condition of the analog original. While digitization errors may be corrected by re-digitization, deterioration of the analog material can only be counteracted by digital restoration in post-processing after digitization. This article showcases a unique technique developed in Contentus to restore digitized hectograph archive documents, which typically display yellowed paper and faded printing ink. The documents used in this restoration showcase belong to the archive of the Music Information Center of the Association of Composers and Musicologists (MIZ) of the former German Democratic Republic (GDR) and were produced between 1960 and 1989. Hectography was widely used in the GDR to copy documents on a large scale. The showcased restoration method enhances the on-screen readability of the texts and, as shown by evaluation, lowers the error rate of optical character recognition (OCR). The lower OCR error rate is in turn expected to improve the automated extraction of semantic entities such as persons, places and organizations. The technology presented in this article is an example of how corpora of visually degraded analog media can be prepared for semantic search applications based on automatic content indexing, another major goal of the Contentus use case.
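
The abstract does not state how the OCR error rate was computed. A common choice, assumed here purely for illustration, is the character error rate: the edit distance between the OCR output and a manually transcribed ground truth, divided by the length of the ground truth. The function names and sample strings below are hypothetical and are not taken from the chapter.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Edit distance normalised by the ground-truth length (assumed metric)."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical strings: the same line recognised before and after restoration.
    truth = "Verband der Komponisten"
    before = "Verbard dcr Konponisten"
    after = "Verband der Komponisten"
    print(character_error_rate(truth, before))
    print(character_error_rate(truth, after))
```

Under this assumed metric, a restoration step counts as an improvement when the rate measured on the restored scan is lower than the rate measured on the unrestored scan of the same page.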


Notes

  1. http://www.impact-project.eu

  2. http://finereader.abbyy.com

  3. http://code.google.com/p/tesseract-ocr


Author information

Corresponding author

Correspondence to Iuliu Konya.

Copyright information

© 2014 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Konya, I., Eickeler, S., Nandzik, J., Flores-Herr, N. (2014). Print Processing in Contentus: Restoration of Digitized Print Media. In: Wahlster, W., Grallert, HJ., Wess, S., Friedrich, H., Widenka, T. (eds) Towards the Internet of Services: The THESEUS Research Program. Cognitive Technologies. Springer, Cham. https://doi.org/10.1007/978-3-319-06755-1_24

  • DOI: https://doi.org/10.1007/978-3-319-06755-1_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06754-4

  • Online ISBN: 978-3-319-06755-1

  • eBook Packages: Computer Science (R0)
