Pattern Recognition
Volume 35, Issue 2, February 2002, Pages 485-503
doi:10.1016/S0031-3203(01)00026-7
Copyright © 2001 Pattern Recognition Society. Published by Elsevier Science B.V.

Automatic generation of structured hyperdocuments from document images

Ji-Yeon Lee (a), Jeong-Seon Park (a), Hyeran Byun (b), Jongsub Moon (c) and Seong-Whan Lee (a, corresponding author)

(a) Center for Artificial Vision Research, Department of Computer Science and Engineering, Korea University, Anam-Dong, Seongbuk-ku, Seoul 136-701, South Korea
(b) Department of Computer Science, Yonsei University, 134 Shinchon-dong, Seodaemoon-ku, Seoul 120-749, South Korea
(c) Department of Electronics and Information Engineering, Korea University, Chochiwon, Yeongi-kun, Chungnam 339-800, South Korea

Received 13 January 2000;
accepted 28 December 2000
Available online 26 November 2001.

Abstract

As document sharing through the World Wide Web has steadily increased, so has the need to create hyperdocuments, in formats such as HTML and SGML/XML, that make documents accessible and retrievable via the Internet. Nevertheless, little work has been done on converting paper documents into hyperdocuments, and most existing studies have concentrated on the direct conversion of single-column document images that contain only text and image objects. In this paper, we propose two methods for converting complex multi-column document images into HTML documents, and a method for generating a structured table of contents page based on logical structure analysis of the document image. Experiments with various kinds of multi-column document images show that the proposed methods generate HTML documents with the same visual layout as the original document images, and also produce structured table of contents pages whose hierarchically ordered section titles are hyperlinked to the corresponding contents.
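The abstract does not spell out how the hierarchical table of contents page is assembled. As a rough illustration only, the Python sketch below assumes that section titles and their dotted section numbers (e.g. `3.1.2`) have already been extracted from the page images; it then sorts them numerically, nests them by depth, and renders a nested HTML list of hyperlinks. It is not the authors' method: section title candidate extraction, verification and the logical structure analysis itself are omitted, and the names `build_toc`, `TocNode` and the `sec-3-1-2` anchor scheme are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from html import escape


@dataclass
class TocNode:
    """One section title in the (hypothetical) table-of-contents tree."""
    number: str
    title: str
    children: list[TocNode] = field(default_factory=list)


def section_key(number: str) -> tuple[int, ...]:
    """Turn '3.1.2' into (3, 1, 2) so that 3.10 sorts after 3.9."""
    return tuple(int(part) for part in number.split("."))


def build_tree(titles: list[tuple[str, str]]) -> list[TocNode]:
    """Sort numbered titles and nest each one under its nearest shallower ancestor.

    Assumption: the hierarchy level equals the number of dotted components.
    """
    root = TocNode("", "")
    stack: list[tuple[int, TocNode]] = [(0, root)]  # (level, node) path from the root
    for number, title in sorted(titles, key=lambda item: section_key(item[0])):
        level = len(section_key(number))
        node = TocNode(number, title)
        # Pop siblings and deeper nodes until the top of the stack is an ancestor.
        while stack[-1][0] >= level:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((level, node))
    return root.children


def render(nodes: list[TocNode]) -> str:
    """Render nodes as nested <ul>/<li> lists; child lists sit inside the parent <li>."""
    if not nodes:
        return ""
    items = []
    for node in nodes:
        anchor = "sec-" + node.number.replace(".", "-")  # hypothetical anchor ids
        link = f'<a href="#{anchor}">{escape(node.number)} {escape(node.title)}</a>'
        items.append(f"<li>{link}{render(node.children)}</li>")
    return "<ul>" + "".join(items) + "</ul>"


def build_toc(titles: list[tuple[str, str]]) -> str:
    """Build a hyperlinked, hierarchically ordered table of contents in HTML."""
    return render(build_tree(titles))
```

For example, `build_toc([("3.1", "An approach based on the table structure"), ("3", "HTML conversion of multi-column document images"), ("4", "Generation of a structured hyperdocument")])` places 3.1 inside the list item for section 3, even though the titles arrive out of order. Building an explicit tree before rendering keeps every nested `<ul>` inside its parent `<li>`, so the generated HTML stays well-formed.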

Author Keywords: Structured hyperdocument; Multi-column document; Document conversion; Document image understanding; Logical structure analysis

Article Outline

1. Introduction
2. Related works
3. HTML conversion of multi-column document images
3.1. An approach based on the table structure
3.1.1. Text object splitting
3.1.2. Object merging
3.1.3. Virtual table generation and object ordering
3.1.4. Table-to-HTML conversion algorithm
3.2. An approach based on the layer structure
3.2.1. Object resizing
3.2.2. Extracting the structural properties of text objects
3.2.3. Automatic tagging
4. Generation of a structured hyperdocument based on logical structure analysis
4.1. Section title candidate extraction
4.2. Section number sorting
4.3. Verification
4.4. Section title arrangement
5. Experimental results and analysis
5.1. Experimental environment
5.2. Experimental results
6. Conclusions and further research
References
Vitae