Converting PDF files to XML files

Wende Zhang (Fuzhou University Library, Fuzhou, Fujian, People's Republic of China)

The Electronic Library

ISSN: 0264-0473

Article publication date: 15 February 2008

Abstract

Purpose

The purpose of this paper is to develop a system that can convert PDF files to XML files.

Design/methodology/approach

The system uses XML as the information display model and XSLT to express the information extraction rules. The process is illustrated by converting a scientific and technological paper from PDF to a valid XML file.

Findings

Because the PDF file adopts a self-descriptive definition, its content information and its display information exist in different objects; it is therefore not easy to extract information directly from the PDF source file. The indirect way to solve this problem in the system design was to convert the PDF source file to an intermediate format that is relatively easy to process, which can then be converted automatically to the target file in accordance with the relevant rules.
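The second stage of this two-step approach, turning an intermediate format into the target XML by applying rules, can be sketched as follows. This is only an illustration under stated assumptions: the intermediate `<page>`/`<line size="...">` layout, the target element names, and the `RULES` table are hypothetical and are not taken from the paper. The paper expresses its rules in XSLT; the sketch substitutes a plain Python lookup table so that it runs with the standard library alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediate format (an assumption, not the paper's):
# one <line> per text run, tagged with the font size that the PDF
# display objects would have carried.
INTERMEDIATE = """\
<page>
  <line size="18">Converting PDF files to XML files</line>
  <line size="12">Wende Zhang</line>
  <line size="10">The purpose of this paper is to develop a system</line>
  <line size="10">that can convert PDF files to XML files.</line>
</page>
"""

# Hypothetical extraction rules: font size -> target element name.
# This table stands in for the XSLT rules described in the abstract.
RULES = {18: "title", 12: "author", 10: "para"}


def convert(intermediate_xml: str, rules: dict) -> ET.Element:
    """Apply the rules to the intermediate document and build the target tree."""
    source = ET.fromstring(intermediate_xml)
    article = ET.Element("article")
    for line in source.iter("line"):
        tag = rules.get(int(line.get("size", "0")))
        if tag is None:
            continue  # no rule matches this text run, so it is skipped
        ET.SubElement(article, tag).text = (line.text or "").strip()
    return article


article = convert(INTERMEDIATE, RULES)
print(ET.tostring(article, encoding="unicode"))
```

Keeping the rules in a separate table mirrors the separation the abstract describes: the intermediate format carries the content, and the rules alone decide how it maps to the target structure, so supporting a different document layout would mean changing the rules rather than the conversion code.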

Originality/value

It is important to be able to easily and conveniently extract information from PDF files and this paper shows how it can be done. The design ideas contained in the paper can also be applied to information extraction from other types of files.

Citation

Zhang, W. (2008), "Converting PDF files to XML files", The Electronic Library, Vol. 26 No. 1, pp. 68-74. https://doi.org/10.1108/02640470810851743

Publisher

Emerald Group Publishing Limited
