The semantic-document approach to combining documents and ontologies
Introduction
Ontologies provide a framework for conceptualization and knowledge modeling in a multitude of areas (Gruber, 1993). In the last decade, ontologies have emerged as one of the most popular modeling approaches for taxonomies, classifications, and other structures used in intelligent systems. Languages such as the resource description framework (RDF), RDF schema (RDFS), and the web ontology language (OWL) are the foundation of semantic-web efforts to use ontologies for web services (World Wide Web Consortium, 2004a, 2004b, 2004c). Furthermore, development environments, such as Protégé (Gennari et al., 2003), provide several tools for building ontologies. Despite this progress, however, there are many areas in which ontologies have not yet reached their full potential in terms of utility and applications, such as integration with other types of personal and organizational information systems. A significant bottleneck is the lack of integration with other forms of knowledge expression. In particular, ontologies must coexist with written definitions and descriptions to ensure, for example, traceability, appropriate documentation, and justification of expressions in the ontology.
Historically, documents have been the key carrier of human knowledge, and they continue to be an important medium in the age of electronic communication and the World Wide Web. Like ontologies, a major role of documentation is to describe concepts, ideas, and phenomena and their relationships. Each day, millions of people use computers as enhanced typewriters to produce documents, and the number of authors far exceeds the number of ontology developers.
It is easy to forget the significance of documents when developing ontologies. Unfortunately, there is a surprisingly large gap between the knowledge modeled in ontologies and the text documenting the same knowledge. In general, authors produce a document for certain purposes, such as communicating ideas and instructions to humans, whereas ontology developers define ontologies for other purposes, such as automated classification and reasoning (Uschold and Jasper, 1999). Current approaches make it difficult to use ontologies and documents in concert, as two views of the same knowledge. The tools available tend to support either ontology editing or document manipulation, but not both. For example, it is difficult to relate classes and individuals in an ontology to sections, regions, paragraphs, words, and so forth in a document.
It is possible to distinguish between two major areas where it is useful to integrate documents and ontologies. One possibility is to annotate documents with ontologies to add metalevel information and to provide ontological structures for document content. Here, ontologies describe entire documents and explain document parts, such as words and phrases. The semantic-web approach, for example, aims at supporting annotation of web pages with ontologies (Berners-Lee et al., 2001, Handschuh and Staab, 2003). Another possibility is to document ontologies; that is, to create documentation that describes different aspects of the ontology content and its development. As organizations develop more and more ontologies, the documentation of these ontologies becomes increasingly important. Many applications require a printed version of the knowledge content in a predetermined report format. Furthermore, it is sometimes difficult for domain experts to review ontologies using only computer-based tools for ontology editing and visualization. Printed versions of an ontology complement the alternative perspectives of the content provided by interactive ontology-visualization tools. Moreover, preexisting documents combined with ontologies can support knowledge management (Eriksson and Bång, 2006). For example, large document repositories can benefit from ontologies that facilitate search and retrieval. Thus, we need representation and communication formats for knowledge that suit both human reading and machine processing.
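To make the search-and-retrieval benefit concrete, the following minimal sketch shows how a class hierarchy can widen a query over annotated documents. It is an illustration only and not the search engine used in this work: the class names, file names, and the functions `descendants` and `search` are hypothetical, and the ontology is reduced to a plain subclass table rather than RDF or OWL.

```python
from collections import defaultdict

# Hypothetical subclass hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "LabourForceSurvey": "Survey",
    "HouseholdSurvey": "Survey",
    "Survey": "StatisticalProduct",
}

# Hypothetical annotations: document name -> ontology classes attached to it.
ANNOTATIONS = {
    "report_a.pdf": {"LabourForceSurvey"},
    "report_b.pdf": {"HouseholdSurvey"},
    "memo_c.pdf": {"StatisticalProduct"},
}


def descendants(cls, subclass_of):
    """Return cls together with all of its direct and indirect subclasses."""
    children = defaultdict(set)
    for child, parent in subclass_of.items():
        children[parent].add(child)
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children[current])
    return found


def search(cls, annotations, subclass_of):
    """Return sorted names of documents annotated with cls or any subclass."""
    wanted = descendants(cls, subclass_of)
    return sorted(doc for doc, classes in annotations.items() if classes & wanted)


if __name__ == "__main__":
    # A query for the general class also finds documents annotated
    # only with its more specific subclasses.
    print(search("Survey", ANNOTATIONS, SUBCLASS_OF))
```

A plain keyword search for "Survey" would miss documents annotated only with a more specific class; the hierarchy lets the query reach them without changing the documents themselves.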
The semantic-document approach attempts to reconcile documents and ontologies by extending printable documents with annotations and additional knowledge bases. The ultimate goal of semantic documents is not merely to provide metadata for documents, such as keywords and Dublin Core descriptions (Weibel et al., 1998), but to integrate documentation and knowledge representation to the point where they use a common structure, which provides both documentation and representation views. In our approach, Adobe's portable document format (PDF) (Adobe, 2004a) is the basis for semantic documents: a single PDF file stores both the printable document and the related knowledge base. OWL forms the basis for the ontology representation in this combined format. Alternatively, if description logic is not required, it is possible to use RDF/RDFS (which is a basis for OWL). The Protégé ontology editor, together with our plug-in extensions for supporting PDF, provides an annotation and preparation environment for semantic documents. We have applied this approach to the statistics and clinical-guideline domains and used the tools to annotate existing documents and to generate new semantic documents.
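The idea of one structure that offers both a documentation view and a representation view can be sketched as a small data model. This is a simplified illustration under stated assumptions, not the PDF-based format described in this paper: the classes `Triple`, `Annotation`, and `SemanticDocument`, the `ex:` identifiers, and the guideline content are all hypothetical, and nothing is written to or read from a PDF file.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """One RDF-style statement in the knowledge base."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Annotation:
    """Links an ontology term to a passage of the printable document."""
    term: str   # ontology class or individual
    page: int   # page on which the passage appears
    text: str   # the annotated phrase


@dataclass
class SemanticDocument:
    """Hypothetical bundle of knowledge base and document annotations."""
    title: str
    triples: list = field(default_factory=list)
    annotations: list = field(default_factory=list)

    def terms_on_page(self, page):
        """Documentation-to-ontology direction: terms annotated on a page."""
        return sorted({a.term for a in self.annotations if a.page == page})

    def describe(self, term):
        """Combine the representation view (facts) with the
        documentation view (passages) for a single term."""
        facts = [(t.predicate, t.obj) for t in self.triples if t.subject == term]
        passages = [(a.page, a.text) for a in self.annotations if a.term == term]
        return {"facts": facts, "passages": passages}


# Hypothetical clinical-guideline example.
doc = SemanticDocument(
    title="Hypertension guideline",
    triples=[
        Triple("ex:Hypertension", "rdf:type", "ex:Condition"),
        Triple("ex:Hypertension", "ex:treatedBy", "ex:Diuretic"),
    ],
    annotations=[
        Annotation("ex:Hypertension", 1, "elevated blood pressure"),
        Annotation("ex:Diuretic", 2, "thiazide diuretics"),
    ],
)

if __name__ == "__main__":
    print(doc.terms_on_page(1))
    print(doc.describe("ex:Hypertension"))
```

Keeping the statements and the annotations in one object mirrors the single-file packaging: a reader of the knowledge base can trace a term back to the passages that justify it, and a reader of the document can look up what the ontology asserts about a phrase.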
This paper is organized as follows. Section 2 provides the background in terms of ontology and document technologies. Section 3 introduces the semantic-document approach and the basic technology supporting it. Section 4 describes modeling with semantic documents. Section 5 presents applications of semantic documents. Section 6 discusses the pros and cons of the semantic-document approach and its relationship to the semantic web. Finally, Section 7 draws conclusions.
Background
Semantic web and knowledge management are two large areas related to the semantic-document approach. In addition, there is relevant previous work in terms of specific approaches to document-supported knowledge acquisition and support for active documents. Let us discuss these approaches before proceeding with semantic documents.
Semantic documents
Currently, we use PDF as the primary format for implementation of semantic documents. PDF documents are available universally and are used widely for document storage and printing. Many organizations maintain on-line document archives in PDF and make them available on internal networks and on the Internet. The major advantages of PDF are that it is documented and that it is possible to extend the format with additional information. Moreover, there are several commercial and open-source tools
Ontology modeling using semantic documents
Semantic documents introduce new challenges and opportunities for ontologies. There are two aspects that warrant further discussion—the structuring of ontologies for semantic documents and the ontology-development process.
Applications
The semantic-document approach is an enabling technology with many potential applications. Just as it is possible to use ontologies for multiple purposes, semantic-document technology is a versatile component for building systems and services. Let us first consider some of the possible uses of semantic documents and then proceed by discussing two cases where semantic documents have been used in practice. Examples of potential application areas include:
- Ontology
Discussion
The semantic-document approach is a systematic way of combining ontologies with printable documents. Ontologies need appropriate documentation. Conversely, documents need support from ontologies to help explain document content and to facilitate searches. The combined packaging of documents and ontologies is advantageous in that the semantic documents retain their ontology content throughout electronic communication and archival storage. There are many potential applications of semantic
Conclusion
We argue that ontologies and documents should be linked closely because they are two different but interrelated views of the same body of knowledge. An ontology without appropriate documentation is difficult to understand and review. Likewise, a document without sufficient metalevel descriptions is not equipped for automated reasoning and search. The semantic-document approach described here fuses the widely used PDF format with the popular ontology formats RDF/RDFS and OWL. Appropriate tool
Acknowledgments
Thanks are due to Bo Sundgren, Alf Fyhrlund, and Bert Fridlund at Statistics Sweden and Samson Tu at Stanford Medical Informatics for providing the documents and knowledge used in the applications described. Feiyu Lin implemented the first prototype version of the ontology-based search engine for statistics reports. The author wishes to thank Magnus Bång for valuable discussions. This work was supported in part by VINNOVA under Grant no. 24478-1 and in part by Statistics Sweden.
References (53)
- et al., 1995. Task modeling with reusable problem-solving methods. Artificial Intelligence.
- Gennari et al., 2003. The evolution of Protégé: an environment for knowledge-based systems development. International Journal of Human–Computer Studies.
- Gruber, 1993. A translation approach to portable ontology specification. Knowledge Acquisition.
- Adobe, 1990. PostScript Language Reference Manual, second ed. Addison-Wesley Professional, Boston,...
- Adobe, 2004a. PDF Reference Version 1.6, fifth ed. Adobe Press, Berkeley,...
- Adobe, 2004b. XMP Specification. Adobe Systems...
- Benjamins, V.R., Contreras, J., Blázquez, M., Niño, M., García, A., Navas, E., Rodríguez, J., Wert, C., Millán, R.,...
- Berners-Lee et al., 2001. The semantic web. Scientific American.
- Buswell, S., Caprotti, O., Carlisle, D.P., Dewar, M.C., Gaëtano, M., Kohlhase, M. (Eds.), 2004. The OpenMath Standard,...
- Carr, L., Miles-Board, T., Wills, G., Woukeu, A., Hall, W., 2004a. Towards a knowledge-aware office environment. In:...
- Swoogle: a search and metadata engine for the semantic web.
- Ontologies: A Silver Bullet for Knowledge Management and Electronic Commerce.
- Documents as expert systems.
- Embedding formal knowledge models in active documents. Communications of the ACM.
- Formal ontology and information systems.