doi:10.1016/S0169-023X(03)00105-8
Copyright © 2003 Elsevier B.V. All rights reserved.
Content-based text querying with ontological descriptors
Troels Andreasen a, Per Anker Jensen b, Jørgen Fischer Nilsson c, Patrizia Paggio d, Bolette Sandford Pedersen d and Hanne Erdman Thomsen e
a Computer Science, Roskilde University, DK-4000, Roskilde, Denmark
b Business Communication and Information Science, University of Southern Denmark, DK-6000, Kolding, Denmark
c Informatics and Mathematical Modelling, Technical University of Denmark, DK-2800, Lyngby, Denmark
d Centre for Language Technology, DK-2300, Copenhagen, Denmark
e Computational Linguistics, Copenhagen Business School, DK-2000, Frederiksberg, Denmark
Received 6 May 2003; revised 7 May 2003; accepted 7 May 2003. Available online 2 July 2003.
Abstract
This paper describes a method and a system for content-based querying of texts based on the availability of an ontology for the concepts in the text domain. A key principle in the system is the extraction of conceptual content of noun phrases into descriptors forming an integral part of the ontology.
The retrieval of text passages rests on matching descriptors from the text against descriptors from the noun phrases in the query. The match does not need to be exact but is mediated by the ontology, invoking in particular taxonomic reasoning with sub- and super-concepts. The paper also reports on a prototype implementation of the system.
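As a minimal sketch of ontology-mediated matching, the following Python snippet scores a query concept against a text concept by counting is-a steps between them. The concept names come from the caption of Fig. 5; the `IS_A` table, the function names, and the `1 / (1 + steps)` scoring are illustrative assumptions, not the measure used in the paper.

```python
from collections import deque

# Hypothetical is-a edges (child -> parents), based on the Fig. 5 caption.
IS_A = {
    "pellagra": ["mangelsygdom"],
    "mangelsygdom": ["sygdom"],
    "diabetes": ["sygdom"],
    "sygdom": ["disease"],
}


def levels_up(sub, sup):
    """Fewest is-a steps from sub up to sup, or None if sup is not an ancestor."""
    queue = deque([(sub, 0)])
    seen = {sub}
    while queue:
        concept, depth = queue.popleft()
        if concept == sup:
            return depth
        for parent in IS_A.get(concept, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return None


def match_score(query_concept, text_concept):
    """Assumed scoring: 1.0 for an exact match, lower for more distant
    sub-/super-concepts, 0.0 when neither concept subsumes the other."""
    steps = levels_up(text_concept, query_concept)
    if steps is None:
        steps = levels_up(query_concept, text_concept)
    return 0.0 if steps is None else 1.0 / (1 + steps)


print(levels_up("pellagra", "sygdom"))
print(match_score("sygdom", "pellagra"))
```

Under these assumptions a query for sygdom still retrieves a passage about pellagra, only with a lower score than an exact match would receive.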
Author Keywords: Content-based querying; Ontologies; Taxonomic reasoning; Noun phrase semantics; Conceptual distance; Information retrieval
Fig. 1. Concepts placed in the SIMPLE top-ontology.
Fig. 2. Semantic relations.
Fig. 3. A conceptual generalisation over domain-specific analyses.
Fig. 4. System resources: a database containing text objects (documents) and a knowledge base comprising knowledge about the domain of the texts, connected through descriptions.
Fig. 5. The ontology navigator, showing that mangelsygdom (deficiency disease) has, for instance, sygdom (disease) as super-concept and pellagra as sub-concept, that e.g. diabetes (diabetes) is a sibling, and that the only two paths to the top of the ontology are mangelsygdom → sygdom → disease → agentive → top and mangelsygdom → sygdom → disease → phenomenon → event → entity → top.
Fig. 6. Prototype architecture: the process of generating descriptions.
Fig. 7. The querying tool of the OQ prototype showing the answer to a query. Notice that the first object in the answer matches the query, but not fully, since sygdom is a second-level super-concept of pellagra (see Fig. 5) and (ensidig, mangelfuld, kost) is considered a sub-concept of (ensidig, kost).
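The two observations in the captions of Fig. 5 and Fig. 7 can be illustrated with a short Python sketch: enumerating every path from a concept to the top of an ontology that allows multiple parents, and testing whether one compound descriptor is a sub-concept of another. The edge table is transcribed from the Fig. 5 caption; the function names and the rule that a compound descriptor is more specific when it contains all elements of the more general one are assumptions made for illustration, not the system's actual definition.

```python
# Is-a edges (child -> parents) transcribed from the Fig. 5 caption.
IS_A = {
    "mangelsygdom": ["sygdom"],
    "sygdom": ["disease"],
    "disease": ["agentive", "phenomenon"],
    "agentive": ["top"],
    "phenomenon": ["event"],
    "event": ["entity"],
    "entity": ["top"],
}


def paths_to_top(concept):
    """Return every is-a path from concept up to 'top' as a list of lists."""
    if concept == "top":
        return [["top"]]
    paths = []
    for parent in IS_A.get(concept, []):
        for tail in paths_to_top(parent):
            paths.append([concept] + tail)
    return paths


def subsumes(general, specific):
    """Assumed rule: 'specific' is a sub-concept of 'general' if it contains
    every element of 'general' (extra modifiers only narrow the concept)."""
    return set(general) <= set(specific)


for path in paths_to_top("mangelsygdom"):
    print(" -> ".join(path))

print(subsumes(("ensidig", "kost"), ("ensidig", "mangelfuld", "kost")))
```

With this edge table, mangelsygdom has exactly the two paths listed in Fig. 5, because disease is the only concept on the way up that has more than one parent.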