Elsevier

Journal of Web Semantics

Volume 2, Issue 1, 1 December 2004, Pages 81-97

Peer-to-peer semantic coordination

https://doi.org/10.1016/j.websem.2004.07.004

Abstract

Semantic coordination, namely the problem of finding an agreement on the meaning of heterogeneous schemas, is one of the key issues in the development of the Semantic Web. In this paper, we propose a method for discovering semantic mappings across hierarchical classifications (HCs) based on a new approach, which shifts the problem of semantic coordination from the problem of computing linguistic or structural similarities (what most other proposed approaches do) to the problem of deducing relations between sets of logical formulae that represent the meaning of concepts belonging to different schemas. We show how to apply the approach, and an algorithm that implements it, to an interesting family of schemas, namely hierarchical classifications, and present the results of preliminary tests on two types of hierarchical classifications: web directories and catalogs. Finally, we argue that this is a significant improvement on previous approaches.

Introduction

One of the key issues in the development of the Semantic Web is the problem of enabling machines to exchange meaningful information/knowledge across applications that (i) may use autonomously developed schemas (e.g., taxonomies, classifications, database schemas, data types) to organize locally available data, and (ii) need to discover relations between schemas to achieve their users’ goals. This problem can be viewed as a problem of coordination, defined as follows: (i) all parties have an interest in finding an agreement on how to map their schemas onto each other, but (ii) there are many possible or plausible solutions (many alternative mappings across local schemas), among which they need to select the right one, or at least a sufficiently good one. For this reason, we regard it as a problem of semantic coordination.

In environments with more or less well-defined boundaries, like a corporate Intranet, the problem of semantic coordination can be addressed a priori by defining and using shared schemas (e.g., ontologies) throughout the entire organization. However, in open environments like the Semantic Web, this “centralized” approach to semantic coordination is not viable, for several reasons: the difficulty of “negotiating” a shared model that suits the needs of all parties involved, the practical impossibility of maintaining such a model in a highly dynamic environment, and the problem of finding a satisfactory mapping of pre-existing local schemas onto such a global model. In such a scenario, the problem of exchanging meaningful information across locally defined schemas (each possibly presupposing a heterogeneous semantic model) seems particularly tough, as we cannot assume an a priori agreement; its solution therefore requires a more dynamic and flexible form of coordination, which we call “peer-to-peer” semantic coordination.

In this paper, we address an important instance of the problem of peer-to-peer semantic coordination, namely the problem of coordinating hierarchical classifications (HCs). HCs are structures having the explicit purpose of organizing/classifying some kind of data (such as documents, records in a database, goods, activities, services). The problem of coordinating HCs is significant for at least two main reasons:

  • first, HCs are widely used in many applications. Examples are: web directories (see, e.g., the Google Directory or the Yahoo! Directory), content management tools and portals (which often use hierarchical classifications to organize documents and web pages), service registries (web services are typically classified in a hierarchical form, e.g., in UDDI), marketplaces (goods are classified in hierarchical catalogs), and PC file systems (where files are typically classified in hierarchical folder structures);

  • second, it is an empirical fact that most actual HCs (like most concrete instances of models available on the Semantic Web) are built using structures whose labels are expressions from the language spoken by the community of their users (including technical words, neologisms, proper names, abbreviations, and acronyms whose meaning is shared in that community). In our opinion, recognizing this fact is crucial for going beyond syntactic (or weakly semantic) techniques, as it gives us the chance to exploit the complex degree of semantic coordination implicit in the way a community uses the language from which the labels of an HC are taken.

The main technical contribution of the paper is a logic-based algorithm, called CtxMatch, for coordinating HCs. It takes as input two HCs $S$ and $S'$ and, for each pair of concepts $m \in S$ and $n \in S'$, returns their semantic relation. The relations we consider in this version of CtxMatch are: $m$ is less general than $n$, $m$ is more general than $n$, $m$ is equivalent to $n$, $m$ is compatible with (i.e., possibly overlapping) $n$, and $m$ is incompatible with (i.e., disjoint from) $n$. The formal semantics of these relations will be made precise in the paper.
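Under an extensional reading, in which each concept stands for the set of items it classifies, the five relations correspond to familiar set comparisons. The minimal Python sketch below assumes that reading with finite, explicitly listed extensions; it only illustrates the intended meaning of the relations and is not CtxMatch itself, which deduces relations from logical formulae rather than from listed items. The names `Relation` and `relation`, and the document identifiers, are hypothetical.

```python
from enum import Enum


class Relation(Enum):
    LESS_GENERAL = "less general than"
    MORE_GENERAL = "more general than"
    EQUIVALENT = "equivalent to"
    COMPATIBLE = "compatible with"
    INCOMPATIBLE = "incompatible with"


def relation(ext_m: frozenset, ext_n: frozenset) -> Relation:
    """Classify concept m against concept n by comparing their (assumed) extensions."""
    if ext_m == ext_n:
        return Relation.EQUIVALENT
    if ext_m < ext_n:  # proper subset: m is less general than n
        return Relation.LESS_GENERAL
    if ext_m > ext_n:  # proper superset: m is more general than n
        return Relation.MORE_GENERAL
    if ext_m & ext_n:  # the extensions overlap, but neither contains the other
        return Relation.COMPATIBLE
    return Relation.INCOMPATIBLE  # the extensions are disjoint


if __name__ == "__main__":
    florence = frozenset({"doc1", "doc2"})       # hypothetical items under m
    italy = frozenset({"doc1", "doc2", "doc3"})  # hypothetical items under n
    print(relation(florence, italy).value)  # less general than
```

Equality is tested first, so two empty extensions count as equivalent under this sketch.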

With respect to other approaches to semantic coordination proposed in the literature (often under different “headings”, such as schema matching, ontology mapping, and semantic integration; see Section 6 for references and a detailed discussion of some of them), our approach is innovative in three main aspects: (1) we introduce a new method for making explicit the meaning of nodes in an HC (and, in general, in structured semantic models) by combining three different types of knowledge, each of which has a specific role; (2) applying this method produces a new representation of an HC, in which all relevant knowledge about the nodes (including their meaning in that specific HC) is encoded as a set of logical formulae; (3) mappings across nodes of two HCs are then deduced via logical reasoning, rather than derived through some more or less complex heuristic procedure, and thus can be assigned a clearly defined model-theoretic semantics. As we will show, this leads to a major conceptual shift: the problem of semantic coordination between HCs is no longer tackled as a problem of computing linguistic or structural similarities (possibly with the help of a thesaurus and of other information about the type of arcs between nodes), but rather as a problem of deducing relations between formulae that represent the meaning of each concept in a given HC. This explains, for example, why our approach performs much better than other approaches when two concepts are intuitively equivalent but occur in structurally very different HCs.
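Point (3) can be made concrete with a small refutation check: $m$ is less general than $n$ exactly when the background axioms together with $m \wedge \neg n$ have no model, and $m$ is incompatible with $n$ when the axioms together with $m \wedge n$ have none. The Python sketch below assumes that node meanings have already been encoded as propositional formulae over a few atoms, and it checks satisfiability by brute-force enumeration of truth assignments, which is only practical for tiny examples (a realistic implementation would use a SAT solver). The atoms `images`, `florence`, `italy`, the axiom, and the function names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def satisfiable(formulas: List[Formula], atoms: List[str]) -> bool:
    """Return True if some truth assignment to `atoms` makes every formula true."""
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if all(f(assignment) for f in formulas):
            return True
    return False


def deduce(axioms: List[Formula], m: Formula, n: Formula, atoms: List[str]) -> str:
    """Deduce the relation between concepts m and n by refutation."""
    # axioms |= m -> n  iff  axioms + {m, not n} has no model
    m_implies_n = not satisfiable(axioms + [m, lambda a: not n(a)], atoms)
    # axioms |= n -> m  iff  axioms + {n, not m} has no model
    n_implies_m = not satisfiable(axioms + [n, lambda a: not m(a)], atoms)
    if m_implies_n and n_implies_m:
        return "equivalent"
    if m_implies_n:
        return "less general"
    if n_implies_m:
        return "more general"
    # axioms |= not (m and n)  iff  axioms + {m, n} has no model
    if not satisfiable(axioms + [m, n], atoms):
        return "incompatible"
    return "compatible"


if __name__ == "__main__":
    atoms = ["images", "florence", "italy"]
    # Hypothetical world knowledge: whatever is about Florence is about Italy.
    axioms: List[Formula] = [lambda a: (not a["florence"]) or a["italy"]]
    images_florence: Formula = lambda a: a["images"] and a["florence"]
    images_italy: Formula = lambda a: a["images"] and a["italy"]
    print(deduce(axioms, images_florence, images_italy, atoms))  # less general
    print(deduce([], images_florence, images_italy, atoms))  # compatible
```

Without the axiom, the same two concepts come out as merely compatible, which shows how background knowledge, rather than label similarity, determines the deduced relation.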

The paper is organized as follows. In Section 2, we introduce the main conceptual assumptions of the new approach to semantic coordination that we propose. In Section 3, we show how this approach is instantiated for the problem of coordinating HCs. Then, we present the main features of CtxMatch, the proposed algorithm for coordinating HCs (Section 4). In the final part of the paper, we sum up the results of testing the algorithm on web directories and catalogs (Section 5) and compare our approach with other proposed approaches to schema matching (Section 6).


Our approach

The method we propose assumes that we deal with a network of physically connected entities which can autonomously decide how to organize locally available data; we call these entities semantic peers. Peers organize their data using one or more schemas (e.g., database schemas, directories in a file system, classification schemas, taxonomies, and so on); as we said, in this paper, we focus on classifications. Different peers may use different schemas to classify the same collection of

P2P coordination of hierarchical classifications

In this section, we show how to apply the general approach described in the previous section to the problem of coordinating HCs. Intuitively, a classification is a grouping of things into classes or categories. When categories are arranged into a hierarchical structure, we have a hierarchical classification. Formally, the hierarchical structures we use to build HCs are concept hierarchies, defined as follows in [7]:

Definition 1 (Concept hierarchy)

A concept hierarchy is a triple $S = \langle N, E, l \rangle$ where $N$ is a finite set of nodes, $E$

The algorithm: CtxMatch

The CtxMatch algorithm (see Algorithm 1) takes as input two HCs (representing the structural knowledge), a lexicon L (representing the lexical knowledge) and an ontology O (representing world knowledge), and returns as output a mapping between the nodes of the two classifications. For the sake of simplicity, in explaining the algorithm we assume that the two input HCs are the two structures depicted in the lower part of Fig. 2.
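The input/output contract just described (two HCs, a lexicon, an ontology, and a mapping as output) can be sketched as follows. This is not the authors' Algorithm 1 but a deliberately simplified, hypothetical stand-in built on these assumptions: each HC is a dictionary from node id to (parent id or None, label); the lexicon maps each label to a single concept atom, so there is no ambiguity to resolve; a node's meaning is approximated as the conjunction of the atoms along its path from the root; and the ontology contains only atom-to-atom implications and pairwise disjointness axioms, so entailment reduces to forward chaining. All names (`ctx_match_sketch`, `node_concepts`, `closure`, `relate`) and the example HCs are hypothetical and are not the structures of Fig. 2.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encodings (assumptions, not the paper's data structures).
HC = Dict[str, Tuple[Optional[str], str]]  # node id -> (parent id or None, label)
Pairs = List[Tuple[str, str]]              # (a, b) pairs of concept atoms


def node_concepts(hc: HC, node: str, lexicon: Dict[str, str]) -> Set[str]:
    """Collect the concept atoms of the labels on the path from the root to `node`."""
    atoms: Set[str] = set()
    current: Optional[str] = node
    while current is not None:
        parent, label = hc[current]
        atoms.add(lexicon[label])  # lexical knowledge: label -> concept
        current = parent           # structural knowledge: walk up to the root
    return atoms


def closure(atoms: Set[str], implies: Pairs) -> Set[str]:
    """Forward-chain the implications a -> b until no new atom is derived."""
    result = set(atoms)
    changed = True
    while changed:
        changed = False
        for a, b in implies:
            if a in result and b not in result:
                result.add(b)
                changed = True
    return result


def relate(m: Set[str], n: Set[str], implies: Pairs, disjoint: Pairs) -> str:
    """Relate two conjunctions of atoms under the ontology (world knowledge)."""
    joint = closure(m | n, implies)
    # Checked first: if m and n together violate a disjointness axiom, they cannot overlap.
    if any(a in joint and b in joint for a, b in disjoint):
        return "incompatible"
    m_le_n = n <= closure(m, implies)  # m entails every conjunct of n
    n_le_m = m <= closure(n, implies)  # n entails every conjunct of m
    if m_le_n and n_le_m:
        return "equivalent"
    if m_le_n:
        return "less general"
    if n_le_m:
        return "more general"
    return "compatible"


def ctx_match_sketch(hc1: HC, hc2: HC, lexicon: Dict[str, str],
                     implies: Pairs, disjoint: Pairs) -> Dict[Tuple[str, str], str]:
    """Return a relation for every pair (m, n) of nodes with m in hc1 and n in hc2."""
    return {
        (m, n): relate(node_concepts(hc1, m, lexicon),
                       node_concepts(hc2, n, lexicon), implies, disjoint)
        for m, n in product(hc1, hc2)
    }


if __name__ == "__main__":
    hc1: HC = {"A": (None, "Images"), "B": ("A", "Florence")}
    hc2: HC = {"X": (None, "Pictures"), "Y": ("X", "Italy"), "Z": ("X", "Rome")}
    lexicon = {"Images": "image", "Pictures": "image", "Florence": "florence",
               "Italy": "italy", "Rome": "rome"}
    implies = [("florence", "italy"), ("rome", "italy")]
    disjoint = [("florence", "rome")]  # hypothetical axiom
    mapping = ctx_match_sketch(hc1, hc2, lexicon, implies, disjoint)
    print(mapping[("B", "Y")])  # less general
    print(mapping[("B", "Z")])  # incompatible
```

Mapping the synonyms Images and Pictures to the same atom is what makes A and X come out equivalent, while the relation between B and Y depends entirely on the ontology's implication from florence to italy.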

The algorithm has essentially the following two

Testing the algorithm

In this section, we report from [18] some results of the first tests of CtxMatch on real HCs (i.e., pre-existing classifications used in real applications).

Related work

CtxMatch shifts the problem of semantic coordination from the problem of matching (in a more or less sophisticated way) semantic structures (e.g., schemas) to the problem of deducing semantic relations between sets of logical formulae. In this respect, to the best of our knowledge, there are no other works to which we can compare ours.

However, it is important to see how the performance of CtxMatch compares with that of techniques based on different approaches to semantic coordination. There are four

Conclusions and future work

In this paper, we presented a new approach to semantic coordination in open and distributed environments, and an algorithm (called CtxMatch) that implements this method for hierarchical classifications. The algorithm has already been used in a peer-to-peer application for distributed knowledge management (the application is described in [2]), and is going to be applied in a peer-to-peer wireless system for ambient intelligence [8].

An important lesson we learned from this work is that methods

References (22)

  • C. Ghidini et al. Local models semantics, or contextual reasoning = locality + compatibility. Artificial Intelligence (2001).
  • F. Giunchiglia et al. Multilanguage hierarchical logics or: how we can do without modal logics. Artificial Intelligence (1994).
  • S. Bergamaschi et al. Semantic integration of semistructured and structured data sources. SIGMOD Record (1999).
  • M. Bonifacio et al. Kex: a peer-to-peer solution for distributed knowledge management.
  • M. Bonifacio et al. Enabling distributed knowledge management. Managerial and technological implications. Novatica and Informatik/Informatique (2002).
  • A. Borgida et al. Distributed description logics: directed domain correspondences in federated information sources.
  • P. Bouquet (Ed.), AAAI-02 Workshop on Meaning Negotiation, AAAI, AAAI Press, Edmonton, Canada, July...
  • G.C. Bowker et al. Sorting things out: classification and its consequences (1999).
  • A. Büchner et al. Semantic information mediation among multiple product ontologies. Proceedings of the Fourth World Conference on Integrated Design & Process Technology (1999).
  • P. Busetta, P. Bouquet, G. Adami, M. Bonifacio, F. Palmieri, K-Trek: an approach to context awareness in large...
  • D.S. Day, M.B. Vilain, Phrase parsing with rule sequence processors: an application to the shared CoNLL task, in:...