A graph distance metric based on the maximal common subgraph

https://doi.org/10.1016/S0167-8655(97)00179-7

Abstract

Error-tolerant graph matching is a powerful concept that has various applications in pattern recognition and machine vision. In the present paper, a new distance measure on graphs is proposed. It is based on the maximal common subgraph of two graphs. The new measure is superior to edit distance based measures in that no particular edit operations together with their costs need to be defined. It is formally shown that the new distance measure is a metric. Potential algorithms for the efficient computation of the new measure are discussed.

Introduction

Graphs are among the most general and powerful data structures and are useful in a variety of applications. For example, in computer vision and pattern recognition, graphs are often used to represent unknown objects, which are to be recognized, and known models, which are stored in a database. Thus, the recognition problem turns into a graph matching problem. Applications of graph matching in pattern recognition and machine vision include character recognition (Lu et al., 1991; Cordella et al., 1997), schematic diagram interpretation (Lee et al., 1990; Messmer and Bunke, 1996), shape analysis (Pearce et al., 1994), image registration (Christmas et al., 1995), 3-D object recognition (Cho and Kim, 1992; Wong, 1992) and video indexing (Shearer et al., 1997).

Classical algorithms of graph matching include graph and subgraph isomorphism (Read and Corneil, 1977; Ullman, 1976). However, due to errors and distortions in the input data and the models, approximate, or error-tolerant, graph matching methods are needed in many applications. One way to cope with errors and distortions is graph edit distance (Shapiro and Haralick, 1981; Bunke, 1997). Here one introduces a set of edit operations, for example, the deletion, insertion and substitution of nodes and edges, and defines the similarity of two graphs in terms of the shortest (or least cost) sequence of edit operations that transforms one graph into the other. Another approach to error-tolerant graph matching is based on the maximal common subgraph of two graphs (Horaud and Skordas, 1989; Levinson, 1992).

When defining distance or similarity measures, certain properties are desirable. For example, one may wish that the distance from object A to B is the same as the distance from B to A (symmetry). Speaking more generally, it is often desired that the distance measure d fulfills the properties of a metric:

  1. d(A,B)=0⇔A=B,

  2. d(A,B)=d(B,A),

  3. d(A,B)+d(B,C)⩾d(A,C).

In general, edit distance measures are not metrics: the properties listed above hold only if the costs of the underlying edit operations satisfy certain conditions. But these conditions are sometimes too restrictive, or incompatible with the considered problem domain.
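The three metric properties listed above can be checked mechanically on a finite sample of objects. The following is a minimal sketch, not part of the paper: the function name `is_metric`, the tolerance `tol`, and the three toy distance functions on real numbers are all assumptions chosen for illustration. For graphs, the equality test `a == b` would have to be replaced by an isomorphism test.

```python
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Check the three metric properties of d on a finite sample of points."""
    for a, b in product(points, repeat=2):
        # Property 1: d(A,B) = 0 if and only if A = B.
        if (d(a, b) <= tol) != (a == b):
            return False
        # Property 2: symmetry.
        if abs(d(a, b) - d(b, a)) > tol:
            return False
    for a, b, c in product(points, repeat=3):
        # Property 3: triangle inequality.
        if d(a, c) > d(a, b) + d(b, c) + tol:
            return False
    return True


points = [0.0, 1.0, 2.5, 4.0]
print(is_metric(points, lambda a, b: abs(a - b)))       # True
print(is_metric(points, lambda a, b: max(a - b, 0.0)))  # False: d(0,1)=0 although 0 != 1
print(is_metric(points, lambda a, b: (a - b) ** 2))     # False: triangle inequality fails
```

A check of this kind can only refute the metric property on the sample it is given; it cannot replace a formal proof.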

In the present paper, we propose a new graph distance measure that is based on the maximal common subgraph of two graphs. The main contribution of the paper is the formal proof that the new distance measure is a metric. An advantage of the new distance measure over graph edit distance is the fact that it does not depend on edit costs. It is well known that any edit distance measure critically depends on the costs of the underlying edit operations. But the problem of how to obtain these edit costs is still unsolved. Using the new distance measure, this problem can be avoided.

In the next section of this paper we will present basic definitions. The following section will first define the maximal common subgraph based distance measure. Then it will be shown that the measure is a metric. Concluding remarks will make up the final section, including a discussion of potential algorithms for the computation of the new distance measure.

Basic definitions

In this paper, we consider graphs with labeled nodes and edges. Let LV and LE denote the finite sets of node and edge labels, respectively. (Unlabeled graphs are obtained as a special case if |LV|=|LE|=1.)

Definition 1. A graph is a 4-tuple G=(V,E,μ,ν), where

  • V is the finite set of vertices,

  • E⊆V×V is the set of edges,

  • μ:V→LV is a function assigning labels to the vertices,

  • ν:E→LE is a function assigning labels to the edges.

If V=∅ then G is called the empty graph.
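Definition 1 maps directly onto a small data structure. The sketch below is one possible encoding, not taken from the paper: the class name `Graph`, its field names, and the validation in `__post_init__` are assumptions made for illustration. Edges are stored as ordered pairs, following E⊆V×V.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    """A labeled graph G = (V, E, mu, nu) in the sense of Definition 1."""
    vertices: set   # V, a finite set of vertices
    edges: set      # E, a set of ordered pairs (u, w) with u, w in V
    mu: dict        # node labeling, maps each vertex to a label in LV
    nu: dict        # edge labeling, maps each edge to a label in LE

    def __post_init__(self):
        # E must be a subset of V x V.
        if any(u not in self.vertices or w not in self.vertices
               for u, w in self.edges):
            raise ValueError("every edge must join two vertices of V")
        # mu and nu must be total functions on V and E respectively.
        if set(self.mu) != set(self.vertices):
            raise ValueError("mu must label every vertex exactly once")
        if set(self.nu) != set(self.edges):
            raise ValueError("nu must label every edge exactly once")

    def is_empty(self):
        """True if V is the empty set, i.e. G is the empty graph."""
        return not self.vertices


g = Graph(
    vertices={1, 2, 3},
    edges={(1, 2), (2, 3)},
    mu={1: "a", 2: "b", 3: "a"},
    nu={(1, 2): "x", (2, 3): "y"},
)
empty = Graph(set(), set(), {}, {})
print(g.is_empty(), empty.is_empty())  # False True
```

An unlabeled graph is obtained by giving every vertex the same label and every edge the same label, which corresponds to the special case |LV|=|LE|=1.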

Definition 2. Given a graph G=(V,E,μ,ν

Graph distance measure

Definition 7. The distance of two non-empty graphs G1 and G2 is defined as

    d(G1,G2) = 1 − |mcs(G1,G2)| / max(|G1|,|G2|).

An example is shown in Fig. 1. Here we have |G1|=5, |G2|=4 and |mcs(G1,G2)|=3. Hence, d(G1,G2)=0.4.
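For small graphs the distance of Definition 7 can be evaluated by exhaustive search. The sketch below rests on assumptions that the snippets shown here do not state: |G| is taken to be the number of nodes of G, a subgraph is taken to be an induced subgraph, and a common subgraph must preserve node labels, edges and edge labels. Graphs are plain tuples (V, E, μ, ν); the names `mcs_size`, `mcs_distance` and `undirected` are hypothetical helpers. The two example graphs are not the graphs of Fig. 1; they are merely chosen to have the same sizes, |G1|=5, |G2|=4 and |mcs(G1,G2)|=3.

```python
from itertools import combinations, permutations


def mcs_size(g1, g2):
    """Number of nodes of a maximal common induced subgraph (brute force)."""
    v1, e1, mu1, nu1 = g1
    v2, e2, mu2, nu2 = g2

    def consistent(m):
        # m maps nodes of g1 to nodes of g2; node labels must agree.
        if any(mu1[u] != mu2[m[u]] for u in m):
            return False
        for u in m:
            for w in m:
                a, b = (u, w), (m[u], m[w])
                # Edges and non-edges must both be preserved (induced subgraph).
                if (a in e1) != (b in e2):
                    return False
                if a in e1 and nu1[a] != nu2[b]:
                    return False
        return True

    # Try the largest candidate size first; exponential time, small graphs only.
    for k in range(min(len(v1), len(v2)), 0, -1):
        for sub1 in combinations(v1, k):
            for sub2 in permutations(v2, k):
                if consistent(dict(zip(sub1, sub2))):
                    return k
    return 0


def mcs_distance(g1, g2):
    """d(G1,G2) = 1 - |mcs(G1,G2)| / max(|G1|,|G2|) for non-empty graphs."""
    if not g1[0] or not g2[0]:
        raise ValueError("Definition 7 covers non-empty graphs only")
    return 1 - mcs_size(g1, g2) / max(len(g1[0]), len(g2[0]))


def undirected(vertices, pairs, node_label="a", edge_label="e"):
    """Build an unlabeled undirected graph by storing each edge in both directions."""
    vertices = list(vertices)
    edges = {(u, w) for u, w in pairs} | {(w, u) for u, w in pairs}
    return (vertices, edges,
            {v: node_label for v in vertices},
            {e: edge_label for e in edges})


g1 = undirected(range(5), [(0, 1), (1, 2), (2, 3), (3, 4)])  # path on 5 nodes
g2 = undirected(range(4), [(0, 1), (0, 2), (0, 3)])          # star on 4 nodes
print(mcs_size(g1, g2))                 # 3
print(round(mcs_distance(g1, g2), 3))   # 0.4
```

The exhaustive search is only meant to make the definition concrete. Its running time grows exponentially with the number of nodes, so it is not a substitute for the efficient algorithms discussed in the paper.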

Theorem 1. For any graphs G1, G2 and G3, the following properties hold true:

  1. 0⩽d(G1,G2)⩽1,

  2. d(G1,G2)=0 ⇔ G1 and G2 are isomorphic to each other,

  3. d(G1,G2)=d(G2,G1),

  4. d(G1,G3)⩽d(G1,G2)+d(G2,G3).
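The properties of Theorem 1 can be spot-checked empirically on small random graphs. The sketch below is an illustration under assumptions, not the proof: it uses unlabeled undirected graphs encoded as a pair (n, edges), takes |G| to be the number of nodes, treats subgraphs as induced subgraphs, and uses exact fractions to avoid rounding. All function names are hypothetical. Property 2 is only checked in one direction, via d(G,G)=0, since the other direction would require a separate isomorphism test.

```python
import random
from fractions import Fraction
from itertools import combinations, permutations


def random_graph(rng, max_nodes=4):
    """Random non-empty unlabeled undirected graph as (n, set of frozenset edges)."""
    n = rng.randint(1, max_nodes)
    edges = {frozenset(p) for p in combinations(range(n), 2) if rng.random() < 0.5}
    return n, edges


def mcs_size(g1, g2):
    """Node count of a maximal common induced subgraph, by exhaustive search."""
    (n1, e1), (n2, e2) = g1, g2
    for k in range(min(n1, n2), 0, -1):
        for sub1 in combinations(range(n1), k):
            for sub2 in permutations(range(n2), k):
                m = dict(zip(sub1, sub2))
                # Edges and non-edges must be preserved under the mapping m.
                if all((frozenset((u, w)) in e1) == (frozenset((m[u], m[w])) in e2)
                       for u, w in combinations(sub1, 2)):
                    return k
    return 0


def distance(g1, g2):
    """Distance of Definition 7 as an exact fraction."""
    return 1 - Fraction(mcs_size(g1, g2), max(g1[0], g2[0]))


def check_theorem(trials=200, seed=0):
    """Return True if no sampled triple of graphs violates the checked properties."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (random_graph(rng) for _ in range(3))
        d_ab, d_ba = distance(a, b), distance(b, a)
        d_bc, d_ac = distance(b, c), distance(a, c)
        if not 0 <= d_ab <= 1:          # property 1
            return False
        if distance(a, a) != 0:         # property 2, one direction only
            return False
        if d_ab != d_ba:                # property 3
            return False
        if d_ac > d_ab + d_bc:          # property 4
            return False
    return True


print(check_theorem())  # True
```

A passing run shows only that no counterexample was found among the sampled graphs.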

Proof. Properties 1–3 follow directly from Definition 7. In the following proof of the triangle

Discussion and conclusion

We have shown that the graph distance measure of Definition 7 is in fact a metric. As discussed earlier, it is often difficult to form a metric from edit distance measures. Therefore, in applications where the properties of a metric are important, the maximal common subgraph metric could be used.

One application where this is important is information retrieval from images and video databases (Chang et al., 1987; Lee and Hsu, 1992; Shearer et al., 1997). This area relies heavily on browsing to
