1 Introduction

Description Logics [3] are a well-investigated family of logic-based knowledge representation languages, which are frequently used to formalize ontologies for application domains such as biology and medicine [17]. As the size of ontologies grows, the likelihood of them containing errors increases as well. This is particularly problematic if the data, stored in the ABox, are automatically extracted from text or other sources using natural language processing or machine learning. The reasoning services of DL systems [12, 15, 22, 33], which derive implicit consequences from the explicitly represented knowledge, are not only useful once an ontology is deployed, but can also be employed for debugging purposes by exhibiting consequences that are not supposed to hold in the application domain. Another reason why one might want to remove a consequence is that it reveals private information that is supposed to be hidden [5, 14]. Once such an unwanted consequence is detected, it is often not easy to see how to repair the ontology in order to get rid of this consequence. Classical repair approaches based on axiom pinpointing [8, 21, 27, 29, 31, 32] compute maximal subsets of the ontology that do not have the consequence. The obtained result thus strongly depends on the syntactic form of the axioms. For example, it is well-known that, for expressive DLs, a finite set of terminological axioms can be expressed by a single axiom. If the given terminology (TBox) is of this shape, then the only possible classical repair is the empty TBox. To alleviate this problem, repair approaches have been developed that replace certain axioms by weaker ones (in the sense that they have fewer consequences) instead of removing them completely [6, 18, 24, 34]. However, these approaches usually do not produce optimal repairs. In fact, it was shown in [6] that, even for the inexpressive DL \(\mathcal {EL}\), optimal repairs need not exist. The abstract example given there can be rephrased as follows.
Assume that the TBox defines humans to be exactly those individuals that have a human parent, and that the ABox says that Sam is a human. After we find out that Sam is in fact not human [9], we want to get rid of the latter assertion, but keep the (correct) consequences saying that Sam has an unbounded chain of ancestors (of undetermined species). If the TBox is assumed to be fixed, then there is no optimal repair of the ABox since we can add only a finite number of parent assertions.

To avoid such problems, our previous work on computing optimal repairs (formulated in the guise of achieving compliance with privacy policies) restricted the attention to the case without TBox. In [5] the ABox was additionally restricted to be a so-called instance store [19], i.e., an ABox without role assertions. The privacy policy (specifying which consequences are to be removed) was given as \(\mathcal {EL}\) instance queries. In this setting, optimal repairs always exist and can be computed in exponential time, which is optimal since there may be exponentially many optimal repairs of exponential size.

In [7] these results were extended to ABoxes with role assertions. More precisely, we considered quantified ABoxes in which some individuals are anonymized by viewing them as existentially quantified variables. For example, assume that the ABox contains the information that Ben has a parent, Jerry, who is both rich and famous, and we want to remove the consequence \((\exists parent .( Rich \sqcap Famous ))( BEN )\). Classical repairs can be obtained by removing one of the assertions \( Rich ( JERRY )\), \( Famous ( JERRY )\), and \( parent ( BEN , JERRY )\). If instead we replace the first assertion with \( Rich (x)\) and \( parent ( BEN ,x)\) for an existentially quantified variable x, then we retain more consequences. Note that we could not have used an individual name (i.e., constant) \( ANNE \) instead of x since information like \( Rich ( ANNE )\) about Anne does not follow from the original ABox. We showed in [7] that in this setting all optimal repairs can be computed by an exponential-time algorithm with access to an NP-oracle. The oracle is needed since our algorithm first computes a superset of the set of optimal repairs, from which non-optimal ones need to be removed using the (NP-complete) entailment test between (potentially exponentially large) quantified ABoxes. We also considered a modified version of entailment (called IQ-entailment) in [7], where quantified ABoxes are compared w.r.t. which \(\mathcal {EL}\) instance relationships they imply. Using this notion, no NP-oracle is needed for computing the set of all IQ-optimal repairs since IQ-entailment can be decided in polynomial time.

In the present paper, we improve on these results in two respects. On the one hand, we allow for the presence of terminological knowledge in the form of an \(\mathcal {EL}\) TBox, which is assumed to be correct, and thus is not changed by the repair. To deal with a TBox, the approach from [7] for computing optimal repairs must be extended in two ways. First, the ABox needs to be saturated w.r.t. the TBox before applying our repair approach. The saturated ABox has the same consequences as the original one has together with the TBox. In our Ben and Jerry example, assume that the assertion \( Rich ( JERRY )\) does not belong to the original ABox, but the TBox contains the axiom \( Famous \sqsubseteq Rich \). Then the ABox on its own does not have the unwanted consequence \((\exists parent .( Rich \sqcap Famous ))( BEN )\), but together with the TBox it does. Saturation adds the assertion \( Rich ( JERRY )\) to the ABox. For arbitrary TBoxes, saturation need not terminate. We consider two ways to remedy this problem: either allow for arbitrary TBoxes, but consider IQ-entailment, or use classical entailment, but consider cycle-restricted TBoxes [1]. In both cases, saturation always terminates; in the former in polynomial and in the latter in exponential time. One might be tempted to assume that, after saturation, one can simply apply the repair approach of [7] unchanged. This is not true, however, since the TBox may re-add assertions that have been removed or replaced by the repair. In our example, where \( Rich ( JERRY )\) is replaced, but \( Famous ( JERRY )\) is left untouched in the repair, the repaired ABox together with the TBox would still have the unwanted consequence. Thus, the repair approach needs to be changed to take this possibility into account.

On the other hand, the construction of optimal repairs described in our previous work [5, 7], and extended in this paper such that it can deal with TBoxes, is exponential even in the best case. The second contribution of this paper is the design of a new construction, both for classical and IQ-entailment, that is exponential only in the worst case. We also report on first experimental results, which indicate that this reduces the size of the computed optimal repairs considerably.

Detailed proofs of our results can be found in [4].

2 Preliminaries

Throughout this paper, we assume that \(\varSigma \) is a signature, which is a disjoint union of sets \(\varSigma _{\mathsf {O}}\), \(\varSigma _{\mathsf {C}}\), and \(\varSigma _{\mathsf {R}}\) of object names, concept names, and role names. We use symbols t, u, v, w to denote object names, A, B to denote concept names, and r, s to denote role names, all of them possibly with sub- or superscripts.

As in [7], a quantified ABox (qABox) \(\exists X.\mathcal {A}\) over \(\varSigma \) consists of a finite subset X of \(\varSigma _{\mathsf {O}} \), the elements of which are called variables, and a matrix \(\mathcal {A}\), which is a finite set of concept assertions A(u) where \(u\in \varSigma _{\mathsf {O}} \) and \(A\in \varSigma _{\mathsf {C}} \), and of role assertions r(u, v) where \(u,v\in \varSigma _{\mathsf {O}} \) and \(r\in \varSigma _{\mathsf {R}} \). A non-variable object name in \(\exists X.\mathcal {A}\) is called an individual name, and the set of all these names is denoted as \(\varSigma _{\mathsf {I}}(\exists X.\mathcal {A})\). We further set \(\varSigma _{\mathsf {O}}(\exists X.\mathcal {A}) := \varSigma _{\mathsf {I}}(\exists X.\mathcal {A})\cup X\). Traditional DL ABoxes are qABoxes where \(X=\emptyset \); we then write \(\mathcal {A}\) instead of \(\exists \emptyset .\mathcal {A}\). The matrix \(\mathcal {A}\) of a qABox \(\exists X.\mathcal {A}\) is such a traditional ABox.
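The definition above can be mirrored by a small data structure. The following is only an illustrative sketch: the class name `QABox`, the tuple encoding of assertions, and the sample data are assumptions made for this example and are not taken from [7].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QABox:
    """A quantified ABox: a set X of variables plus a matrix of assertions."""
    variables: frozenset           # X, the existentially quantified object names
    concept_assertions: frozenset  # pairs (A, u) standing for A(u)
    role_assertions: frozenset     # triples (r, u, v) standing for r(u, v)

    def objects(self):
        """Variables plus every object name occurring in the matrix."""
        names = set(self.variables)
        names.update(u for _, u in self.concept_assertions)
        for _, u, v in self.role_assertions:
            names.update((u, v))
        return names

    def individuals(self):
        """Non-variable object names, i.e. the individual names."""
        return self.objects() - self.variables


# Hypothetical data: Ben has an anonymous rich parent x and a famous parent JERRY.
example = QABox(
    variables=frozenset({"x"}),
    concept_assertions=frozenset({("Rich", "x"), ("Famous", "JERRY")}),
    role_assertions=frozenset({("parent", "BEN", "x"), ("parent", "BEN", "JERRY")}),
)
```

A traditional ABox corresponds to an instance whose `variables` field is empty.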

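An interpretation \(\mathcal {I}\) of \(\varSigma \) is a pair \((\varDelta ^{\mathcal {I}},\cdot ^{\mathcal {I}})\), where the domain \(\varDelta ^{\mathcal {I}}\) is a non-empty set and the interpretation function \(\cdot ^{\mathcal {I}}\) maps each \(u\in \varSigma _{\mathsf {O}} \) to an element \(u^{\mathcal {I}}\) of \(\varDelta ^{\mathcal {I}}\), each \(A\in \varSigma _{\mathsf {C}} \) to a set \(A^{\mathcal {I}}\subseteq \varDelta ^{\mathcal {I}}\), and each \(r\in \varSigma _{\mathsf {R}} \) to a binary relation \(r^{\mathcal {I}}\) over \(\varDelta ^{\mathcal {I}}\). The interpretation \(\mathcal {I}\) of \(\varSigma \) is a model of a qABox \(\exists X.\mathcal {A}\) over \(\varSigma \) if there is an interpretation \(\mathcal {J}\) such that \(\varDelta ^{\mathcal {J}}=\varDelta ^{\mathcal {I}}\), the interpretation functions \(\cdot ^{\mathcal {I}}\) and \(\cdot ^{\mathcal {J}}\) coincide on \(\varSigma \setminus X\), and \(u^{\mathcal {J}}\in A^{\mathcal {J}}\) for each \(A(u)\in \mathcal {A}\) as well as \((u^{\mathcal {J}},v^{\mathcal {J}})\in r^{\mathcal {J}}\) for each \(r(u,v)\in \mathcal {A}\).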

Following [7], we define \(\mathcal {EL}\) atoms and \(\mathcal {EL}\) concept descriptions over \(\varSigma \) by simultaneous induction as follows. An \(\mathcal {EL}\) atom is either a concept name \(A\in \varSigma _{\mathsf {C}} \) or an existential restriction \(\exists r.C\) for some role name \(r\in \varSigma _{\mathsf {R}} \) and an \(\mathcal {EL}\) concept description C. An \(\mathcal {EL}\) concept description is a conjunction \(\sqcap \mathcal {C}\) where \(\mathcal {C}\) is a finite set of \(\mathcal {EL}\) atoms. An \(\mathcal {EL}\) concept inclusion is of the form \(C\sqsubseteq D\) for \(\mathcal {EL}\) concept descriptions C and D, and an \(\mathcal {EL}\) TBox is a finite set of such concept inclusions. An \(\mathcal {EL}\) concept assertion is an expression C(u), where C is an \(\mathcal {EL}\) concept description and \(u\in \varSigma _{\mathsf {O}} \).

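For each interpretation \(\mathcal {I}\) of \(\varSigma \), we extend the interpretation function \(\cdot ^{\mathcal {I}}\) to \(\mathcal {EL}\) atoms and \(\mathcal {EL}\) concept descriptions in the following manner:

  • \((\exists r.C)^{\mathcal {I}} := \{\, d \mid (d,e)\in r^{\mathcal {I}} \text { for some } e\in C^{\mathcal {I}} \,\}\),

  • \((\sqcap \mathcal {C})^{\mathcal {I}} := \bigcap \{\, C^{\mathcal {I}} \mid C\in \mathcal {C} \,\}\), where \(\bigcap \emptyset := \varDelta ^{\mathcal {I}}\).

The interpretation \(\mathcal {I}\) is a model of the concept inclusion \(C\sqsubseteq D\) (the concept assertion C(u)) if \(C^{\mathcal {I}}\subseteq D^{\mathcal {I}}\) (\(u^{\mathcal {I}}\in C^{\mathcal {I}}\)), and of the TBox \(\mathcal {T}\) if it is a model of each concept inclusion in \(\mathcal {T}\).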

To make the syntax introduced above more akin to the one usually employed for \(\mathcal {EL}\), we denote the empty conjunction \(\sqcap \emptyset \) as \(\top \) (top concept), singleton conjunctions \(\sqcap \{C\}\) as C, and conjunctions \(\sqcap \mathcal {C}\) for \(|\mathcal {C}|\ge 2\) as \(C_1\sqcap \ldots \sqcap C_n\), where \(C_1,\ldots ,C_n\) is an enumeration of the elements of \(\mathcal {C}\) in an arbitrary order. Since we do not distinguish between the singleton conjunction \(\sqcap \{C\}\) and the atom C, each atom is also a concept description. The set \(\mathsf {Sub}(C)\) of subconcepts of an \(\mathcal {EL}\) concept description C is defined as follows: \(\mathsf {Sub}(A) := \{A\}\), \(\mathsf {Sub}(\exists r.C) := \{\exists r.C\}\cup \mathsf {Sub}(C)\), and \(\mathsf {Sub}(\sqcap \mathcal {C}) := \{\sqcap \mathcal {C}\}\cup \bigcup \{\, \mathsf {Sub}(D) \mid D\in \mathcal {C} \,\}\). The set \(\mathsf {Atoms}(C)\) consists of all atoms contained in \(\mathsf {Sub}(C)\). These two notions are extended to TBoxes and sets of concept assertions in the obvious way.
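The recursive structure of concept descriptions and of their subconcepts can be made concrete as follows. This is a minimal sketch under an assumed encoding that is not part of the paper: a concept name is a string, an existential restriction is a tuple `("exists", r, C)`, and a conjunction is a `frozenset` of atoms; the helper names `exists`, `subconcepts`, and `atoms` are hypothetical.

```python
def exists(role, *atoms):
    """Build the atom ∃role.(atom_1 ⊓ ... ⊓ atom_n)."""
    return ("exists", role, frozenset(atoms))


def subconcepts(concept):
    """Subconcepts of a conjunction, given as a frozenset of atoms.

    A singleton conjunction is identified with its only atom, so it is
    not listed separately.
    """
    result = set()
    if len(concept) != 1:
        result.add(concept)
    for atom in concept:
        result.add(atom)
        if isinstance(atom, tuple):  # existential restriction
            result |= subconcepts(atom[2])
    return result


def atoms(concept):
    """Atoms (concept names and existential restrictions) among the subconcepts."""
    return {c for c in subconcepts(concept) if isinstance(c, (str, tuple))}


# The concept ∃parent.(Rich ⊓ Famous), viewed as a singleton conjunction.
request = frozenset({exists("parent", "Rich", "Famous")})
```

For `request`, the subconcepts are the existential restriction itself, the conjunction of `Rich` and `Famous`, and the two concept names; the atoms are all of these except the conjunction.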

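Let \(\alpha , \beta \) be qABoxes, concept inclusions, or concept assertions (possibly not both of the same kind), and \(\mathcal {T}\) an \(\mathcal {EL}\) TBox. Then we write \(\mathcal {I}\models \alpha \) if the interpretation \(\mathcal {I}\) is a model of \(\alpha \). We say that \(\alpha \) entails \(\beta \) w.r.t. \(\mathcal {T}\) (written \(\alpha \models ^{\mathcal {T}}\beta \)) if every model of \(\alpha \) and \(\mathcal {T}\) is a model of \(\beta \). Furthermore, \(\alpha \) and \(\beta \) are equivalent w.r.t. \(\mathcal {T}\) (written \(\alpha \equiv ^{\mathcal {T}}\beta \)), if \(\alpha \models ^{\mathcal {T}}\beta \) and \(\beta \models ^{\mathcal {T}}\alpha \). In case \(\mathcal {T}=\emptyset \), we will sometimes write \(\models \) instead of \(\models ^\emptyset \). If \(\emptyset \models ^{\mathcal {T}}C\sqsubseteq D\), then we also write \(C\sqsubseteq ^{\mathcal {T}}D\) and say that C is subsumed by D w.r.t. \(\mathcal {T}\); in case \(\mathcal {T}=\emptyset \) we simply say that C is subsumed by D. Two \(\mathcal {EL}\) concept descriptions C and D are equivalent w.r.t. \(\mathcal {T}\) (written \(C\equiv ^{\mathcal {T}}D\)) if they subsume each other w.r.t. \(\mathcal {T}\). We write \(C\sqsubset ^{\mathcal {T}}D\) to indicate that \(C\sqsubseteq ^{\mathcal {T}}D\), but \(C\not \equiv ^{\mathcal {T}}D\). If \(\exists X.\mathcal {A}\models ^{\mathcal {T}}C(a)\), then a is called an instance of C w.r.t. \(\exists X.\mathcal {A}\) and \(\mathcal {T}\). For \(\mathcal {EL}\), the subsumption and the instance problem are decidable in polynomial time [2]. However, entailment between qABoxes is NP-complete even w.r.t. the empty TBox [7].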

We also use the reduced form \(C^r\) of \(\mathcal {EL}\) concept descriptions C [23], which is obtained by removing redundant subdescriptions (see [7] for details).

Adapting the results in [23], one can show that \(C \equiv ^\emptyset C^r\) and that \(C \equiv ^\emptyset D\) implies \(C^r=D^r\).
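To illustrate the reduced form, the following sketch assumes the standard structural characterization of subsumption w.r.t. the empty TBox (every atom of D must subsume some atom of C) and the same hypothetical encoding as before: concept names are strings, existential restrictions are tuples `("exists", r, C)`, and conjunctions are `frozenset`s of atoms. It is not the procedure of [23] or [7], only one way to obtain a concept without redundant atoms.

```python
def atom_subsumed(e, f):
    """Decide whether atom e is subsumed by atom f w.r.t. the empty TBox."""
    if isinstance(e, str) or isinstance(f, str):
        return e == f
    (_, r, c), (_, s, d) = e, f
    return r == s and subsumed(c, d)


def subsumed(c, d):
    """C ⊑ D: every atom of D subsumes some atom of C."""
    return all(any(atom_subsumed(e, f) for e in c) for f in d)


def reduced_form(c):
    """Reduce fillers recursively, then drop atoms that subsume another, different atom."""
    inner = {a if isinstance(a, str) else ("exists", a[1], reduced_form(a[2]))
             for a in c}
    return frozenset(a for a in inner
                     if not any(b != a and atom_subsumed(b, a) for b in inner))


# A ⊓ ∃r.A ⊓ ∃r.(A ⊓ B): the atom ∃r.A is redundant because it subsumes ∃r.(A ⊓ B).
a_and_b = frozenset({"A", "B"})
c = frozenset({"A", ("exists", "r", frozenset({"A"})), ("exists", "r", a_and_b)})
c_reduced = reduced_form(c)
```

In this example `c_reduced` consists of `A` and `∃r.(A ⊓ B)` only, and it is equivalent to `c` in the sense that each subsumes the other.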

3 A Tale of Two Entailments

DL-based ontologies are usually accessed through appropriate query languages, where for the purpose of this paper it is sufficient to assume that a query language is given by a fragment of first-order logic. Instead of comparing ontologies w.r.t. the models they have, it thus makes sense to compare them w.r.t. the answers to queries they entail [25]. Given such a query language QL and an \(\mathcal {EL}\) TBox \(\mathcal {T}\), we say that the qABox \(\exists X.\mathcal {A}\) QL-entails the qABox \(\exists Y.\mathcal {B}\) w.r.t. \(\mathcal {T}\) (written \(\exists X.\mathcal {A}\models ^{\mathcal {T}}_{\textsf {QL}}\exists Y.\mathcal {B}\)) if for each query \(\varphi (x_1,\ldots ,x_k)\in {\textsf {QL}}\) and each tuple of individuals \((a_1,\ldots ,a_k)\) we have that \(\mathcal {T}\cup \{\exists Y.\mathcal {B}\}\models \varphi (a_1,\ldots ,a_k)\) implies \(\mathcal {T}\cup \{\exists X.\mathcal {A}\}\models \varphi (a_1,\ldots ,a_k)\), where we view the TBox and the ABox as first-order formulae and \(\models \) is classical first-order entailment (see [25] for more details). We say that two qABoxes are QL-equivalent w.r.t. \(\mathcal {T}\) if they QL-entail each other w.r.t. \(\mathcal {T}\), and denote this equivalence relation as \(\equiv ^{\mathcal {T}}_{\textsf {QL}}\).

For \(\mathcal {EL}\) ontologies, one usually considers instance queries (IQ) or conjunctive queries (CQ). The former are given by \(\mathcal {EL}\) concept descriptions, viewed as first-order formulae with one free variable. The latter are basically qABoxes of the form , but with the elements of viewed as free variables. Replacing these free variables with a tuple of individuals thus yields a qABox in the sense introduced above. In particular, this means that CQ-entailment corresponds to entailment of the same qABoxes (see [7] for more details regarding the connection between conjunctive queries and qABoxes).

3.1 Classical Entailment and CQ-Entailment

Due to the close connection between conjunctive queries and qABoxes mentioned above, it is easy to see that the classical entailment relation \(\models ^{\mathcal {T}}\) between qABoxes, as introduced in the previous section, actually coincides with CQ-entailment \(\models ^{\mathcal {T}}_{\textsf {CQ}}\). To keep the notation more uniform and to distinguish this kind of entailment explicitly from IQ-entailment, we will usually talk about CQ-entailment and write \(\models ^{\mathcal {T}}_{\textsf {CQ}}\).

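Whenever we compare two qABoxes \(\exists X.\mathcal {A}\) and \(\exists Y.\mathcal {B}\), we assume without loss of generality that they are renamed apart, which means that X is disjoint from \(\varSigma _{\mathsf {O}}(\exists Y.\mathcal {B})\) and Y is disjoint from \(\varSigma _{\mathsf {O}}(\exists X.\mathcal {A})\), and we further assume that the two qABoxes speak about the same set of individual names \(\varSigma _{\mathsf {I}} := \varSigma _{\mathsf {I}}(\exists X.\mathcal {A}) = \varSigma _{\mathsf {I}}(\exists Y.\mathcal {B})\). For the case of an empty TBox, it was shown in [7] that \(\exists X.\mathcal {A}\models \exists Y.\mathcal {B}\) iff there is a homomorphism from \(\exists Y.\mathcal {B}\) to \(\exists X.\mathcal {A}\). A homomorphism from \(\exists Y.\mathcal {B}\) to \(\exists X.\mathcal {A}\) is a mapping \(h:\varSigma _{\mathsf {I}}\cup Y\rightarrow \varSigma _{\mathsf {I}}\cup X\) such that \(h(a)=a\) for each \(a\in \varSigma _{\mathsf {I}} \), \(A(h(u))\in \mathcal {A}\) for each \(A(u)\in \mathcal {B}\), and \(r(h(u),h(v))\in \mathcal {A}\) for each \(r(u,v)\in \mathcal {B}\). In order to obtain a similar characterization of entailment for the case of a non-empty TBox \(\mathcal {T}\), we need to saturate the given qABox \(\exists X.\mathcal {A}\) w.r.t. \(\mathcal {T}\).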
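The homomorphism characterization for the empty TBox can be tried out with a brute-force search. The sketch below is illustrative only: the triple encoding `(variables, concept_assertions, role_assertions)` and the function name `find_homomorphism` are assumptions, and the exhaustive enumeration is exponential in the number of variables, in line with the NP-completeness of the problem.

```python
from itertools import product


def find_homomorphism(source, target, individuals):
    """Search for a homomorphism from `source` to `target`.

    Each qABox is a triple (variables, concept_assertions, role_assertions)
    with assertions encoded as (A, u) and (r, u, v). Returns a dict or None.
    """
    src_vars, src_ca, src_ra = source
    tgt_vars, tgt_ca, tgt_ra = target
    src_vars = sorted(src_vars)
    candidates = sorted(set(individuals) | set(tgt_vars))
    for images in product(candidates, repeat=len(src_vars)):
        h = {a: a for a in individuals}      # h(a) = a for individual names
        h.update(zip(src_vars, images))
        if (all((A, h[u]) in tgt_ca for A, u in src_ca)
                and all((r, h[u], h[v]) in tgt_ra for r, u, v in src_ra)):
            return h
    return None


individuals = {"BEN", "JERRY"}
original = (set(),
            {("Rich", "JERRY"), ("Famous", "JERRY")},
            {("parent", "BEN", "JERRY")})
# Rich(JERRY) replaced with Rich(x) and parent(BEN, x), as in the introduction.
weakened = ({"x"},
            {("Rich", "x"), ("Famous", "JERRY")},
            {("parent", "BEN", "x"), ("parent", "BEN", "JERRY")})
```

Here `find_homomorphism(weakened, original, individuals)` maps `x` to `JERRY`, so the original ABox entails the weakened qABox, whereas no homomorphism exists in the other direction because `Rich(JERRY)` is missing from the weakened matrix.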

Basically, this saturation performs what is called the chase in the database community [10, 20, 26]. Given an \(\mathcal {EL}\) TBox \(\mathcal {T}\) and a qABox \(\exists X.\mathcal {A}\), it extends the ABox by new assertions that are implied by the TBox. The rules that realize this are described in Fig. 1. Their rôle is two-fold: whereas the \(\sqsubseteq \)-rule adds new concept assertions that are implied by the ABox together with the TBox, the other two rules break down the complex concept assertions added by this rule into smaller parts.

Fig. 1. The CQ-saturation rules.

In general, applying these rules need not terminate; e.g., if applied to the qABox for the TBox . There are various sufficient conditions that guarantee termination of the chase [13]. Here, we use a condition introduced in [1] in the context of unification in \(\mathcal {EL}\).

Definition 1

The \(\mathcal {EL}\) TBox \(\mathcal {T}\) is cycle-restricted if there is no non-empty sequence of role names \(r_1,\ldots ,r_k\) and \(\mathcal {EL}\) concept description C such that \(C\sqsubseteq ^{\mathcal {T}}\exists r_1.\cdots \exists r_k.C\).

As shown in [1], it can be decided in polynomial time whether a given \(\mathcal {EL}\) TBox is cycle-restricted or not. For cycle-restricted TBoxes, CQ-saturation always terminates.

Theorem 2

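Let \(\mathcal {T}\) be a cycle-restricted \(\mathcal {EL}\) TBox and \(\exists X.\mathcal {A}\) a qABox. Then exhaustive application of the CQ-saturation rules terminates in exponential time in the size of \(\exists X.\mathcal {A}\) and \(\mathcal {T}\), and yields a qABox \(\mathsf {sat}^{\mathcal {T}}_{\textsf {CQ}}(\exists X.\mathcal {A})\) such that the following statements are equivalent for all qABoxes \(\exists Y.\mathcal {B}\):

  • \(\exists X.\mathcal {A}\models ^{\mathcal {T}}_{\textsf {CQ}}\exists Y.\mathcal {B}\),

  • \(\mathsf {sat}^{\mathcal {T}}_{\textsf {CQ}}(\exists X.\mathcal {A})\models ^{\emptyset }_{\textsf {CQ}}\exists Y.\mathcal {B}\),

  • there is a homomorphism from \(\exists Y.\mathcal {B}\) to \(\mathsf {sat}^{\mathcal {T}}_{\textsf {CQ}}(\exists X.\mathcal {A})\).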

We can show that there are examples where the CQ-saturation of a qABox w.r.t. a cycle-restricted TBox is of exponential size, and thus its computation must take exponential time. Nevertheless, the entailment relation \(\models ^{\mathcal {T}}_{\textsf {CQ}}\) can still be decided within NP by adapting results for conjunctive query answering in \(\mathcal {EL}\)  [30].

3.2 IQ-Entailment

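Recall that the qABox \(\exists X.\mathcal {A}\) IQ-entails the qABox \(\exists Y.\mathcal {B}\) w.r.t. the \(\mathcal {EL}\) TBox \(\mathcal {T}\) if every concept assertion C(a) entailed w.r.t. \(\mathcal {T}\) by the latter is also entailed w.r.t. \(\mathcal {T}\) by the former. In the following we assume again that these two qABoxes are renamed apart. For the case of an empty TBox, it was shown in [7] that \(\exists X.\mathcal {A}\models _{\textsf {IQ}}\exists Y.\mathcal {B}\) iff there is a simulation from \(\exists Y.\mathcal {B}\) to \(\exists X.\mathcal {A}\). A simulation from \(\exists Y.\mathcal {B}\) to \(\exists X.\mathcal {A}\) is a relation \(\mathfrak {S}\subseteq (\varSigma _{\mathsf {I}}\cup Y)\times (\varSigma _{\mathsf {I}}\cup X)\) such that \((a,a)\in \mathfrak {S}\) for each \(a\in \varSigma _{\mathsf {I}} \) and, for each \((u,v)\in \mathfrak {S}\), \(A(u)\in \mathcal {B}\) implies \(A(v)\in \mathcal {A}\) and \(r(u,u')\in \mathcal {B}\) implies that there exists an object \(v'\in \varSigma _{\mathsf {I}} \cup X\) such that \((u',v')\in \mathfrak {S}\) and \(r(v,v')\in \mathcal {A}\). Since checking the existence of a simulation can be done in polynomial time [16], we conclude that IQ-entailment between qABoxes can be decided in polynomial time for the case of an empty TBox.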
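The polynomial-time test can be sketched as a fixed-point refinement that computes the greatest relation satisfying the two simulation conditions and then checks that it contains all pairs (a, a). This is one standard way to decide the existence of a simulation and is not claimed to be the algorithm of [16]; the triple encoding of qABoxes and the function names are assumptions of this example.

```python
def greatest_simulation(source, target, individuals):
    """Largest relation from `source` to `target` satisfying the simulation conditions."""
    src_vars, src_ca, src_ra = source
    tgt_vars, tgt_ca, tgt_ra = target
    src_objs = set(individuals) | set(src_vars)
    tgt_objs = set(individuals) | set(tgt_vars)
    # Start with all pairs that respect the concept assertions.
    sim = {(u, v) for u in src_objs for v in tgt_objs
           if all((A, v) in tgt_ca for A, w in src_ca if w == u)}
    changed = True
    while changed:
        changed = False
        for u, v in list(sim):
            for r, w, u2 in src_ra:
                # Remove (u, v) if some r-successor of u has no matching r-successor of v.
                if w == u and not any((r, v, v2) in tgt_ra and (u2, v2) in sim
                                      for v2 in tgt_objs):
                    sim.discard((u, v))
                    changed = True
                    break
    return sim


def iq_entails(a, b, individuals):
    """Empty TBox: a IQ-entails b iff a simulation from b to a contains all (c, c)."""
    sim = greatest_simulation(b, a, individuals)
    return all((c, c) in sim for c in individuals)


individuals = {"a", "b"}
cycle = (set(), set(), {("r", "a", "b"), ("r", "b", "a")})
loop = ({"x"}, set(), {("r", "a", "x"), ("r", "x", "x")})
```

In this hypothetical example `iq_entails(cycle, loop, individuals)` holds although there is no homomorphism from `loop` to `cycle` (no object of `cycle` has an r-loop), which shows that IQ-entailment is weaker than CQ-entailment.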

To extend these results to the case of a non-empty TBox, we again need to saturate the ABox w.r.t. the TBox. But now the saturation rules, given in Fig. 2, are more parsimonious w.r.t. the introduction of new objects. To be more precise, for each existential restriction \(\exists r.C\), we assume that \(x_C\) is a fresh variable not contained in the initial qABox \(\exists X.\mathcal {A}\). When applying the \(\exists \)-rule to an assertion of the form \((\exists r.C)(u)\), we always use this variable for the successor object. Due to this restriction, IQ-saturation always terminates, i.e., it is not necessary to impose any restrictions on the TBox. Also note that IQ-saturation basically generates a qABox representation of what is called the canonical model in [25, Section 5.2].

Fig. 2. The IQ-saturation rules.

Theorem 3

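Let \(\mathcal {T}\) be an \(\mathcal {EL}\) TBox and \(\exists X.\mathcal {A}\) a qABox. Then exhaustive application of the IQ-saturation rules terminates in polynomial time in the size of \(\exists X.\mathcal {A}\) and \(\mathcal {T}\), and yields a qABox \(\mathsf {sat}^{\mathcal {T}}_{\textsf {IQ}}(\exists X.\mathcal {A})\) such that the following statements are equivalent for all qABoxes \(\exists Y.\mathcal {B}\):

  • \(\exists X.\mathcal {A}\models ^{\mathcal {T}}_{\textsf {IQ}}\exists Y.\mathcal {B}\),

  • \(\mathsf {sat}^{\mathcal {T}}_{\textsf {IQ}}(\exists X.\mathcal {A})\models ^{\emptyset }_{\textsf {IQ}}\exists Y.\mathcal {B}\),

  • there is a simulation from \(\exists Y.\mathcal {B}\) to \(\mathsf {sat}^{\mathcal {T}}_{\textsf {IQ}}(\exists X.\mathcal {A})\).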

Since the IQ-saturation can be computed in polynomial time and the existence of a simulation can be decided in polynomial time, this shows that the entailment relation \(\models ^{\mathcal {T}}_{\textsf {IQ}}\) can be decided in polynomial time.
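The effect of reusing one variable per filler concept can be seen in a small sketch of IQ-saturation. Since the rules of Fig. 2 are not reproduced in the text, the code below is an assumption-laden approximation rather than the exact rule set: it applies a concept inclusion at an object whenever the left-hand side holds there and the right-hand side does not, and it decomposes the right-hand side immediately. The encoding (concept names as strings, existential restrictions as tuples `("exists", r, C)`, conjunctions as `frozenset`s) and the sample TBox are hypothetical.

```python
def holds(concept, u, ca, ra):
    """Check whether the matrix (ca, ra) entails concept(u)."""
    for atom in concept:
        if isinstance(atom, str):
            if (atom, u) not in ca:
                return False
        else:
            _, r, filler = atom
            if not any(s == r and w == u and holds(filler, v, ca, ra)
                       for s, w, v in ra):
                return False
    return True


def iq_saturate(variables, ca, ra, tbox):
    """Saturate a qABox w.r.t. a TBox given as a list of pairs (lhs, rhs)."""
    variables, ca, ra = set(variables), set(ca), set(ra)

    def add(concept, u):
        """Assert concept(u), decomposing conjunctions and existential restrictions."""
        for atom in concept:
            if isinstance(atom, str):
                ca.add((atom, u))
            else:
                _, r, filler = atom
                x = ("x", filler)      # the single variable x_C reserved for filler C
                variables.add(x)
                if (r, u, x) not in ra:
                    ra.add((r, u, x))
                    add(filler, x)

    changed = True
    while changed:
        changed = False
        objects = variables | {u for _, u in ca} | {o for _, w, v in ra for o in (w, v)}
        for lhs, rhs in tbox:
            for u in objects:
                if holds(lhs, u, ca, ra) and not holds(rhs, u, ca, ra):
                    add(rhs, u)
                    changed = True
    return variables, ca, ra


# Hypothetical input: the assertion A(a) and the cyclic inclusion A ⊑ ∃r.A.
just_a = frozenset({"A"})
tbox = [(just_a, frozenset({("exists", "r", just_a)}))]
sat_vars, sat_ca, sat_ra = iq_saturate(set(), {("A", "a")}, set(), tbox)
```

On this cyclic input the loop stops after introducing a single variable with an r-loop, whereas a chase that creates a fresh successor for every application would continue forever.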

4 Canonical Repairs

We specify what is to be repaired by a finite set of \(\mathcal {EL}\) concept assertions, which we call a repair request. A repair is a qABox that does not have any of these assertions as a consequence. This generalizes previous repair approaches [6] in that more than one consequence specified as unwanted is removed in one step. It also encompasses the notion of a privacy policy, as introduced in [7], which specifies forbidden concepts, with the meaning that one should not be able to derive that any of the individuals occurring in the qABox is an instance of such a concept. We assume that the TBox is static (i.e., may not be changed by the repair) and consider both CQ- and IQ-entailment for comparing qABoxes.

Definition 4

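Let \(\mathcal {T}\) be an \(\mathcal {EL}\) TBox and \({\textsf {QL}}\in \{{\textsf {CQ}},{\textsf {IQ}}\}\).

  • An \(\mathcal {EL}\) repair request \(\mathcal {R}\) is a finite set of \(\mathcal {EL}\) concept assertions.

  • Given a qABox \(\exists X.\mathcal {A}\) and an \(\mathcal {EL}\) repair request \(\mathcal {R}\), a QL-repair of \(\exists X.\mathcal {A}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) is a qABox \(\exists Y.\mathcal {B}\) such that \(\exists X.\mathcal {A}\models ^{\mathcal {T}}_{\textsf {QL}}\exists Y.\mathcal {B}\) and \(\exists Y.\mathcal {B}\not \models ^{\mathcal {T}}C(a)\) for all \(C(a)\in \mathcal {R}\).

  • Such a repair \(\exists Y.\mathcal {B}\) is optimal if there is no QL-repair \(\exists Z.\mathcal {C}\) of \(\exists X.\mathcal {A}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) such that \(\exists Z.\mathcal {C}\models ^{\mathcal {T}}_{\textsf {QL}}\exists Y.\mathcal {B}\) and \(\exists Y.\mathcal {B}\not \models ^{\mathcal {T}}_{\textsf {QL}}\exists Z.\mathcal {C}\).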

Intuitively, a repair is a qABox that has no new consequences of the specified type (instance relationships or answers to conjunctive queries), and no longer has the consequences forbidden by the repair request. In an optimal repair, a minimal amount of consequences of the specified type is lost. Since there are different options for what to change when repairing a qABox, there may exist several non-equivalent optimal repairs.

In the following, let \({\textsf {QL}}\in \{{\textsf {CQ}},{\textsf {IQ}}\}\) and let \(\mathcal {T}\) be a fixed TBox, which is assumed to be cycle-restricted if \({\textsf {QL}}= {\textsf {CQ}}\). In addition, let \(\mathcal {R}\) be a repair request and \(\exists X.\mathcal {A}\) be the qABox to be QL-repaired for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\). We assume that \(\mathcal {R}\) does not contain an assertion of the form C(a) such that \(\top \sqsubseteq ^{\mathcal {T}}C\) since the presence of such an assertion would preclude the existence of a repair. If \(\mathcal {R}\) satisfies this restriction, then the empty qABox is always a repair. However, as mentioned in the introduction, this does not imply that there is an optimal repair. We will show that, for the case of IQ-entailment, optimal repairs always exist. For CQ-entailment, this is the case if the TBox is cycle-restricted. In both cases, the set of optimal repairs covers all repairs in the sense that each repair is entailed by some optimal repair.

As mentioned in the introduction, to deal with TBoxes, the approach for computing so-called canonical repairs from [7] needs to be adapted in two ways. First, one needs to QL-saturate the given qABox w.r.t. the TBox. Second, when computing canonical repairs from the saturated qABox, the construction needs to ensure that the TBox does not reintroduce consequences that have been removed by the repair. The main idea underlying the construction of canonical repairs is to introduce variables as copies of the objects occurring in the saturated qABox. Such a variable is of the form \(y_{u,\mathcal {K}}\), where the first component of the subscript says that this is a copy of the object u. The second component \(\mathcal {K}\) is a set of atoms, with the intuitive meaning that \(y_{u,\mathcal {K}}\) must not be an instance of any element of \(\mathcal {K}\). To avoid introducing unnecessary copies, certain restrictions were imposed in [7] on the sets \(\mathcal {K}\). We add a further restriction that takes care of the TBox.

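To be more precise, let \(\mathsf {Sub}(\mathcal {R},\mathcal {T})\) be the set of subconcepts of concept descriptions occurring in \(\mathcal {R}\) or \(\mathcal {T}\), and let \(\mathsf {Atoms}(\mathcal {R},\mathcal {T})\) be the set of atoms occurring in \(\mathsf {Sub}(\mathcal {R},\mathcal {T})\). The set \(\mathcal {K}\) in a variable \(y_{u,\mathcal {K}}\) must be a repair type for u.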

Definition 5

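Let u be an object name occurring in the QL-saturation of \(\exists X.\mathcal {A}\) w.r.t. \(\mathcal {T}\). A repair type for u is a set \(\mathcal {K}\) of atoms occurring in \(\mathcal {R}\) or \(\mathcal {T}\) that satisfies the following:

  1. the matrix of the saturation entails C(u) for each atom \(C\in \mathcal {K}\),

  2. if C, D are distinct atoms in \(\mathcal {K}\), then \(C\not \sqsubseteq ^{\emptyset }D\),

  3. \(\mathcal {K}\) is premise-saturated w.r.t. \(\mathcal {T}\), i.e., for all subconcepts C of concept descriptions occurring in \(\mathcal {R}\) or \(\mathcal {T}\) such that the matrix of the saturation entails C(u) and \(C\sqsubseteq ^{\mathcal {T}}D\) for some \(D\in \mathcal {K}\), there is \(E\in \mathcal {K}\) such that \(C\sqsubseteq ^{\emptyset }E\).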

The first two conditions coincide with the ones in [7]. Basically, 1. says that we only need to remove instance relationships explicitly if they are really there. Condition 2. corresponds to the fact that preventing as a consequence also prevents if D subsumes C, and thus would be redundant if . Condition 3. ensures that instance relationships that are removed due to cannot be re-introduced by the TBox. It is easy to see that the set of repair types for u can be computed in exponential time.

Similarly to the approach in [7], canonical repairs are induced by seed functions. Such a function determines, for each individual, which instance relationships should be prevented in order to obtain a repair.

Definition 6

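A repair seed function is a function s that maps each individual name \(b\in \varSigma _{\mathsf {I}}\) to a repair type s(b) for b that satisfies the following:

  • if \(C(b)\in \mathcal {R}\) and the matrix of the saturated qABox entails C(b), then s(b) contains an atom D such that \(C\sqsubseteq ^{\emptyset }D\).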

Using our general assumption that the repair request does not contain a concept assertion C(a) with \(\top \sqsubseteq ^{\mathcal {T}}C\), we can show that there is always at least one repair seed function.

Each repair seed function induces a repair as follows.

Definition 7

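Given a repair seed function s, we define the canonical QL-repair induced by s as the qABox with set of variables Y and matrix defined as follows:

  1. the set Y consists of the variables \(y_{u,\mathcal {K}}\) for all object names u occurring in the QL-saturation of \(\exists X.\mathcal {A}\) w.r.t. \(\mathcal {T}\) and all repair types \(\mathcal {K}\) for u, except for the case where u is an individual name and \(\mathcal {K}=s(u)\), and

  2. the matrix consists of the following assertions, where we use \(y_{b,s(b)}\) as a synonym for the individual name b:

    • \(A(y_{u,\mathcal {K}})\) for each concept assertion A(u) in the saturation such that \(A\not \in \mathcal {K}\),

    • \(r(y_{u,\mathcal {K}},y_{v,\mathcal {L}})\) for each role assertion r(u, v) in the saturation such that the following holds for each \(\exists r.C\in \mathcal {K}\): if the matrix of the saturation entails C(v), then the set \(\mathcal {L}\) contains an atom that subsumes C.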
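The construction in Definition 7 can be illustrated on the Ben and Jerry example with an empty TBox. The sketch below is not the authors' implementation: the encoding (strings for concept names, tuples `("exists", r, C)` for existential restrictions, `frozenset`s for conjunctions and repair types), the function names, and the representation of a copy of u with repair type K as a tuple `("y", u, K)` are assumptions, and the repair types are supplied by hand instead of being computed.

```python
def atom_subsumed(e, f):
    """Atom e is subsumed by atom f w.r.t. the empty TBox."""
    if isinstance(e, str) or isinstance(f, str):
        return e == f
    return e[1] == f[1] and all(subsumed_by_atom(e[2], g) for g in f[2])


def subsumed_by_atom(c, f):
    """The conjunction c is subsumed by the atom f."""
    return any(atom_subsumed(e, f) for e in c)


def holds(concept, u, ca, ra):
    """Check whether the matrix (ca, ra) entails concept(u)."""
    return all((a, u) in ca if isinstance(a, str)
               else any(s == a[1] and w == u and holds(a[2], v, ca, ra)
                        for s, w, v in ra)
               for a in concept)


def canonical_repair(ca, ra, individuals, repair_types, seed):
    def copy(u, k):
        # y_{b,s(b)} is a synonym for the individual name b
        return u if u in individuals and seed[u] == k else ("y", u, k)

    variables = {copy(u, k) for u, types in repair_types.items() for k in types}
    variables -= set(individuals)
    new_ca = {(A, copy(u, k)) for A, u in ca for k in repair_types[u] if A not in k}
    new_ra = set()
    for r, u, v in ra:
        for k in repair_types[u]:
            # fillers C of atoms ∃r.C in K that the saturated matrix entails for v
            fillers = [a[2] for a in k
                       if isinstance(a, tuple) and a[1] == r and holds(a[2], v, ca, ra)]
            for l in repair_types[v]:
                if all(any(subsumed_by_atom(c, d) for d in l) for c in fillers):
                    new_ra.add((r, copy(u, k), copy(v, l)))
    return variables, new_ca, new_ra


rich_and_famous = frozenset({"Rich", "Famous"})
forbidden = ("exists", "parent", rich_and_famous)
ca = {("Rich", "JERRY"), ("Famous", "JERRY")}
ra = {("parent", "BEN", "JERRY")}
repair_types = {
    "BEN": [frozenset(), frozenset({forbidden})],
    "JERRY": [frozenset(), frozenset({"Rich"}), frozenset({"Famous"}), rich_and_famous],
}
seed = {"BEN": frozenset({forbidden}), "JERRY": frozenset()}
variables, new_ca, new_ra = canonical_repair(ca, ra, {"BEN", "JERRY"}, repair_types, seed)
```

With this seed function, Jerry keeps both concept assertions, the role assertion between Ben and Jerry is dropped, and Ben instead receives parent successors that are copies of Jerry lacking `Rich`, lacking `Famous`, or lacking both, so that the unwanted consequence no longer holds for `BEN`.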

Our construction of canonical repairs based on seed functions is sound and complete in the following sense.

Proposition 8

For each repair seed function s, the induced canonical repair is a QL-repair of for w.r.t. . Conversely, if is a QL-repair of for w.r.t. , then there is a repair seed function s such that .

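We define the set of all canonical QL-repairs of \(\exists X.\mathcal {A}\) for \(\mathcal {R}\) w.r.t. \(\mathcal {T}\) as the set consisting of the canonical QL-repair induced by s, for every repair seed function s.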

As an easy consequence of Proposition 8 we obtain that contains all optimal repairs (up to equivalence). However, as in the case without a TBox, it may also contain non-optimal repairs [7]. To compute the set of optimal repairs, one thus needs to remove such non-optimal elements from . Since the entailment test required for this is NP-complete for \({\textsf {QL}}= {\textsf {CQ}}\) and polynomial for \({\textsf {QL}}= {\textsf {IQ}}\), we obtain the following theorem.

Theorem 9

There is a (deterministic) algorithm that computes the set of all optimal QL-repairs of for w.r.t. and runs in exponential time. If \({\textsf {QL}}= {\textsf {CQ}}\), then this algorithm needs access to an NP oracle, whereas no such oracle is required for \({\textsf {QL}}= {\textsf {IQ}}\).
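The filtering step behind Theorem 9 only needs the entailment test as a black box. The following sketch makes this explicit; the function name `optimal_repairs` is hypothetical, and the toy entailment used in the example (a repair is identified with the set of consequences it retains, and entailment is the superset relation) is merely a stand-in for the simulation-based or homomorphism-based tests.

```python
def optimal_repairs(repairs, entails):
    """Keep one representative of each repair that no other repair strictly entails."""
    optimal = []
    for b in repairs:
        dominated = any(entails(c, b) and not entails(b, c) for c in repairs)
        duplicate = any(entails(b, k) and entails(k, b) for k in optimal)
        if not dominated and not duplicate:
            optimal.append(b)
    return optimal


# Toy stand-in: each candidate is the set of consequences it retains.
candidates = [frozenset("pq"), frozenset("p"), frozenset("qr"), frozenset("pq")]
result = optimal_repairs(candidates, lambda a, b: a >= b)
```

The quadratic number of calls to `entails` is what makes the cost of the individual entailment test (NP-complete for CQ, polynomial for IQ) the deciding factor.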

5 Optimized Repairs

The construction of the canonical repair induced by a seed function described in the previous section usually introduces an exponential number of copies for the objects occurring in the saturated qABox. The following example demonstrates that this is not always necessary to obtain an optimal repair.

Example 10

Let and consider the repair request for the qABox . There is only one repair seed function s, which assigns to a. Both for the CQ and the IQ case, the canonical repair induced by s contains \(2^n\) copies of x, namely all the variables for . However, most of these copies are redundant. In fact, we will see below that there are optimal repairs equivalent to the canonical one that contain only linearly many variables in n, both for the CQ and the IQ case.

The idea is now to construct, for a given seed function, a set of variables that is a (hopefully small) subset of the set Y introduced in Definition 7, which is nevertheless sufficient to obtain a repair equivalent to the canonical one. Note, however, that in general an exponential blow-up cannot be avoided, as already shown in [5] for the case of \(\mathcal {EL}\) instance stores. Throughout this section, we assume that \({\textsf {QL}}\), \(\mathcal {T}\), \(\mathcal {R}\), and \(\exists X.\mathcal {A}\) satisfy the properties assumed in the previous section. In addition, we assume that the repair request \(\mathcal {R}\) is reduced, i.e., every concept occurring in a concept assertion in \(\mathcal {R}\) is reduced, and if \(\mathcal {R}\) contains C(a) and D(a) for distinct concept descriptions C, D, then \(C\not \sqsubseteq ^{\emptyset }D\). We further assume that each concept occurring in the TBox \(\mathcal {T}\) is reduced. Before we can describe our construction of the set of relevant variables, we must introduce some notation and show an auxiliary result.

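Given two sets of concept descriptions \(\mathcal {K}\) and \(\mathcal {L}\), we say that \(\mathcal {K}\) covers \(\mathcal {L}\) (written \(\mathcal {L}\le \mathcal {K}\)) if each concept in \(\mathcal {L}\) is subsumed by some concept in \(\mathcal {K}\).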

Now, let s be a repair seed function and set . Recall that, according to Definition 7, a role assertion belongs to the matrix iff the saturation contains the role assertion r(tu) and the repair type covers the set

If does not satisfy this requirement, there might be another repair type such that the canonical repair contains the assertion , and thus our optimized repair needs to contain an appropriate variable to which can be mapped by a homomorphism or simulation. We generate such variables by looking for repair types that cover both and . The set of all such repair types can effectively be computed, though it might be empty. For our purposes, it is sufficient to use only the ones that are minimal w.r.t. the cover relation \(\le \).

Lemma 11

The set of all \(\le \)-minimal repair types for u that cover can be computed in exponential time.

In general, this computation may produce exponentially many repair types, but this is not always the case. For instance, consider \(a = y_{a,s(a)}\) and \(y_{x,\emptyset }\) in Example 10. We have and thus the assertion \(r(a,y_{x,\emptyset })\) is not in since \(\emptyset \) clearly does not cover . The \(\le \)-minimal repair types covering are exactly the sets \(\{A_i\}\) for \(i=1,\ldots ,n\).
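The gap between the exponentially many repair types and the few \(\le \)-minimal covering ones can be reproduced in a small experiment. The sketch below rests on assumptions that go beyond the text: the set to be covered is taken to consist of the single conjunction \(A_1\sqcap \ldots \sqcap A_n\), which is consistent with the minimal covering types being the sets \(\{A_i\}\); candidate repair types are all subsets of \(\{A_1,\ldots ,A_n\}\); subsumption is restricted to conjunctions of concept names; and all function names are hypothetical.

```python
from itertools import combinations


def subsumed(c, d):
    """C ⊑ D for conjunctions of concept names, encoded as frozensets of names."""
    return d <= c


def covers(k, l):
    """K covers L: each concept in L is subsumed by some concept in K."""
    return all(any(subsumed(c, d) for d in k) for c in l)


def minimal_covering(candidates, to_cover):
    """Candidates that cover `to_cover` and are minimal w.r.t. the cover relation."""
    covering = [k for k in candidates if covers(k, to_cover)]
    return [k for k in covering
            if not any(covers(k, l) and not covers(l, k) for l in covering)]


n = 4
names = [f"A{i}" for i in range(1, n + 1)]
# All subsets of {A1, ..., An}; each atom is stored as a singleton conjunction.
candidates = [frozenset(frozenset({a}) for a in combo)
              for size in range(n + 1) for combo in combinations(names, size)]
to_cover = {frozenset(names)}   # the single concept A1 ⊓ ... ⊓ An
minimal = minimal_covering(candidates, to_cover)
```

For n = 4 there are 16 candidate sets, 15 of which cover the conjunction, but only the 4 singletons are minimal.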

In the following, we construct a sequence \(Y_0,Y_1,\dots ,Y_m\) of subsets \(Y_i\) of Y such that is QL-equivalent to its sub-qABox where contains only those assertions in involving object names in \(\varSigma _{\mathsf {I}} \cup Y_m\). Recall that we use \(y_{a,s(a)}\) as synonyms for the individuals \(a\in \varSigma _{\mathsf {I}} \).

We start with the set \(Y_0\), which is empty if \({\textsf {QL}}={\textsf {IQ}}\), and equal to the set if \({\textsf {QL}}={\textsf {CQ}}\).

The subsequent sets are obtained by exhaustively applying one of the following rules, depending on whether \({\textsf {QL}}= {\textsf {CQ}}\) or \({\textsf {QL}}= {\textsf {IQ}}\).

  • CQ-construction rule. If and are elements of \(\varSigma _{\mathsf {I}} \cup Y_i\), the saturation contains the role assertion r(tu), the repair type does not cover , and is a \(\le \)-minimal repair type for u that covers , but is not contained in \(\varSigma _{\mathsf {I}} \cup Y_i\), then set .

  • IQ-construction rule. If is an element of \(\varSigma _{\mathsf {I}} \cup Y_i\), the saturation contains the role assertion r(tu), and is a \(\le \)-minimal repair type for u that covers , but is not contained in \(\varSigma _{\mathsf {I}} \cup Y_i\), then set .

The sets \(Y_i\) are all subsets of the set Y of variables in the canonical repair. Since each rule application adds a variable, the exhaustive application of rules must terminate after finitely many steps with a set of variables \(Y_m\subseteq Y\).

Let us illustrate this construction using Example 10, first for the IQ case. We have \(a = y_{a,s(a)}\in \varSigma _{\mathsf {I}} \) and the assertion r(a, x) belongs to the saturation, which is equal to the original qABox. As mentioned above, the relevant \(\le \)-minimal repair types for x are exactly the sets \(\{A_i\}\) for \(i=1,\ldots ,n\). Thus, repeated applications of the IQ-construction rule add the variables \(y_{x,\{A_i\}}\), and the construction ends with \(Y_m^{\textsf {IQ}}= \{\, y_{x,\{A_i\}} \mid i=1,\ldots ,n \,\}\). In the CQ case, the initial set of variables is \(Y_0^{\textsf {CQ}}= \{y_{a,\emptyset },y_{x,\emptyset }\}\). In this example, the CQ-construction rule then generates the same variables as the IQ rule, though this need not be the case in general. We end up with the final set \(Y_m^{\textsf {IQ}}\cup Y_0^{\textsf {CQ}}\).

Definition 12

Let s be a repair seed function and \(Y_m\subseteq Y\) be the set of variables obtained by an exhaustive application of the QL-construction rule. The optimized \({\textsf {QL}}\)-repair of for w.r.t. induced by s, denoted by , is the qABox where the matrix contains all assertions in involving only object names in \(\varSigma _{\mathsf {I}} \cup Y_m\).

Note that, to compute , we need not compute the larger matrix first. Instead, we just apply the definition of the matrix in Definition 7 to the object names in \(\varSigma _{\mathsf {I}} \cup Y_m\).

In our example, the optimized \({\textsf {IQ}}\)-repair is the qABox with

figure ci

In the optimized CQ-repair, the quantifier prefix additionally contains the variables \(y_{a,\emptyset }\) and \(y_{x,\emptyset }\), and the matrix additionally contains the assertions \(r(y_{a,\emptyset },y_{x,\emptyset })\) and \(A_i(y_{x,\emptyset })\) for \(i=1,\ldots ,n\). Note that, without these assertions, the positive answer to the Boolean conjunctive query would be lost.

Coming back to the general case, we first observe that the canonical QL-repair induced by s QL-entails the optimized QL-repair induced by s due to the inclusion relationship between these two qABoxes. The entailment in the other direction also holds, but this is harder to show, in particular for \({\textsf {QL}}= {\textsf {CQ}}\).

Proposition 13

For each repair seed function s, the optimized \({\textsf {QL}}\)-repair induced by s \({\textsf {QL}}\)-entails the canonical \({\textsf {QL}}\)-repair induced by s.

Proof sketch. For \({\textsf {QL}}= {\textsf {IQ}}\), the proposition can be proved by showing that the following relation is a simulation from to :

figure ck

For \({\textsf {QL}}= {\textsf {CQ}}\), we introduce a sequence of mappings , starting with if \(t\in \varSigma _{\mathsf {I}} \) and and otherwise. The initial mapping \(h_0\) need not be a homomorphism since role assertions may not be preserved. In the step-wise construction of the mappings \(h_i\) such defects are corrected, one by one. We can show that this construction always terminates after finitely many steps, yielding a homomorphism \(h_n\) from to .    \(\square \)
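To make the notion of a simulation used in the IQ case more concrete, the following sketch computes the greatest simulation between two small labelled graphs by iterated refinement. It assumes the standard simulation conditions (concept assertions are preserved, every role successor is matched by a simulating successor, and each individual is related to itself); the data layout and function names are hypothetical. It does not reproduce the specific relation used in the proof.

```python
from typing import Dict, Iterable, Set, Tuple

Concepts = Dict[str, Set[str]]              # object -> asserted concept names
Roles = Dict[str, Set[Tuple[str, str]]]     # object -> {(role, successor)}
Pair = Tuple[str, str]


def greatest_simulation(src_objs: Iterable[str], src_c: Concepts, src_r: Roles,
                        tgt_objs: Iterable[str], tgt_c: Concepts, tgt_r: Roles
                        ) -> Set[Pair]:
    """Largest relation that preserves concept and role assertions."""
    tgt_list = list(tgt_objs)
    # Start with all pairs that respect the concept assertions.
    sim = {(u, v) for u in src_objs for v in tgt_list
           if src_c.get(u, set()) <= tgt_c.get(v, set())}
    changed = True
    while changed:
        changed = False
        for (u, v) in list(sim):
            for (role, u2) in src_r.get(u, set()):
                matched = any(r2 == role and (u2, v2) in sim
                              for (r2, v2) in tgt_r.get(v, set()))
                if not matched:
                    # v cannot mimic this role successor of u.
                    sim.discard((u, v))
                    changed = True
                    break
    return sim


def has_simulation(individuals: Iterable[str],
                   src_objs, src_c, src_r, tgt_objs, tgt_c, tgt_r) -> bool:
    """True if some simulation relates every individual to itself."""
    sim = greatest_simulation(src_objs, src_c, src_r, tgt_objs, tgt_c, tgt_r)
    return all((a, a) in sim for a in individuals)


# Toy graphs, invented for illustration: r(a, x), A(x) versus r(a, y), A(y), B(y).
g1 = ({"a", "x"}, {"x": {"A"}}, {"a": {("r", "x")}})
g2 = ({"a", "y"}, {"y": {"A", "B"}}, {"a": {("r", "y")}})

forward = has_simulation({"a"}, *g1, *g2)
backward = has_simulation({"a"}, *g2, *g1)
```

In this toy case a simulation exists from the first graph to the second, but not in the other direction, because `y` carries the additional concept name `B`.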

Summing up, we have thus shown the following theorem, which implies that the optimized repairs also satisfy the properties stated in Proposition 8.

Theorem 14

For each repair seed function s, the canonical \({\textsf {QL}}\)-repair induced by s and the optimized \({\textsf {QL}}\)-repair induced by s are \({\textsf {QL}}\)-equivalent.

6 Evaluation

To find out whether the repair approaches introduced in this paper are in principle viable for non-trivial ontologies, we ran experiments for both IQ- and CQ-repairs with a first, rather unoptimized implementation. In addition to checking how often the implementation was able to compute a repair within a certain timeout, we also compared the sizes of optimized repairs with those of canonical repairs. We considered two different repair scenarios: repairing a single unwanted consequence for a single individual (S1), and repairing a single unwanted consequence for 10% of the individuals occurring in the ABox (S2). We report here the main results; more details and discussions can be found in [4].

As the corpus for our evaluation, we chose the ontologies used in the 2015 OWL Reasoner Competition for the track OWL EL Realisation [28], since they contain a substantial number of ABox assertions. These 109 ontologies were converted into pure \(\mathcal {EL}\) by applying standard transformations and afterwards filtering out unsupported axioms. From these ontologies, we kept those that had at most 100,000 axioms in total. The resulting corpus contained 80 ontologies.

We implemented our methods in Java, using the OWL API (footnote 1) for parsing OWL ontologies, and ELK [22] for precomputing the subsumption relationships, entailed with and without the TBox, that are potentially relevant for our repair approach. The code is available online (footnote 2). All experiments were performed on an Intel(R) Core(TM) i5-4590 CPU with 4 cores and 32 GB RAM, of which we assigned 16 GB as maximal heap space to the Java VM.

Since it is a precondition of our repair approach, we first saturated the ontologies using the IQ-saturation rules of Figure 2, and the CQ-saturation rules of Figure 1. The CQ-saturation rules were implemented using the rule engine VLog [11] through the Java facade Rulewerk (footnote 3). As CQ-saturation only terminates for cycle-restricted TBoxes, we only considered those ontologies for the CQ-saturation whose IQ-saturation did not introduce cycles between introduced variables. We used a timeout of 60 minutes for every saturation. This way, we successfully computed IQ-saturations of every ontology, and 62 CQ-saturations.

The size of the saturated ABox was usually not much larger than that of the original one, and always less than two orders of magnitude larger. Interestingly, the successful CQ-saturations were rarely larger than the IQ-saturations, and often even of the same size, because no variables were added.

Scenario S1 was about repairing a single faulty entailment . Since we did not have information about whether any entailments from the considered ontologies are faulty, we generated such assertions randomly. For this, we looked at entailments of the form , where . To make the repair requests more interesting, we furthermore required that C is not of the form A or , where A is a concept name. This requirement already ruled out 54 of the IQ-saturated ontologies, and 44 of the CQ-saturated ontologies, as they did not have any complex entailments of the required form. For Scenario S2, we randomly selected some concept which had at least one instance (surprisingly, although C was not required to be complex, this ruled out 12 ontologies, including 4 of the CQ-saturated ones), together with a random selection of 10% of the individuals in , and built the repair request consisting of all assertions C(a) where a ranges over the selected individuals. For both scenarios, we selected a random seed function for the obtained repair request.
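The construction of a repair request for Scenario S2 can be sketched as follows. This is a hedged illustration rather than the code used in the experiments: the function name, the rounding of the 10% share, the minimum of one individual, and the fixed random seed are all assumptions of this sketch.

```python
import random
from typing import List, Tuple


def build_s2_request(concept: str, individuals: List[str],
                     fraction: float = 0.1, seed: int = 0
                     ) -> List[Tuple[str, str]]:
    """Return assertions C(a) for a random share of the individuals.

    Each pair (concept, a) stands for the unwanted assertion concept(a).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    # Assumption: round the share and select at least one individual.
    k = max(1, round(fraction * len(individuals)))
    chosen = rng.sample(sorted(individuals), k)
    return [(concept, a) for a in sorted(chosen)]


# Toy data, invented for illustration.
individuals = [f"ind{i}" for i in range(50)]
request = build_s2_request("C", individuals)
```

With 50 individuals and a share of 10%, the request consists of five assertions for the same concept.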

For each ontology, scenario, and \({\textsf {QL}}\in \{{\textsf {IQ}},{\textsf {CQ}}\}\), we attempted to compute optimized QL-repairs for 50 different repair requests. We also tried to compute the set of objects that would be included in the canonical repairs, to get an idea of the impact of our optimization. For each such repair computation, we used a timeout of 10 minutes. Since all repair requests used only concept descriptions that were already in the input ontology, the number of objects in the canonical repair was independent of the repair request. We thus performed the latter computation only once for each ontology. The success rates were as follows:

  • The objects included in the canonical IQ- and CQ-repair could be computed within the timeout and without memory exceptions for only 52.9% and 62.1% of the ontologies, respectively.

  • For S1, we could compute the optimized IQ-repair in 99.9% and the optimized CQ-repair in 100.0% of all attempts.

  • For S2, 98.9% of IQ-repairs and 99.9% of CQ-repairs were successful.

This shows that the optimizations introduced in Section 5 have a very positive impact on the viability of our repair approach.

Fig. 3. Evaluation results. On the left, we compare the difference in the number of object names for the canonical IQ-repairs (purple triangle) with the same difference, restricted to objects occurring in assertions, for the optimized IQ-repairs (red circle) for S2. The other two graphs consider optimized IQ- and CQ-repairs for S1 and S2. In each graph, the x-axis shows the number of assertions in the input ontology, and the y-axis the observed difference.

Fig. 3 gives more information on the number of objects and assertions in the computed repairs. On the left, we consider canonical and optimized IQ-repairs for scenario S2: specifically, we look at the difference in the number of individuals occurring in the repair compared to the input ABox.

In the middle and on the right, we visualize the difference between the number of assertions in the optimized IQ- and CQ-repairs, compared to the input ABoxes, for the scenarios S1 and S2, respectively. By construction, CQ-repairs cannot contain fewer assertions than the input ontologies. Sometimes the CQ-repairs were smaller than the corresponding IQ-repairs, which is due to the different saturation methods: variables introduced by the IQ-saturation could be connected to more individuals than for the CQ-saturation.

7 Conclusion

This paper presents approaches for repairing DL-based ontologies, in the sense that they allow one to get rid of unwanted consequences. In contrast to most of the other work on ontology repair, our goal is to compute optimal repairs, i.e., ones that lose as few other consequences as possible. As relevant consequences to be preserved, we consider both answers to conjunctive queries (CQ) and answers to \(\mathcal {EL}\) instance queries (IQ). The presented results improve on our previous work in this direction in two respects. First, we allow for the presence of a TBox, which is assumed to be static (i.e., cannot be changed by the repair), whereas before we assumed that the TBox is empty. Second, we develop a more efficient construction of optimal repairs, which is exponential only in the worst case. Our experimental results show that this optimization makes our repair approach viable also for fairly large ontologies, at least for the IQ case.

One question for future research is how to lift the restriction to cycle-restricted TBoxes in the CQ case. Since optimal repairs need no longer exist then, one can ask whether the existence question is decidable, and how to compute optimal repairs if they exist. We have already noticed in our first attempts to tackle this problem that optimal repairs may then become larger than single-exponential.

In this and in our previous work, we have assumed that unwanted consequences are specified as \(\mathcal {EL}\) instance relationships. Another interesting open question is whether our results can be generalized to a setting where unwanted consequences are specified as answers to conjunctive queries, as, e.g., in [14] (footnote 4).