1 Introduction

FusemateFootnote 1 is a logic programming system for computing possible models of disjunctive logic programs [23, 24]. A Fusemate logic program consists of (typically) non-ground if-then rules with stratified default negation in the body [21]. Stratification entails that a true default-negated body literal remains true in the course of deriving new conclusions.

Fusemate was introduced in [7] for modelling systems that evolve over time and for analysing their current state based on the events so far. Such tasks are often subsumed under the terms of stream processing, complex event recognition, and situational awareness, and have been addressed (also) with logic-based approaches [2, 4, 5, 9].

To my knowledge, Fusemate is unique among all these and other logic programming systems [1, 12, 13, 16, 26] (and theorem provers) in the way it is implemented. Fusemate is implemented by shallow embedding in a full-fledged programming language, Scala [25]. Essentially, the user writes a syntactically sugared Scala program utilizing familiar logic programming notation, and the program’s execution returns models. This has advantages and disadvantages. The main disadvantages is that it is more difficult to implement performance boosting measures like term indexing. The main advantage is that interfacing with data structure libraries and with external systems is easy, an aspect whose importance has been emphasized for virtually all of the above systems. In fact, Fusemate is motivated in parts by exploring how far the embedding approach can be pushed and to what benefit.

The earlier Fusemate paper [7] focused on the model computation calculus with a belief revision operator as the main novelty. It utilized a certain notion of stratification by time (SBT) for making the calculus effective and useful in the intended application areas. This system description focuses on the advantages of the shallow embedding approach as boosted by new language features introduced here. These new language features are (a) non-standard comprehension and aggregation operators, among others, and (b) a weaker notion of stratification by time and predicates (SBTP). In brief, SBTP is a lexicographic combination of stratification by time and the standard stratification in terms of the call-graph of the program. Section 5 has an example that demonstrates the need for (a) and (b) in combination, and Section 4 discusses the shallow embedding approach and its advantages on a more general level.

Here is an excerpt from a Fusemate program that previews some of the new features:

figure a

The scenario comprises traffic lights identified by numbers 1 to 10 (line 2). In the course of time the traffic lights change their colors, and each such event is recorded as a corresponding Change atom (line 3). The rule on line 6 computes a State at a current time Now(time) as a snapshot of the current colors of all traffic lights. For that, the comprehension on line 6 finds the latest Change event before or at time for a fixed id chosen from allIds, and binds that time to the (unused) variable t. A FullState aggregates the separate State facts at a time partitioned as (Scala) sets of ids of “drive” and “stop” colors. In that, the special form collects in a Scala List-typed variable the specified terms that satisfy the body behind . Notice that all atoms in FullState refer to the same time, yet the program is SBTP because State comes before FullState in predicate stratification. (Predicate stratification is computed automatically by Fusemate with Tarjan’s algorithm.) The rule on line 10 demonstrates the use of the Scala Set method size in the body. Line 11 demonstrates the use of default negation in combination with comprehension. When applied to a given sequence of Change events, Fusemate computes models, one-at-a-time, each as Scala set of atoms.

2 Fusemate Programs

For the purpose of this paper, a brief summary of the syntactic notions underlying Fusemate programs is sufficient; see [7] for details. Terms and atoms of a given signature are defined as usual. Let \( var (z)\) denote the set of variables occurring in an expression z. We say that z is ground if \( var (z) = \emptyset \). We write \(z\sigma \) for applying a substitution \(\sigma \) to z. The domain of \(\sigma \) is denoted by \( dom (\sigma )\). A substitution \(\gamma \) is a grounding substitution for z iff \( dom (\gamma ) = var (z)\) and \(z\gamma \) is ground. In this case we simply say that \(\gamma \) is for z.

Let \(\mathbb {T}\) be a countably infinite discrete set of time points equipped with a total strict ordering < (“earlier than”), e.g., the integers. Assume that the time points, comparison operators \(=\) and \(\le \), and a successor time function \(+ 1\) are part of the signature and interpreted in the intended way. A time term is a (possibly non-ground) term over the sub-signature \(\mathbb {T}\cup \{+1\}\).

The signature may contain other “built-in” predicate and function symbols for predefined types such as strings, arithmetic data types, sets, etc. We only informally assume that all terms are built in a well-sorted way and that built-in operators over ground terms can be evaluated effectively.

An ordinary atom (with time term t) is of the form \(p(t, t_1,\ldots ,t_n)\) where p is an ordinary predicate (i.e., neither a time predicate nor built-in), t is a time term and \(t_1,\ldots ,t_n\) terms. A (Fusemate) rule is an implication written in Prolog-like syntax as

(1)

In (1), a rule head H is either (a) a disjunction \(h_1 \vee \cdots \vee h_m\) of ordinary atoms, for some \(m \ge 1\), or (b) the expression .Footnote 2 In case (a) the rule is ordinary and in case (b) it is a fail rule. A rule body B, the part to the right of , is defined by mutual recursion as follows. A positive body literal is one of the following: (a) an ordinary atom, (b) a comprehension atom (with time term x) of the form , where x is a variable, \(\circ \in \{<, \le , >, \ge \}\) and B is a body, (c) a built-in call , i.e., an atom with a built-in predicate symbol, or (d) a special form , , or where x is a variable, st are terms, \( ts \) is a list of terms, and B is a body. A positive body is a list \(\vec {b} = b_1,\ldots ,b_k\) of positive body literals with \(k \ge 0\). If \(k = 0\) then \(\vec {b}\) is empty otherwise it is non-empty. A negative body literal is an expression of the form , where \(\vec {b}\) is a non-empty positive body. A body is a list comprised of a (possibly empty) positive body and (possibly zero) negative body literals. It is variable free if \( var (b_1,\ldots ,b_k) = \emptyset \).

Let r be a rule (1). We say that r is range-restricted iff \( var (H) \subseteq var (\vec {b})\). Compared to the usual notion of range-restrictedness [18], Fusemate rules may contain extra variables in negative body literals. For example, is range-restricted in our sense with extra variables s and y. The extra variables are implicitly existentially quantified within the expression. The example corresponds to the formula . Semantically and operationally this will cause no problems thanks to stratification, introduced next.

Fusemate programs – sets of rules – need to be “stratified by time and by predicates” (SBTP). The standard notion of stratification by predicates means that the call graph of the program contains no cycles going through negative body literals. The edges of this call graph are the “depends on” relation between predicate symbols such that p positively (negatively) depends on q if there is a rule with a p-atom in its head and a q-atom in its positive (negative) body. For disjunctive heads, all head predicates are defined to depend positively on each other. Every strongly connected component of the call graph is called a stratum, and in predicate stratified programs negative body literals can occur only in strata lower than the head stratum.

SBTP is defined as follows: for every rule (1) in a given program, (a) there is a variable \( time \) that is the time term of some ordinary \(b \in \vec {b}\), (b) if H is an ordinary head then every head literal must have a time term constrained to be \(\ge \) than \( time \), and (c) for all rule bodies B occurring in the rule:

  1. (i)

    the time term of every ordinary or comprehension body literal in B must be constrained to be \(\le \) than \( time \), and

  2. (ii)

    for every negative body literal in B (including the top-level body of (1)) and every ordinary or comprehension literal \(b \in \vec {b}\) , the time term of b must constrained to be (i) < than \( time \) or (ii) \(\le \) than \( time \) and the predicate symbol of b is in a lower stratum than H.

For the purpose of this paper we only informally assume that all rules contain constraints for enforcing the required time ordering properties. There are similar stratification requirements for comprehension atoms and special forms so that their evaluation satisfies the counterpart of condition (ii) (see below for ). A fully formal definition could be given by modifying the spelled-out definition of SBT in [7].

As an example, if belongs to a lower stratum than then the following five rules all are SBTP, while only the first two rules are SBT.

figure e

Finally, a (Fusemate) program is a set of range-restricted rules that is SBTP.

3 Model Computation

The possible model semantics of disjunctive logic programs [23, 24] associates to a given disjunctive program a certain set of normal programs (i.e., without disjunctive heads) and takes the intended model(s) of these normal programs as the possible models of the given program. These “split” programs represent all possible ways of making one or more head literals true, for every disjunctive rule. As a propositional example, the program is associated to the split programs and . The possible models, hence, are \(\{a, b \}\) and \(\{a, b, c \}\)

Fusemate computes possible models by bottom-up fixpoint computation and dynamic grounding the program rules in the style of hyper tableaux [8]. The model computation procedure is implemented as a variant of the well-known given-clause algorithm, which seeks to avoid deriving the same conclusion from the same premises twice. It exhausts inferences in an outer loop/inner loop fashion according to the given program’s stratification by time and by predicates. The main data structure is a set of paths, where each path represents a partial model candidate computed so far (see [7] for more details). Paths are selected, extended, split and put back into the set until exhausted, for a depth-first, left-to right inference strategy. Paths carry full status information, which is instrumental for implementing incrementality, such that facts with current or later time can be added at any stage without requiring model recomputation from scratch. This, however, necessitated keeping already exhausted paths for continued inferences later.

The proof procedure’s core operation is computing a body matcher, i.e., a substitution \(\gamma \) for a rule’s positive body variables so that the rule body becomes satisfied in the current partial model candidate. Formally, let I be a set of ordinary ground atoms, representing the obvious interpretation that assigns true to exactly the members of I. Let B be a body. A body matcher for B is a substitution \(\gamma \) for the positive body of B, written as \(I, \gamma \,\models \,B\), such that the following holds (bB means the sequence of head b and rest body B):

figure f

A comprehension atom stands for the subset of all ground p-instances in I such that B is satisfied and with a time x as close as possible to t wrt. < or \(\le \). The cases for > and \(\ge \) are dual and not spelled out above to save space. The special form collects in the variable x the set of all instances of term t such that the body C is satisfied in I. We require comprehension atoms and s to be used in a stratified way, so that their results do not change later in a derivation when I is extended. The requirements are the same as with and can be enforced by ordering constraints.

The definition above extends the earlier definition of body matchers in [7] with the new comprehension construct and the , , , operators. It now also enforces left-to-right evaluation of B because the new binding operators depend on a fixed order guarantee to be useful. An example is the (nonsensical) body which relies on this order. Undefined cases, e.g., when evaluation of a non-ground built-in is attempted, or when a binder variable has already been used before are detected as compile time syntax errors.

4 Shallow Embedding in Scala

Fusemate is implemented as a shallow embedding into Scala [25]. It has three conceptual main components: a signature framework, a Scala compiler plugin, and an inference engine for fixpoint computation as explained in Section 3. The signature framework provides a set of Scala class definitions as the syntactical basis for writing Fusemate programs. It is parameterized in a type Time, which can be any Scala or Java type that is equipped with an ordering and an addition function for time increments, for example or . The programmer then refines an abstract class Atom of the Time-instantiated signature framework with definitions of predicate symbols and their (Scala-)sorted arities. See lines (3)-(5) in the program in the introduction for an example. These atoms then can be used in Fusemate rules, see lines (6)–(12) in the example.

While written in convenient syntax, rules are syntactically ill-formed Scala. This problem is solved by the compiler plugin, which intercepts the compilation of the input file at an early stage and transforms the rules into valid Scala source code.Footnote 3 More precisely, a rule is transformed into a curried partial function that is parameterized in an interpretation context . The curried parameters are Scala guarded pattern matching expression and correspond to the rule’s positive body literals, in order. For example, the rule on lines (11) and (12), with the condition ignored, for simplicity, is (roughly) translated into the function f

figure m

Notice the renaming of repeated occurrences of the variable, which is needed for the correct semantics. Notice also that a Scala Boolean-valued expression in an ordinary body literal position (e.g., ) simply becomes a guard in a pattern.

The code above can be understood with body matcher computation in mind. Suppose the inference engine selects an interpretation I from the current set of paths. For exhausting f on I, the inference engine combinatorially chooses literals \(l_1, l_2 \in I\) and collects the evaluation results of \(f(I)(l_1)(l_2)\), if defined. Observe that by the transformation into Scala pattern matching, body matchers are only implicitly computed by the Scala runtime system. Each evaluation result, hence, is a body-matcher instantiated head.

The rule’s negative body literal is translated into the code on line (3) and conjoined to the guard of the preceding ordinary literal. In general, a negative literal \( body \) is treated by translating \( body \) and evaluating the resulting Scala code on I by means of the method. If is not derivable then \( body \) is satisfied. Again, appropriate bindings for the variables bound outside of \( body \) are held implicitly by the Scala runtime system. The translation of the special forms and comprehension is not explained here for space reasons. Fusemate can show the generated code, though.

Properties and Advantages

The shallow embedding approach enables introspection capabilities and interfacing between the rule language and the host language beyond what is implemented in other systems. In Fusemate, the terms of the logical language are nothing but Scala objects. As a consequence, any available Scala type or library data structure can be used as a built-in without extending an “interface” to an extension language – simply because there is none. Dually, the embedding of the rule language into the host language Scala is equally trivial because rules, atoms and interpretations are Scala objects, too.

It is this “closed loop” that makes an aggregation operator () possible that returns a list of Scala objects as specified by the programmer, e.g., a list of terms or atoms.Footnote 4 This list can be further analysed or manipulated by the rules. See the description logic embedding in Section 5, which critically depends on this feature. This introspection capability stands out in comparison to the logic programming systems mentioned in the introduction. For instance, aggregation in systems like DLV [1], and IDB [12] is limited to predefined integer-valued aggregates for sum, count, times, max and min.

Most logic programming systems can be called from a (traditional) host programming language and can call external systems or utilize libraries for data structures. The DLV system, for instance, interfaces with C++ and Python [22], Prova [16] with Java, and IDP with the Lua scripting language. Systems based on grounding (e.g., DLV and IDP) face the problem of “value invention” by external calls, i.e., having to deal with terms that are not part of the input specification [10].

The main issue, however, from the Fusemate perspective is that these systems’ external interfaces are rather heavy-handed (boilerplate code, mapping logic terms to/from the host language, String representation of logic programs) and/or limited to a predefined set of data structures. In contrast, Fusemate’s seamless integration with Scala encourages a more integrated and experimental problem solving workflow. The following Scala program demonstrates this point with the traffic light example:

figure y

From a workflow perspective, this program integrates Fusemate as a list operator (on a list of instances) in an otherwise unremarkable functional program.

For a more realistically sized experiment I tried a combined Fusemate/Scala workflow for analysing the data of the DEBS 2015 Grand Challenge.Footnote 5 The data comprises two millions taxi rides in New York City in terms of start/end times, and start/end GPS coordinates, among others. The problem considered was to detect anomalies where a taxi driver drivers away from a busy hotspot without a passenger. Solving the problem required clustering locations by pickup/drop-off activity for determining hotspots, and then analysing driver behavior given their pickups/drop-offs at these hotspots.

Two million data points were too much for Fusemate alone and required Scala preprocessing, e.g., for filling a grid abstraction of New York coordinates, data cleansing and filtering out little active drivers. Fusemate was used for computing clusters with rules similar to transitive closure computation. Input to Fusemate calls were Scala precomputed point clouds. The computed clusters were used to analyze Scala prefiltered taxi rides for anomaly detection based on the clusters. This involved three moderately complex rules, for first identifying gaps and then analysing them. The comprehension operator was useful to find “the most recent ride predating a given start”, among others. The longest Fusemate run was 0.31sec for 64 rides (with 39 clusters fixed), most other runs took less than 0.15sec. Fusemate’s performance was perfectly acceptable in this experiment thanks to a combined workflow.

5 Embedding Description Logic \(\mathcal ALCIF\)

\(\mathcal ALCIF\) is the well-known description logic \(\mathcal ALC\) extended with inverse roles and functional roles. (See [3] for background on description logics.) This section describes how to translate an \(\mathcal ALCIF\) knowledge base to Fusemate rules and facts for satisfiability checking.

This is our example knowledge base, TBox on the left, ABox on the right:

figure aa

The \(\textsf {father}\) role is declared as functional, i.e., as a right-unique relation, and \(\textsf {father}^{-1}\) denotes its inverse “child” relation. The third GCI says that all children of a rich father are rich as well. In all models of the knowledge base \(\textsf {Fred}\) is \(\textsf {Poor}\). This follows from the given fact that his child \(\textsf {Anne}\) is poor, functionality of \(\textsf {father}\) and the third CGI. However, there are models where \(\textsf {Bob}\) is \(\textsf {Rich}\) and models where \(\textsf {Bob}\) is \(\textsf {Poor}\).

Translating description logic into rule-based languages has been done in many ways, see e.g. [11, 14, 17, 20]. An obvious starting point is taking the FOL version of a given knowledge base. Concept names become unary predicates, role names become binary predicates, and GCIs (general concept inclusions) are translated into implications. By polynomial transformations, the implications can be turned into clausal form (if-then rules over literals), except for existential quantification in a positive context, which causes unbounded Skolem terms in derivations when treated naively (for example, the third CGI above is problematic in this sense). This is why many systems and also the transformation to Fusemate below avoid Skolemization.

The first GCI corresponds to the clause \(\textsf {Person}(x) \rightarrow \textsf {Rich}(x) \vee \textsf {Poor}(x)\), and the second corresponds to the “almost” clause \(\textsf {Person}(x) \rightarrow \exists y . (\textsf {father}(x,y) \wedge \textsf {Person}(y))\). Fusemate works with the reified rule versions of these, with an \(\textsf {IsA}\)-predicate for concept instances, and a \(\textsf {HasA}\)-predicate for role instances. For the whole TBox one obtains the following, where RN stands for “role name” and CN stands for “concept name”.Footnote 6

figure ab

Every GCI can be converted into rules like the above without problems. For that, starting from its NNF, \(\exists \)-quantifications in the premise of a rule can be expanded in place, and \(\forall \)-quantifications can be moved to the head as the \(\exists \)-quantification of the NNF of the negated formula. Similarly for negated concept names. See [20] for such transformation methods. The ABox is represented similarly. Its first element, for instance, is .

In addition, some more general “library” rules for the tableau calculus are needed:

figure ad

The expansion rules on lines 1 and 2 deal with the \(\mathcal ALC \) binary Boolean connectives and in the obvious way. Supposing NNF of embedded formulas, no other cases can apply. The remaining rules can be understood best with the standard tableau algorithm for \(\mathcal ALCIN\) in mind, which includes blocking to guarantee termination. They follow the terminology in [6, Chapter 4]. The relation abstracts from the relation, left away for space reasons. The expansion rule for \(\exists \) comes for three cases. The first case (line 5), for example, applies to non-functional roles as per the Scala builtin test on line 6. The expansion of the given \(\exists \)-formula only happens if it is not yet satisfied and in a non-blocked situation (line 7). In this case the rule derives a Skolem object defined on line 8 for satisfying the \(\exists \)-formula. Notice the annotation which makes sure that the head is on the highest stratum. This way, the rule will be applied after, in particular, the rules for blocking. Furthermore, with the time stamp time +1 the Skolem object is kept separate from the computations in the current iteration time. The blocking rules are defined as follows:

figure aj

Some additional rules are needed for dealing with basic inconsistencies and for carrying over and facts between iterations. They are not shown here.

The expansion rules and blocking rules follow the tableau calculus description in [6, Chapter 4]. One important detail is that the expansion rule for \(\exists \) must be applied with lowest priority. This is straightforward thanks to Fusemate’s stratification and aggregation construct. Equally important is the access to (Scala) data structures via built-ins and using them as terms of the logical language. This made it easy to program Skolemization and the relation for collecting sets of concepts of an individual.

6 Conclusions

This paper described recent developments around the Fusemate logic programming system. It included new technical improvements for a weaker form of stratification, which enabled useful aggregation and comprehension language constructs. It also argued for the advantages of the tight integration with Fusemate’s host language, Scala, in terms of data structures and usability.

Answer set solvers like DLV and SModels are designed to solve NP-complete or higher complexity search problems as fast as possible. Fusemate is not motivated as a competitive such system, it is motivated for "well-behaved" knowledge representation applications, similarly to description logic reasoners, whose (often) NExpTime complete solving capabilities are not expected to be typically needed. (Some more work is needed, though, e.g., on improving the current term indexing techniques to speed up model computation.) More specifically, the main intended application of Fusemate is for the runtime analysis of systems that evolve over time. The taxi rides data experiment explained in Section 4 is an example for that. It suggests that Fusemate is currently best used in a combined problem solving workflow if scalability is an issue.

As for future work, the next steps are to make the description logic reasoner of Section 5 callable from within Fusemate rules in a DL-safe way [19] and to embed a temporal reasoning formalism. The event calculus [15] seems to be a good fit.