Information and Computation
Volume 182, Issue 1, 10 April 2003, Pages 14-52
 
doi:10.1016/S0890-5401(02)00048-2
Copyright © 2003 Elsevier Science (USA). All rights reserved.

Pair-independence and freeness analysis through linear refinement

Giorgio Levi (a) and Fausto Spoto (b, corresponding author)

(a) Dipartimento di Informatica, Via F. Buonarroti 2, 56127, Pisa, Italy
(b) Dipartimento di Informatica, Strada Le Grazie, 15, 37134 Ca’ Vignal, Verona, Italy

Received 29 May 2001. 
Available online 25 February 2003.

Abstract

Linear refinement is a technique for systematically constructing more precise abstract domains for program analysis starting from the basic domain which represents just the property of interest. We use here linear refinement to construct a domain for pair-independence and freeness analysis of logic programs which is strictly more precise than Jacobs and Langen’s domain for sharing analysis endowed with freeness information. Moreover, it can be used for abstract compilation, while Jacobs and Langen’s domain can only be used for abstract interpretation. We provide an approximate representation of our domain and algorithms for the abstract operations. We describe an implementation of an analyser which uses abstract compilation over our domain and its evaluation over a set of benchmarks. This shows that its precision is comparable to that of a traditional sharing and freeness analysis performed through abstract interpretation. To the best of our knowledge, this is the first implementation of a sharing analysis based on abstract compilation, as well as the first implementation of a static analysis based on a new domain developed through linear refinement.

Article Outline

1. Introduction
2. Related work
3. Preliminaries
3.1. Terms, substitutions, and Herbrand constraints
3.2. The s-semantics
3.3. Abstract interpretation
3.4. Abstract compilation
3.5. Goal-independence
3.6. Linear refinement
4. Pair-independence analysis
5. Freeness analysis
6. Linear refinement revisited
7. Pair-independence and freeness
8. A representation
8.1. The data structure
8.2. The abstract operators
9. Abstraction
9.1. The abstraction of a single binding
9.2. The abstraction of a set of bindings
10. Implementation
10.1. Normalisation
10.2. Abstraction
10.3. Fixpoint computation
10.4. Reduction rules
10.5. Improving the efficiency
10.6. The result of the analysis
11. Experimental evaluation
12. Conclusions
Acknowledgements
Appendix A. Proofs of Section 4
Appendix B. Proofs of Section 8
Appendix C. Proofs of Section 9
Appendix D. Proofs of Section 10
References

1. Introduction

This paper is concerned with the systematic design, by means of linear refinement, of a new abstract domain for two important properties of logic programs, i.e., pair-independence and freeness. Pair-independence analysis [4 and 38] is concerned with determining at compile-time a superset of the set of pairs of variables which, at a given program point, can be bound at run-time to two terms which share some variable. It is a particular case of set-(in)dependence analysis, also called sharing analysis [6, 7, 28, 29, 30, 31 and 36]. In set-independence analysis, not only pairs but sets of variables are considered. (In)dependence analysis is useful for avoiding the occur-check [38] and for automatic program parallelisation [28 and 36]. As stressed in [4], pair-(in)dependence information is what is actually needed in program analysis and transformation, and set-(in)dependence information is redundant w.r.t. pair-(in)dependence information.

Freeness analysis [6, 7, 10, 11, 27, 30 and 36] is concerned with determining at compile-time a subset of the variables which are guaranteed to be bound at run-time to some variable at a given program point. Freeness analysis is useful for optimising unification, for goal reordering, for avoiding type checking and, again, in automatic program parallelisation. It is well known that performing sharing and freeness analysis in conjunction improves the precision of both [28 and 36].

Linear refinement [24] is a technique for systematically constructing abstract domains for program analysis. Given a basic abstract domain representing just the property of interest and a concrete operation (which, since we are considering logic programs, is usually unification) a new more accurate domain is constructed. The new domain leads to more precise abstract operations.

The first contribution of this paper is the definition, through abstract interpretation [16 and 17] and linear refinement, of a new domain for pair-independence and freeness. The use of linear refinement for the definition of our domain, differently from the Sharing×Free domain [28 and 36], leads to simple and general definitions and proofs. It is worth noting that the original correctness proof for the domain Sharing is very complex and uses a large part of Langen’s PhD thesis [31]. We also show, within the linear refinement framework, why and how independence information interacts with freeness information. An important feature of our domain is that it can be used for abstract compilation [22 and 26], which is an application of abstract interpretation where, rather than computing the abstract denotation of a program by executing its concrete code over abstract data, the code itself is abstracted and replaced by abstract code, where concrete data structures are replaced by their abstraction. As a consequence, the computation of the abstract denotation can be achieved by the same algorithm used in the concrete computation.

The second contribution is the design of a computationally feasible representation of our domain, together with algorithms for computing an approximation of the concrete operations. Since this approximation can reduce the theoretical precision of our domain, we describe a prototypical analyser for pair-independence and freeness, based on abstract compilation and a fixpoint semantics. The use of a fixpoint semantics results in a goal-independent analysis. This means that the program is analysed for the most general goals only. More instantiated goals are analysed by using the analysis of the most general goals. We evaluate our analyser over a set of benchmarks. Although it is just a prototype, our evaluation shows that it is efficient enough for practical use on small benchmarks. Its precision is shown to be comparable to that of a traditional goal-dependent analysis.

To the best of our knowledge, this is the first implementation of a goal-independent sharing analysis based on abstract compilation, as well as the first implementation of a static analysis based on a new domain developed through linear refinement.

This paper is organised as follows. Section 2 discusses related work. Section 3 introduces preliminary definitions. Sections 4 and 5 introduce two basic domains for pair-independence and freeness analysis, respectively, and show that their linear refinement does not lead to useful domains. In Section 6 we justify this result and in Section 7 we show why it is useful to combine the two analyses by using a domain which is defined as the linear refinement of the reduced product of the basic domains for pair-independence and for freeness. This domain is shown to be more precise than the domain of [28 and 36]. Section 8 defines a data structure which can be used as an approximate representation for our new domain, together with algorithms for the abstract operations. Section 9 shows an algorithm for computing the abstraction map. Section 10 describes the implementation of a prototypical analyser, and Section 11 reports its evaluation over a set of benchmarks. Finally, Section 12 draws some conclusions. Most of the proofs are kept in a separate appendix, for the convenience of the reader.

Preliminary and partial versions of this paper appeared in [1 and 33].

2. Related work

Almost none of the domains developed for sharing analysis is amenable to abstract compilation [6, 28, 29, 30, 31 and 36]. Moreover, they have been developed without using any systematic technique like linear refinement.

To the best of our knowledge, only [7 and 13] provide abstract domains for sharing analysis which can be used for abstract compilation. The domain in [13] is isomorphic to the Sharing domain of [28 and 31]. This means that, when used for abstract compilation, in order to obtain a useful precision, it must be coupled with a domain expressing further information, like freeness or linearity. This domain must in turn be amenable to abstract compilation. We do not know of any prototypical analyser implemented through their domain. The domain in [7] models sharing, freeness and groundness, but it is not developed through abstract interpretation. Instead, it uses pre-interpretations.

In the context of logic languages, linear refinement has already been used for reconstructing the domain Pos for groundness analysis [37]. Moreover, it has been used to develop new domains for type [32] and freeness analysis [27].

3. Preliminaries

3.1. Terms, substitutions, and Herbrand constraints

We denote by $\wp(S)$ the powerset of a set $S$, by $\#S$ its cardinality and by $\wp_f(S)$ the set of all subsets of $S$ of finite cardinality.

In this paper, we assume that $\mathcal{V}$ is an infinite set of variables, $V \in \wp_f(\mathcal{V})$ and $\Sigma$ is a set of function symbols with associated arity, containing at least a symbol of arity 0. We define $\mathit{terms}(\Sigma,V)$ as the minimal set of terms built from $V$ and $\Sigma$ as: $V \subseteq \mathit{terms}(\Sigma,V)$ and if $t_1,\ldots,t_n \in \mathit{terms}(\Sigma,V)$ and $f \in \Sigma$ has arity $n \geq 0$, then $f(t_1,\ldots,t_n) \in \mathit{terms}(\Sigma,V)$. Let $t \in \mathit{terms}(\Sigma,V)$. By $\mathit{vars}(t)$ we denote the set of variables which occur in $t$. If $\mathit{vars}(t) = \emptyset$, then $t$ is ground. It is linear if every $v \in V$ occurs at most once in $t$. If $x \in \mathcal{V}$, then $V \cup x$ means $V \cup \{x\}$ and $V \setminus x$ means $V \setminus \{x\}$. Syntactical substitution in $t$ of $x$ with $t' \in \mathit{terms}(\Sigma,V)$ is denoted by $t[x \mapsto t']$.

A substitution $\theta$ is a map from variables to terms. Its domain is denoted by $\mathit{dom}(\theta)$ and the set of variables in its range by $\mathit{rng}(\theta)$. The set of idempotent substitutions $\theta$ such that $\mathit{dom}(\theta) \cup \mathit{rng}(\theta) \subseteq V$ and $\mathit{dom}(\theta) \cap \mathit{rng}(\theta) = \emptyset$ is denoted by $\Theta_V$. We write $\theta \in \Theta_V$ extensionally as $\theta = \{v_1 \mapsto t_1,\ldots,v_n \mapsto t_n\}$, meaning that $\mathit{dom}(\theta) = \{v_1,\ldots,v_n\}$ and $\theta(v_i) = t_i$ for every $i = 1,\ldots,n$. Let $\theta \in \Theta_V$ and $R \subseteq V$. We define $\theta|_R(x) = \theta(x)$ if $x \in R$ and $\theta|_R(x) = x$ if $x \in V \setminus R$. If $t \in \mathit{terms}(\Sigma,V)$ then $t\theta \in \mathit{terms}(\Sigma,V)$ is the term obtained by replacing every variable $x$ in $t$ by $\theta(x)$. Composition of substitutions $\theta,\sigma \in \Theta_V$ is defined as $(\theta\sigma)(x) = \theta(x)\sigma$ for every $x \in V$. We recall that it is associative, the empty substitution $\varepsilon$ is the neutral element and, for each term $t$, we have $t(\theta\sigma) = (t\theta)\sigma$.
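As an illustration, the definitions of term, substitution application, and composition can be sketched in Python. This is a minimal sketch, not part of the paper: the names `Var`, `Fn`, `term_vars`, `apply`, and `compose` are hypothetical, and substitutions are represented as plain dictionaries from variables to terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A variable (hypothetical representation)."""
    name: str


@dataclass(frozen=True)
class Fn:
    """A function symbol applied to arguments; arity 0 gives a constant."""
    symbol: str
    args: tuple = ()


def term_vars(t):
    """vars(t): the set of variables occurring in t."""
    if isinstance(t, Var):
        return {t}
    return set().union(*(term_vars(a) for a in t.args))


def apply(t, theta):
    """t theta: replace every variable x in t by theta(x)."""
    if isinstance(t, Var):
        return theta.get(t, t)
    return Fn(t.symbol, tuple(apply(a, theta) for a in t.args))


def compose(theta, sigma):
    """(theta sigma)(x) = theta(x) sigma for every variable x."""
    result = {x: apply(t, sigma) for x, t in theta.items()}
    for x, t in sigma.items():
        # Outside dom(theta), theta is the identity, so x maps to sigma(x).
        result.setdefault(x, t)
    # Identity bindings carry no information and are dropped.
    return {x: t for x, t in result.items() if t != x}


X, Y = Var("X"), Var("Y")
a = Fn("a")
theta = {X: Fn("f", (Y,))}
sigma = {Y: a}
term = Fn("g", (X, Y))
```

With these definitions, `apply(apply(term, theta), sigma)` and `apply(term, compose(theta, sigma))` yield the same term, which is the property $t(\theta\sigma) = (t\theta)\sigma$ recalled above.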

The set $C_V$ of finite sets of Herbrand equations is

$C_V = \wp_f(\{t_1 = t_2 \mid t_1,t_2 \in \mathit{terms}(\Sigma,V)\})$.
Every substitution can be seen as a set of Herbrand equations. The embedding map is $\mathit{Eq}(\theta) = \{v = \theta(v) \mid v \in \mathit{dom}(\theta)\}$. We hence assume that $\Theta_V \subseteq C_V$. Let $c \in C_V$. We say that $c\theta$ is true if $t_1\theta$ is syntactically equal to $t_2\theta$ for every $(t_1 = t_2) \in c$. We know [35] that if there exists $\theta \in \Theta_V$ such that $c\theta$ is true, then $c$ can be put in the normal form $\mathit{mgu}(c) \in \Theta_V$ which is such that $c\theta$ is true if and only if $\mathit{mgu}(c)\theta$ is true. If no $\theta \in \Theta_V$ exists such that $c\theta$ is true, then $\mathit{mgu}(c)$ is undefined. Note that $c \in C_V$ in normal form can be seen as a substitution, and hence the notations $c(x)$ and $tc$ are defined.
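The normal form of a set of Herbrand equations can be computed by syntactic unification. The following is a minimal sketch of a textbook unification procedure with occur-check, not the algorithm of [35] or of the paper. As assumptions for illustration only, variables are Python strings, compound terms and constants are tuples `(symbol, arg1, ..., argn)`, and an undefined mgu is reported as `None`.

```python
def is_var(t):
    """Variables are strings; everything else is a tuple (symbol, args...)."""
    return isinstance(t, str)


def walk(t, theta):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while is_var(t) and t in theta:
        t = theta[t]
    return t


def occurs(x, t, theta):
    """Occur-check: does variable x occur in t under the bindings theta?"""
    t = walk(t, theta)
    if is_var(t):
        return t == x
    return any(occurs(x, a, theta) for a in t[1:])


def resolve(t, theta):
    """Fully apply the bindings in theta to t."""
    t = walk(t, theta)
    if is_var(t):
        return t
    return (t[0],) + tuple(resolve(a, theta) for a in t[1:])


def mgu(equations):
    """Return an idempotent most general unifier as a dict, or None."""
    theta = {}
    stack = list(equations)
    while stack:
        s, t = stack.pop()
        s, t = walk(s, theta), walk(t, theta)
        if s == t:
            continue
        if is_var(s):
            if occurs(s, t, theta):
                return None  # e.g. X = f(X) has no finite solution
            theta[s] = t
        elif is_var(t):
            stack.append((t, s))
        elif s[0] == t[0] and len(s) == len(t):
            stack.extend(zip(s[1:], t[1:]))
        else:
            return None  # clash between different function symbols
    # Resolving every binding makes dom and rng disjoint (idempotence).
    return {x: resolve(t, theta) for x, t in theta.items()}


example = mgu([(("f", "X", ("a",)), ("f", "Y", "Z"))])
```

Here `example` binds `X` to `Y` and `Z` to the constant `a`, while `mgu([("X", ("f", "X"))])` is undefined because of the occur-check.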

Let $\mathcal{W}$ be an infinite set of variables disjoint from $\mathcal{V}$. We define the set

$H_V = \{\exists_W c \mid W \in \wp_f(\mathcal{W}),\ c \in C_{V \cup W}\}$
of existential Herbrand constraints. Here, $V$ are called the program variables and $W$ the existential variables. Existential variables are the unnamed variables of Prolog. For instance, the most general solution of the following Prolog clause:

Image
is the existential Herbrand constraint Image .

We define

$\mathit{sol}_V(\exists_W c) = \{\theta|_V \mid \theta \in \Theta_{V \cup W},\ \mathit{rng}(\theta) \subseteq V \text{ and } c\theta \text{ is true}\}$.
Hence $\mathit{sol}_V(\exists_W c) = \mathit{sol}_V(\exists_W \mathit{mgu}(c))$. For instance, if $V = \{X,Z\}$, then .

A constraint $\exists_W c$ is in normal form if $c$ is in normal form. It is consistent if $\mathit{sol}_V(\exists_W c) \neq \emptyset$. Two constraints $h_1,h_2 \in H_V$ are equivalent if $\mathit{sol}_V(h_1) = \mathit{sol}_V(h_2)$. For instance, the constraints Image and Image are equivalent. In the following, a constraint will stand for its equivalence class. Since, as shown above, every consistent existential Herbrand constraint has an equivalent normal form, in the following we will consider only normal existential Herbrand constraints.

3.2. The s-semantics

We use $H_V$ as the computational domain of programs. Since we will later define abstractions of $H_V$ (Section 8), we decorate the following definitions with $H_V$. Once an abstraction is defined, we simply substitute it for $H_V$.

Definition 1.  Let $\Pi$ be a finite set of predicate symbols with associated arity. A logic program over $H$ is a finite set of clauses

(1)
$p(X_1,\ldots,X_n) \mathrel{:-} G_1,\ldots,G_m$
where $p \in \Pi$ with $n \geq 0$, $\{X_1,\ldots,X_n\} \subseteq V$ are distinct and for every $i = 1,\ldots,m$ we have $G_i \in H_V$ or $G_i = q(Y_1,\ldots,Y_l)$ with $q \in \Pi$ and $\{Y_1,\ldots,Y_l\} \subseteq V$ distinct. The left-hand side of (1) is the head of the clause, the right-hand side is its tail. We say that the clause (1) defines the predicate $p$. Every predicate must be defined by at least one clause of $P$. If more clauses of $P$ define the same predicate, they must use the same variables $X_1,\ldots,X_n$ in (1).

The s-semantics of logic programs [5] is based on a fixpoint definition over interpretations. Interpretations work over the collecting version [17] of $H_V$, i.e., over the lattice $\langle \wp(H_V), \cap, \cup, H_V, \emptyset \rangle$.

Definition 2.  An interpretation over $H$ is a function $I$ which maps every $p \in \Pi$ to $\wp(H_{\{X_1,\ldots,X_n\}})$, where $\{X_1,\ldots,X_n\}$ are the variables in the head of the clauses which define $p$ (Definition 1). The set of interpretations over $H$ is denoted by Image .

Four operations over $H_V$, called conjunction, restriction, expansion, and renaming, respectively, are used to define the s-semantics. They are defined in Definition 3. The operation $\star^{H_V}$ computes the conjunction of two constraints through the normalisation procedure. The restrict and expand operations remove a variable from and add a variable to a constraint, respectively. Note that expand is not the identity function but an embedding, as its signature shows. The operation rename gives a new name to a variable.

Definition 3.  We define

$\star^{H_V} : H_V \times H_V \mapsto H_V$,

$\mathit{restrict}_x^{H_V} : H_V \mapsto H_{V \setminus x}$ with $x \in V$,

$\mathit{expand}_x^{H_V} : H_V \mapsto H_{V \cup x}$ with $x \notin V$,

$\mathit{rename}_{x \to n}^{H_V} : H_V \mapsto H_{(V \setminus x) \cup n}$ with $x \in V$ and $n \notin V$
as

Image

Image

$\mathit{expand}_x^{H_V}(\exists_W c) = \exists_W c$,

$\mathit{rename}_{x \to n}^{H_V}(\exists_W c) = \exists_W (c[x \mapsto n])$.

The operations of Definition 3 are pointwise extended to $\wp(H_V)$. For instance, if $S_1,S_2 \subseteq H_V$, then $S_1 \star^{H_V} S_2 = \{h_1 \star^{H_V} h_2 \mid h_1 \in S_1,\ h_2 \in S_2 \text{ and } h_1 \star^{H_V} h_2 \text{ is defined}\}$ and $\mathit{restrict}_x^{\wp(H_V)}(S) = \{\mathit{restrict}_x^{H_V}(h) \mid h \in S\}$. On the collecting domain $\wp(H_V)$ a new operation $\cup^{H_V}$ is defined as $\cup^{H_V}(S_1,S_2) = S_1 \cup S_2$.

We abuse notation and use the operations of Definition 3 with sets of variables instead of single variables. For instance, $\mathit{restrict}_{\{x_1,\ldots,x_n\}}^{H_V}$ stands for the composition $\mathit{restrict}_{x_1}^{H_V} \cdots \mathit{restrict}_{x_n}^{H_V}$ and $\mathit{rename}_{x_1,\ldots,x_m \to n_1,\ldots,n_m}^{H_V}$ for the composition $\mathit{rename}_{x_1 \to n_1}^{H_V} \cdots \mathit{rename}_{x_m \to n_m}^{H_V}$.

The s-semantics of a program is the least fixpoint of its immediate consequence operator.

Definition 4.  Let P be a program over H. Its immediate consequence operator Image is such that

Image
for every Image , where the denotation $[\![G]\!]_I$ of $G$ in $I$ is

$[\![G_1,\ldots,G_m]\!]_I = [\![G_1]\!]_I \star^{H_V} \cdots \star^{H_V} [\![G_m]\!]_I$

$[\![h]\!]_I = \{h\}$ if $h \in H_V$,


Image
if Image .

As one can see from Definition 4, the denotation of the tail of a clause is computed by using the conjunction operator $\star^{H_V}$ applied to the denotations of the components of the tail. The operator $T_P$ then projects ($\mathit{restrict}^{H_V}$) this denotation over the variables that occur in the head of the clause. The denotation of a predicate $q$ in the tail of a clause is computed by fetching its current interpretation, by renaming ($\mathit{rename}^{H_V}$) its variables in order to reflect its calling context and by enlarging ($\mathit{expand}^{H_V}$) the set of variables in order to cover the entire set $V$.
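The idea of a semantics obtained as the least fixpoint of an immediate consequence operator can be sketched on a much simpler setting than the one of Definition 4. The sketch below is an assumption-laden simplification: it uses propositional programs (ground atoms) instead of sets of existential Herbrand constraints, so conjunction, restriction, renaming, and expansion all collapse to a subset test. The names `immediate_consequence`, `least_fixpoint`, and `PROGRAM` are hypothetical.

```python
def immediate_consequence(program, interp):
    """One application of the operator: heads whose tails hold in interp."""
    return frozenset(head for head, tail in program if set(tail) <= interp)


def least_fixpoint(program):
    """Iterate from the empty interpretation until nothing changes."""
    interp = frozenset()
    while True:
        nxt = immediate_consequence(program, interp)
        if nxt == interp:
            return interp
        interp = nxt


# Each clause is a pair (head, tail); this toy program is illustrative only.
PROGRAM = [
    ("p", []),          # p.
    ("q", ["p"]),       # q :- p.
    ("r", ["q", "s"]),  # r :- q, s.   (s is never derivable)
]
```

The iteration terminates because the operator is monotonic and the set of atoms is finite. In the paper's setting the same scheme is instantiated with the operations of Definition 3 instead of the subset test.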

3.3. Abstract interpretation

Abstract interpretation [16 and 17] allows us to reason about the abstraction relation between two different domains (the concrete and the abstract domain).

We recall that a complete lattice $L$ is a partially ordered set where least upper bound (or join, denoted by $\sqcup$) and greatest lower bound (or meet, denoted by $\sqcap$) exist for every subset of $L$. A Moore family $M$ of $C$ is a topped completely meet-closed subset of $C$, i.e., $M$ contains the top element of $C$ and is closed w.r.t. arbitrary meets. The Moore ($\sqcap_C$) closure of a set $A \subseteq C$ is denoted by $c(A)$.

Definition 5.  Let $\langle C, \leq \rangle$ and $\langle A, \preceq \rangle$ be two complete lattices (the concrete and the abstract domain). A Galois connection from $C$ to $A$ is a pair of monotonic maps $\alpha : C \to A$ (abstraction) and $\gamma : A \to C$ (concretisation) such that for each $x \in C$ we have $x \leq \gamma\alpha(x)$ and for each $y \in A$ we have $\alpha\gamma(y) \preceq y$. A Galois insertion is a Galois connection where $\alpha\gamma$ is the identity map on $A$.

The composition of Galois connections is a Galois connection. The composition of Galois insertions is a Galois insertion. A Galois connection is a Galois insertion if and only if $\gamma$ is one-to-one or, equivalently, if and only if $\alpha$ is onto. In a Galois insertion, the abstraction map uniquely identifies the concretisation map and vice versa. It is well known [16] that the set of Galois insertions from $C$ to $A$ is isomorphic to the set of the Moore families of $C$. This means that every Moore family $M \subseteq C$ is an abstract domain whose concretisation map is the identity map. This way of looking at abstract domains allows us to distinguish the property of a domain from the properties of its representations.
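The correspondence between Moore families and Galois insertions can be checked on a small finite example. The sketch below is illustrative only: the concrete domain is the powerset of a three-element universe ordered by inclusion, the Moore family is a hypothetical sign domain, and the names `powerset`, `is_moore_family`, `alpha`, and `SIGN` are assumptions. The concretisation map is the identity, and the abstraction of a set is the smallest element of the family which contains it.

```python
from itertools import combinations

UNIVERSE = frozenset({-1, 0, 1})


def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_moore_family(family, universe):
    """Contains the top element and is closed under (binary) meets.

    For a finite family, closure under binary intersections plus the
    presence of the top element gives closure under arbitrary meets.
    """
    top = frozenset(universe)
    return top in family and all((a & b) in family
                                 for a in family for b in family)


def alpha(family, s):
    """Smallest element of the Moore family containing s."""
    result = UNIVERSE
    for m in family:
        if s <= m:
            result &= m
    return result


# A hypothetical sign domain: bottom, negative, zero, positive, top.
SIGN = {frozenset(), frozenset({-1}), frozenset({0}), frozenset({1}), UNIVERSE}

# Galois insertion conditions, with gamma the identity map.
extensive = all(s <= alpha(SIGN, s) for s in powerset(UNIVERSE))
insertion = all(alpha(SIGN, m) == m for m in SIGN)
```

Both `extensive` (each concrete element is below the concretisation of its abstraction) and `insertion` (abstracting an abstract element gives it back) hold for this family.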

Let $f : C^n \to C$ be a concrete operator and let $\tilde{f} : A^n \to A$. Then $\tilde{f}$ is a correct approximation of $f$ if for all $y_1,\ldots,y_n \in A$ we have

$\alpha(f(\gamma(y_1),\ldots,\gamma(y_n))) \preceq \tilde{f}(y_1,\ldots,y_n)$. For each operator $f$, there exists a best correct or optimal approximation $\tilde{f}$ defined as $\tilde{f}(y_1,\ldots,y_n) = \alpha(f(\gamma(y_1),\ldots,\gamma(y_n)))$. The composition of correct approximations is a correct approximation but the composition of optimal approximations is not necessarily an optimal approximation. When $f$ is clear from the context, we just say that $\tilde{f}$ is correct (optimal). The abstract domain $A$ is called condensing w.r.t. $\tilde{f}$ if for every $x,y \in A$ we have Image [31 and 34].
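The standard notions of correct and best correct approximation can be made concrete on a finite example. In this sketch, which is only an illustration under stated assumptions, the concrete operator is pointwise negation of sets of integers in the range -2..2, the abstract domain is a hypothetical sign domain seen as a Moore family (so the concretisation `gamma` is the identity), and the best correct approximation is obtained by composing concretisation, the concrete operator, and abstraction. All names are hypothetical.

```python
UNIVERSE = frozenset(range(-2, 3))
EMPTY = frozenset()
NEG, ZERO, POS = frozenset({-2, -1}), frozenset({0}), frozenset({1, 2})
SIGN = [EMPTY, NEG, ZERO, POS, UNIVERSE]


def alpha(s):
    """Smallest element of SIGN containing the concrete set s."""
    result = UNIVERSE
    for m in SIGN:
        if s <= m:
            result &= m
    return result


def gamma(y):
    """Concretisation: the identity, since SIGN is a Moore family."""
    return y


def negate(s):
    """Concrete operator: pointwise negation of a set of integers."""
    return frozenset(-n for n in s)


def best_negate(y):
    """Best correct approximation: alpha after negate after gamma."""
    return alpha(negate(gamma(y)))


def is_correct(abstract_op):
    """abstract_op is correct if it is above the best approximation."""
    return all(best_negate(y) <= abstract_op(y) for y in SIGN)
```

The constant map `lambda y: UNIVERSE` is correct but not optimal, while the identity map is not even correct, since negating a negative set gives a positive one.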

Every abstract domain $A$, with abstraction function $\alpha_A$, allows us to compute the corresponding abstract s-semantics of a logic program, by substituting $A$ for $H$ in Definitions 1–4. The denotation of a Herbrand constraint becomes its abstraction. Hence, we modify Definition 4 with $[\![h]\!]_I = \alpha_A(h)$. The precision of the abstract semantics (analysis) depends on the precision of the abstract domain.

3.4. Abstract compilation

As we said at the end of the previous section, we can compute the abstract s-semantics of a program by using the same definition instantiated over the abstract domain $A$. However, this requires abstracting the concrete constraints in the program at every iteration of the immediate consequence operator (Definition 4). It is hence natural to optimise the fixpoint computation by abstracting the logic program once and for all into an abstract logic program, and by then computing its s-semantics without using the abstraction function anymore. In such a case, Definition 4 can be instantiated to the abstract domain $A$ without the modification described at the end of the previous section. This technique is called abstract compilation [12 and 26].

Example 6.  The computation over A of the abstract s-semantics of the following logic program:

proceeds as follows. We first substitute the concrete constraints with their abstraction. Let Image and Image . The compiled program is

We then compute the fixpoint of the $T_P^A$ operator (Definition 4).

Note that abstract compilation can be used only if all the abstract operations are defined over elements of the abstract domain only, which is what we assume when we instantiate Definition 3 over the abstract domain A. Instead, the conjunction operation of the domain Sharing of [31] is defined between a concrete element and an abstract one. This does not allow us to use abstract compilation for that domain. Actually, even Definition 4 must be modified to fit that domain.

3.5. Goal-independence

By goal-independence we mean that the (abstract) semantics of a program is computed for the most general goals only. The semantics of the other goals is derived from that of the most general goals by instantiation and without using the text of the program. Hence the semantics of the most general goals must contain all the information needed to derive the semantics of the other goals.

The advantage of goal-independence is that the analysis becomes naturally modular, since it cannot look at the text of other modules, but only at the summary information gained from them. Another advantage of goal-independence is that, once a module has been analysed, its source code can be kept secret. Thus the analysis can be applied also when the code cannot be publicly divulged for copyright reasons.

A typical example of a goal-independent analysis is that obtained through the computation of an abstract s-semantics (Section 3.2). Since only the most general goals are analysed, the analysis of a goal like Image is derived from the analysis of the most general goal Image through its conjunction with the abstraction of Image .

It has been shown that, in general, a goal-independent analysis is less precise than a goal-dependent analysis computed by using the same abstract domain, and that, for domains which are condensing w.r.t. conjunction, both analyses have the same precision [25].

Note that our notion of goal-independence is different from that used in [9 and 18], where the program is still needed to derive the goal-dependent information from the (so-called) goal-independent analysis of the same program. Hence, our notion is in our opinion more correct.

3.6. Linear refinement

Given an abstract domain $A \subseteq C$, a domain refinement operator $R$ yields an abstract domain $R(A) \subseteq C$ which is more precise than $A$, i.e., which contains $A$ [19 and 23]. A classical domain refinement operator is the reduced product $A \sqcap B$ of two domains $A$ and $B$, both contained in another domain $C$ [16]. It is isomorphic to the Cartesian product of $A$ and $B$, modulo the equivalence relation $\langle a_1,b_1 \rangle \equiv \langle a_2,b_2 \rangle$ if and only if $a_1 \sqcap b_1 = a_2 \sqcap b_2$. Hence pairs with the same meaning are identified.
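The reduced product can be illustrated on two small Moore families of the powerset of the integers -2..2. In this sketch, where the sign and parity domains and all names are hypothetical choices, the meaning of a pair is the intersection of its components, and the reduced product is the set of all such meanings, so that pairs with the same meaning collapse into one element.

```python
UNIVERSE = frozenset(range(-2, 3))
EMPTY = frozenset()
NEG, ZERO, POS = frozenset({-2, -1}), frozenset({0}), frozenset({1, 2})
EVEN, ODD = frozenset({-2, 0, 2}), frozenset({-1, 1})

SIGN = {EMPTY, NEG, ZERO, POS, UNIVERSE}
PARITY = {EMPTY, EVEN, ODD, UNIVERSE}


def meaning(a, b):
    """Meaning of the pair <a, b>: the meet of its two components."""
    return a & b


def reduced_product(dom_a, dom_b):
    """One element per equivalence class of pairs with the same meaning."""
    return {meaning(a, b) for a in dom_a for b in dom_b}


PRODUCT = reduced_product(SIGN, PARITY)
```

The Cartesian product has 20 pairs, but several of them have the same meaning: for instance, the pairs (zero, odd) and (bottom, top) both denote the empty set. The resulting domain contains both `SIGN` and `PARITY`, which is what makes it a refinement of each.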

Linear refinement [24] is a slight generalisation of Cousot’s reduced power operation [16]. It allows us to include in a domain the information related to the propagation of the abstract property of interest before and after the application of a partial operator over $C$. We consider here just the case when $C = \wp(H_V)$ and the operator is the pointwise extension of conjunction (Definition 3).

Let $a,b \in \wp(H_V)$. We define the linear refinement of $a$ w.r.t. $b$ as

(2)
$a \to b = \{h \in H_V \mid \text{if } a \star^{H_V} h \text{ is defined then } a \star^{H_V} h \leq b\}$.
The set $a \to b$ contains exactly those existential Herbrand constraints which, upon conjunction with a constraint in $a$, become a constraint in $b$. If $a$ and $b$ are sets of constraints satisfying some property, you can view $a \to b$ as the set of constraints which transform the property $a$ into the property $b$ upon conjunction. An arrow $a \to b$ is called tautological when it coincides with $H_V$.

Example 7.  For every $v \in V$, let $v = \{\exists_W c \in H_V \mid \mathit{vars}(c(v)) = \emptyset\}$. The set $v$ is the set of constraints which bind $v$ to a ground term. Let $x,y \in V$. Eq. (2) becomes in this case

Image
This means that every $h \in x \to y$ is such that in all its instantiations if $x$ is ground then $y$ is ground. Equivalently, you can say that $h$ transforms the groundness of $x$ into the groundness of $y$ upon conjunction. For instance, we have Image and Image . But Image . Note that, since groundness cannot be lost, $x \to x = H_V$. Hence $x \to x$ is a tautological arrow.
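The groundness arrow of this example can be computed explicitly on a toy universe of constraints. The following sketch relies on strong simplifying assumptions that are not in the paper: there are only two program variables `x` and `y` and a single constant `a`, a constraint is the partition of these three atoms induced by a set of equations, and conjunction merges partitions. With a single constant, conjunction is always defined. All names are hypothetical.

```python
from itertools import combinations

ATOMS = ("x", "y", "a")  # two variables and one constant (toy assumption)


def normalise(blocks):
    """Merge overlapping blocks until they form a partition of ATOMS."""
    blocks = [set(b) for b in blocks] + [{atom} for atom in ATOMS]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(blocks)), 2):
            if blocks[i] & blocks[j]:
                blocks[i] |= blocks[j]
                del blocks[j]
                merged = True
                break
    return frozenset(frozenset(b) for b in blocks)


def constraint(*equations):
    """Toy constraint induced by equations such as ("x", "y")."""
    return normalise(equations)


def conj(h1, h2):
    """Toy conjunction: the finest partition coarser than both."""
    return normalise(list(h1) + list(h2))


def is_ground(h, v):
    """v is ground in h if it is equated, directly or not, to the constant."""
    return any(v in block and "a" in block for block in h)


TRUE = constraint()
X_EQ_Y = constraint(("x", "y"))
X_EQ_A = constraint(("x", "a"))
Y_EQ_A = constraint(("y", "a"))
ALL_EQ = constraint(("x", "y"), ("y", "a"))
UNIVERSE = frozenset({TRUE, X_EQ_Y, X_EQ_A, Y_EQ_A, ALL_EQ})


def ground_set(v):
    """The set of constraints which bind v to a ground term."""
    return frozenset(h for h in UNIVERSE if is_ground(h, v))


def arrow(a, b):
    """a -> b: constraints h such that h1 * h lies in b for every h1 in a."""
    return frozenset(h for h in UNIVERSE if all(conj(h1, h) in b for h1 in a))


X_TO_Y = arrow(ground_set("x"), ground_set("y"))
X_TO_X = arrow(ground_set("x"), ground_set("x"))
```

In this toy universe, `X_TO_Y` contains `X_EQ_Y`, `Y_EQ_A`, and `ALL_EQ`: the constraint `x = y` does not make `y` ground by itself, but it propagates the groundness of `x` to `y`. The arrow `X_TO_X` coincides with the whole universe, i.e., it is tautological.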

Given an abstract domain $L \subseteq H_V$, we define

(3)
$L \triangleright L = c\{a \to b \mid a,b \in L\}$.
The set $L \triangleright L$ is then the collection of all possible intersections of arrows which can be built from elements of $L$. Note that $l \to (a \sqcap b) = (l \to a) \sqcap (l \to b)$.
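The construction of the set of arrows and of its Moore closure can be sketched on a deliberately crude model. As assumptions for illustration only, a constraint is reduced here to the set of variables it makes ground, conjunction is set union (always defined), and the basic domain contains just the two groundness properties for `x` and `y`. This model cannot express dependencies such as `x = y`, so it only shows the mechanics of the construction, not the gain in precision. All names are hypothetical.

```python
H = [frozenset(), frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})]
TOP = frozenset(H)


def ground_set(v):
    """Crude groundness property: constraints which make v ground."""
    return frozenset(h for h in H if v in h)


def arrow(a, b):
    """a -> b, with conjunction modelled as union of ground variables."""
    return frozenset(h for h in H if all((h1 | h) in b for h1 in a))


def moore_closure(family, top):
    """Add the top element and close the family under intersection."""
    closed = {frozenset(top)} | {frozenset(s) for s in family}
    changed = True
    while changed:
        changed = False
        for s in list(closed):
            for t in list(closed):
                if (s & t) not in closed:
                    closed.add(s & t)
                    changed = True
    return closed


GX, GY = ground_set("x"), ground_set("y")
L = [GX, GY]
ARROWS = {arrow(a, b) for a in L for b in L}
REFINED = moore_closure(ARROWS, TOP)

# The arrow distributes over meets in its second argument.
distributes = all(arrow(l, a & b) == arrow(l, a) & arrow(l, b)
                  for l in L for a in L for b in L)
```

Here `REFINED` has four elements: the whole universe, `GX`, `GY`, and their intersection. Every element of `L` already appears as an arrow in this model.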

The linear refinement $L \to L$ of $L$ is the domain

(4)
$L \to L = L \sqcap (L \triangleright L)$.
If $L \subseteq L \triangleright L$, i.e., if the properties in $L$ are (degenerate) cases of intersections of arrows, (4) can be simplified into

(5)
$L \to L = L \triangleright L$.
This simplification is relevant since it allows a simpler representation and simpler operations for $L \to L$. Indeed, we need to represent elements and operations over L