Copyright © 2003 Elsevier Science (USA). All rights reserved.
Term rewriting for normalization by evaluation
Ulrich Berger, Matthias Eberl and Helmut Schwichtenberg
Received 2 November 1999.
Abstract
We extend normalization by evaluation (first presented in [5]) from the pure typed λ-calculus to general higher type term rewriting systems and prove its correctness w.r.t. a domain-theoretic model. We distinguish between computational rules and proper rewrite rules. The former is a rather restricted class of rules, which, however, allows for a more efficient implementation.
Article Outline
- 1. Introduction
- 2. A simply typed λ-calculus with constants
- 2.1. Types, terms, rewrite rules
- 2.2. Computation rules
- 2.3. Examples
- 2.4. Normalizable terms and their normal forms
- 2.5. Term families
- 3. Normalization by evaluation
- 3.1. Domain theoretic semantics of simply typed λ-calculi
- 3.2. Interpretation of the types
- 3.3. Reification and reflection
- 3.4. Predecessor functions
- 3.5. Interpretation of the constants
- 3.6. Correctness of normalization by evaluation
- Acknowledgements
- References
1. Introduction
It is well known that implementing normalization of λ-terms in the usual recursive fashion is quite inefficient. However, it is possible to compute the long normal form of a λ-term by evaluating it in an appropriate model (cf. [5]). When the built-in evaluation mechanism of, e.g., Scheme (a pure Lisp dialect) is used for that purpose, one obtains an amazingly fast algorithm called “normalization by evaluation”, or NbE for short. In the context of type-directed partial evaluation [8] it has been analyzed in what sense NbE is more efficient, and why: a detailed comparison between NbE and a naive, symbolic normalizer can be found in [4]. The essential idea is to find an inverse to evaluation, converting a semantic object into a syntactic term. This normalization procedure is used and tested in the proof system Minlog developed in Munich (cf. [2]). Notice, however, that once NbE is expressed in a functional programming language, the evaluation order of this language (call-by-value for Scheme) determines the reduction order of NbE (applicative order for a call-by-value language). It is thus easy to defeat NbE in Scheme by normalizing the application of a nonstrict function to an expression that is expensive to normalize. For such a term, a symbolic normalizer following a normal-order reduction strategy can easily be more efficient.
Obviously, for applications pure typed λ-terms are not sufficient; one clearly needs constants as well. In [4] NbE has been extended to term systems with higher order term rewrite rules. The present paper adds a distinction between what we call computational rules and (proper) rewrite rules; NbE seems to be much more efficient for the former than for the latter. In our implementation (in the Minlog system) we therefore use computational rules whenever possible.
A related approach (using a glueing construction) is elaborated by Coquand and Dybjer in [6]. Another related paper is Altenkirch et al. [1]; there a cartesian closed category is defined which has the property that the interpretation of the simply typed lambda calculus in it yields the reduction-free normalization algorithm from [5], as well as its correctness. Moreover, Danvy (cf., e.g., [8]) has successfully used this algorithm (or more precisely its call-by-value counterpart) in the context of partial evaluation. Filinski [10] also treats NbE for an extension of the λ-calculus by constants, where nontermination is allowed. However, he does not consider constants whose meaning is only given operationally, i.e., by arbitrary rewrite rules. Therefore the usual proof technique employing the logical relation “the value of expression e in environment δ is a” is available in his case, whereas in ours it is more convenient to follow a different approach, via an appropriate inductive generation of the reducibility relation.
Why should one be interested in the correctness of NbE for general rewrite rules, where neither termination nor even confluence is assumed? One reason is that in an interactive proof development system (Minlog in our case) it is convenient not to have to deal explicitly with equality axioms, but rather to identify terms with the same normal form modulo a given set of rewrite rules. Then an efficient normalization algorithm such as NbE is clearly useful for testing equality. However, one does not want to be obliged to prove termination and confluence of the whole set of rewrite rules whenever a new one is added.
The aim of the present paper is to develop the theory of normalization by evaluation from scratch, up to and including (some generalizations of) Gödel’s system T of higher order primitive recursion. In fact, we will treat almost arbitrary rewrite systems.
Let us begin with a short explanation of the essence of the method for normalizing typed λ-terms by means of the evaluation procedure of a functional programming language such as Scheme. For simplicity we restrict ourselves to the simplest case, the simply typed λ-calculus without constants.
Simple types are built from ground types τ by ρ → σ (later also products ρ×σ will be included). The set Λ of terms is given by $x^\sigma$, $(\lambda x^\rho M^\sigma)^{\rho\to\sigma}$, $(M^{\rho\to\sigma}N^\rho)^\sigma$; let $\Lambda_\rho$ denote the set of all terms of type ρ. The set LNF of terms in long normal form (i.e., normal w.r.t. β-reduction and η-expansion) is defined inductively by $(x\vec M)^\tau$, $\lambda xM$ (we abbreviate $xM_1\dots M_n$ by $x\vec M$, and similarly a list $M_1\dots M_n$ by $\vec M$). By nf(M) we denote the long normal form of M, i.e., the unique term in long normal form βη-equal to M.
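The inductive definition of long normal forms can be read as a checker. The following is a minimal sketch, not code from the paper, assuming a hypothetical tuple encoding: the single ground type is written `"o"`, `("->", rho, sigma)` stands for ρ → σ, and binders carry their type, as in $\lambda x^\rho M$.

```python
# Hypothetical encoding (illustration only):
#   types: "o" (the ground type tau) or ("->", rho, sigma)
#   terms: ("var", x), ("lam", x, rho, body), ("app", m, n)

def is_lnf(term, ty, ctx):
    """Check that `term` is in long normal form at type `ty`; ctx maps variables to types."""
    if ty != "o":
        # At an arrow type a long normal term must be an abstraction lambda x M.
        if term[0] != "lam":
            return False
        _, x, rho, body = term
        return rho == ty[1] and is_lnf(body, ty[2], {**ctx, x: rho})
    # At ground type it must have the form (x M1 ... Mn) with every Mi long normal.
    head, args = term, []
    while head[0] == "app":
        args.append(head[2])
        head = head[1]
    args.reverse()
    if head[0] != "var" or head[1] not in ctx:
        return False  # e.g. a beta-redex (lambda x M) N has an abstraction as head
    t = ctx[head[1]]
    for arg in args:
        if t == "o" or not is_lnf(arg, t[1], ctx):
            return False
        t = t[2]
    return t == "o"  # x must be applied to all of its arguments
```

For instance, a variable x of type o → o is not long normal by itself, whereas its η-expansion λy. x y is.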
Now we have to choose our model. A simple solution is to take terms of ground type as ground type objects and all functions as possible function type objects:
$$[\![\tau]\!] := \Lambda_\tau, \qquad [\![\rho\to\sigma]\!] := [\![\sigma]\!]^{[\![\rho]\!]} \quad \text{(the full function space).}$$
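For a term $M^\rho$, $[\![M]\!]_\uparrow$ denotes the value of M under the assignment ↑, which maps each variable $x^\rho$ to $\uparrow_\rho(x)$. Two such functions ↓ and ↑ can be defined simultaneously, by induction on the type. It is convenient to define ↑ on all terms (not just on variables). Hence for every type ρ we define $\downarrow_\rho : [\![\rho]\!] \to \Lambda_\rho$ and $\uparrow_\rho : \Lambda_\rho \to [\![\rho]\!]$ (called reify and reflect) by
$$\begin{aligned}
\downarrow_\tau(M) &:= M, & \uparrow_\tau(M) &:= M,\\
\downarrow_{\rho\to\sigma}(a) &:= \lambda x\,\downarrow_\sigma(a(\uparrow_\rho(x))) \quad (x \text{ new}), & \uparrow_{\rho\to\sigma}(M)(a) &:= \uparrow_\sigma(M\,\downarrow_\rho(a)).
\end{aligned}$$
Here x has to be a “new” variable. To make this precise we take $[\![\tau]\!]$ to be the set of families of terms of type τ (instead of single terms) and set $\downarrow_{\rho\to\sigma}(a)(k) := \lambda x_k\,(\downarrow_\sigma(a(\uparrow_\rho(x_k^\infty)))(k+1))$, where $x_k^\infty$ is the constant family $x_k$. The definition of $\uparrow_{\rho\to\sigma}$ has to be modified accordingly. This idea corresponds to a representation of terms in the style of de Bruijn [9]. An advantage of this approach is that the NbE program is purely functional and hence can be verified relatively easily. If side effects were involved, the verification would be much more complicated.
The proof of correctness is easy (ignoring the problem with the “new variable”): since for the typed λ-calculus without constants we have preservation of values, i.e.,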
$[\![M]\!]_\xi = [\![\mathrm{nf}(M)]\!]_\xi$ for all terms M and environments ξ, we only have to verify $\downarrow([\![N]\!]_\uparrow) = N$ for terms N in long normal form, which is straightforward, by induction on N:
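Case $x^{\rho\to\tau}N^\rho$ (w.l.o.g.):
$$\downarrow_\tau([\![xN]\!]_\uparrow) = \uparrow_{\rho\to\tau}(x)([\![N]\!]_\uparrow) = \uparrow_\tau(x\,\downarrow_\rho([\![N]\!]_\uparrow)) = xN \quad \text{by IH.}$$
Case $\lambda yN$:
$$\begin{aligned}
\downarrow_{\rho\to\sigma}([\![\lambda yN]\!]_\uparrow) &= \lambda x\,\downarrow_\sigma([\![\lambda yN]\!]_\uparrow(\uparrow_\rho(x))) && x \text{ new}\\
&= \lambda x\,\downarrow_\sigma([\![N_y[x]]\!]_\uparrow)\\
&= \lambda x\,N_y[x] && \text{by IH}\\
&=_\alpha \lambda yN.
\end{aligned}$$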
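The reify/reflect construction with term families can be sketched in a purely functional style. This is a minimal sketch in Python rather than the Scheme implementation the paper refers to, using a hypothetical tuple encoding. Ground-type values are functions from a level k to a term, fresh variables are named x0, x1, … after the level (the family index k in $\downarrow_{\rho\to\sigma}(a)(k)$), and it is assumed that free variables of the input are not themselves named x0, x1, ….

```python
# Hypothetical encoding (illustration only):
#   types: "o" (ground type tau) or ("->", rho, sigma)
#   terms: ("var", x), ("lam", x, rho, body), ("app", m, n)
# Values of ground type are term families: functions from a level k to a term.

def const_family(name):
    """The constant family x^infinity: every level k yields the variable `name`."""
    return lambda _k: ("var", name)

def reify(ty, a):
    """Reify (down arrow): semantic value -> term family."""
    if ty == "o":
        return a
    _, rho, sigma = ty

    def family(k):
        xk = f"x{k}"  # the bound variable is determined by the level k
        body = reify(sigma, a(reflect(rho, const_family(xk))))
        return ("lam", xk, rho, body(k + 1))

    return family

def reflect(ty, r):
    """Reflect (up arrow): term family -> semantic value."""
    if ty == "o":
        return r
    _, rho, sigma = ty
    return lambda b: reflect(sigma, lambda k: ("app", r(k), reify(rho, b)(k)))

def evaluate(term, env):
    """Evaluate a term in the full function space model."""
    tag = term[0]
    if tag == "var":
        return env[term[1]]
    if tag == "lam":
        _, x, _rho, body = term
        return lambda a: evaluate(body, {**env, x: a})
    _, m, n = term
    return evaluate(m, env)(evaluate(n, env))

def normalize(term, ty, free_types=None):
    """reify(value of term under x -> reflect(x)), started at level 0."""
    free_types = free_types or {}
    env = {x: reflect(t, const_family(x)) for x, t in free_types.items()}
    return reify(ty, evaluate(term, env))(0)
```

For example, normalizing λf.f at type (o → o) → o → o yields the η-long form λx0λx1(x0 x1), and the β-redex (λx.x) y normalizes to y.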
The structure of the paper is as follows. In Section 2 we present the simply typed λ-calculus with constants and pairing and give some examples of higher order rewrite systems. We also introduce the distinction between computational and (proper) rewrite rules. Then we inductively define a relation M → Q, with the intended meaning that M is normalizable with long normal form Q, and prove in Section 3.6 the correctness of normalization by evaluation by showing that M → Q (essentially) implies $\downarrow([\![M]\!]_\uparrow) = Q$. Hence the mapping $M \mapsto\, \downarrow([\![M]\!]_\uparrow)$ is a normalization function. In order to define the semantics $[\![M]\!]$ of a term M properly we use domain theory. This is described briefly in Section 3.1.
Note that we prove correctness of NbE w.r.t. a denotational semantics, but do not attempt to prove operational correctness, i.e., the fact that the functional program formalizing NbE, when called with a term M such that M → Q, will terminate with Q as output. In order to obtain operational correctness from denotational correctness one needs a suitable adequacy result à la Plotkin [13] relating the denotational and the operational semantics. Plotkin’s result cannot be applied here because it refers to a call-by-name operational semantics, whereas we are interested in a call-by-value semantics in order to obtain a correctness result for our implementation of NbE in the call-by-value language Scheme. Furthermore, Plotkin only considers the integers and the booleans as base types, whereas we need complex recursively defined types as base types (see Section 3.2). We leave the problem of proving adequacy of our denotational semantics for a fragment of a call-by-value language suitable for formalizing our extension of NbE to future work.
2. A simply typed λ-calculus with constants
2.1. Types, terms, rewrite rules
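We start from a given set of ground types. Types are inductively generated from ground types τ by ρ → σ and ρ×σ. Terms are
$x^\rho$ typed variables,
$c^\rho$ constants,
$(\lambda x^\rho M^\sigma)^{\rho\to\sigma}$ abstractions,
$(M^{\rho\to\sigma}N^\rho)^\sigma$ applications,
$\langle M_0^\rho, M_1^\sigma\rangle^{\rho\times\sigma}$ pairing,
$\pi_0(M^{\rho\times\sigma})^\rho$, $\pi_1(M^{\rho\times\sigma})^\sigma$ projections.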
Ground types will always be denoted by τ. We sometimes write $M0$ for $\pi_0(M)$ and $M1$ for $\pi_1(M)$. Two terms M and N are called α-equal, written $M =_\alpha N$, if they are equal up to renaming of bound variables. $\Lambda_\rho$ denotes the set of all terms of type ρ (α-equal terms are not identified). $M\vec N$ denotes $(\dots(MN_1)N_2\dots)N_n$, where some of the $N_i$’s may be 0 or 1. By FV(M) we denote the list of variables occurring free in M. By $M_x[N]$ we mean substitution of every free occurrence of x in M by N, renaming bound variables if necessary. Similarly $M_{\vec x}[\vec N]$ denotes simultaneous substitution. $\lambda\vec xM$ abbreviates $\lambda x_1\dots\lambda x_nM$. If $M\vec N$ is of type σ and $N_i$ of type $\rho_i$, then we call $\vec\rho\to\sigma$ a type information for M. Here $\vec\rho$ is a list of types and 0’s or 1’s indicating the left or right part of a product type. So, e.g., a term M of type $\rho = (\tau\to\tau\to\tau)\times(\tau\to(\tau\times\tau))$ has $(0,\tau)\to(\tau\to\tau)$ or $(1,\tau,0)\to\tau$ as a type information. If there are no product types, $\vec\rho\to\sigma$ simply abbreviates $\rho_1\to(\rho_2\to\dots(\rho_n\to\sigma)\dots)$.
For the constants $c^\rho$ we assume that some rewrite rules of the form $c\vec K \mapsto N$ are given, where $FV(N) \subseteq FV(c\vec K)$ and $c\vec K$, N have the same type (not necessarily a ground type). Moreover, for any type information $\rho_1,\dots,\rho_n\to\tau$ for c (τ a ground type), we require that there is a fixed length $k \le n$ of arguments for the rewrite rules, i.e., $c\vec M \mapsto N$ implies that $\vec M$ has length k, provided the projection markers in $\vec M$ and in $\rho_1,\dots,\rho_k$ coincide. If no rewrite rule of the form $c\vec M \mapsto N$ ($1 \le$ length of $\vec M \le n$) applies, then this fixed length is stipulated to be n. We write $c^{\vec\rho\to\sigma}$ to indicate that we only consider c with argument lists $\vec K$ with these projection markers; the notation $c\vec M\vec N$ is used to indicate that $\vec M$ are the fixed arguments for the rewrite rules of c. In particular, if there is no rewrite rule for c, then $\vec N$ is empty and $c\vec M$ is of ground type.
For example, if c is of type $(\tau\to\tau\to\tau)\times(\tau\to\tau)$, then the rules $c0xx \mapsto a$ and $c1 \mapsto b$ are admitted, and $c^{0,\tau,\tau\to\tau}$ indicates that we only consider argument lists of the form 0, x, y.
2.2. Computation rules
Given a set of rewrite rules, we want to treat some rules—which we call computation rules—in a different, more efficient way. The idea is that a computation rule can be understood as a description of a computation in a suitable semantical model, provided the syntactic constructors correspond to semantic ones in the model, whereas the other rules describe syntactic transformations.
A constant c is called a constructor if there is no rule of the form $c\vec K \mapsto N$. For instance, in the examples of Section 2.3 the constants 0, S, and $\exists^+$ are constructors. Constructor patterns are special terms defined inductively as follows.
• Every variable is a constructor pattern.
• If c is a constructor and P1,…,Pn are constructor patterns or projection markers 0 or 1, such that cP is of ground type, then cP is a constructor pattern.
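Matching a term against a constructor pattern is the basic operation needed to decide whether a computation rule applies. The sketch below is an illustration only: it assumes a hypothetical first-order encoding (`("var", x)` and `("const", c, args)`), and it omits projection markers and the ground-type requirement on $c\vec P$.

```python
# Hypothetical first-order encoding: ("var", x) and ("const", c, args).
def V(name):
    return ("var", name)

def C(name, *args):
    return ("const", name, tuple(args))

def is_constructor_pattern(p, constructors):
    """Variables are patterns; so is c P1 ... Pn for a constructor c and patterns Pi."""
    if p[0] == "var":
        return True
    return p[1] in constructors and all(is_constructor_pattern(a, constructors) for a in p[2])

def match(pattern, term, subst=None):
    """Return a substitution s with pattern[s] == term, or None if there is none."""
    subst = {} if subst is None else subst
    if pattern[0] == "var":
        bound = subst.get(pattern[1])
        if bound is not None and bound != term:
            return None  # can only happen for non-linear patterns
        return {**subst, pattern[1]: term}
    if term[0] != "const" or term[1] != pattern[1] or len(term[2]) != len(pattern[2]):
        return None  # different head (or a variable) where the pattern has a constructor
    for p, t in zip(pattern[2], term[2]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst
```

A variable in the term never matches a constructor in the pattern; this is why a computation rule may be blocked until an argument has been reduced to its outer constructor form.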
From the given set of rewrite rules we choose a subset C with the following properties.
• If $c\vec P \mapsto Q \in C$, then $P_1,\dots,P_n$ are constructor patterns or projection markers.
• The rules are left-linear, i.e., if $c\vec P \mapsto Q \in C$, then every variable in $c\vec P$ occurs only once in $c\vec P$.
• The rules are nonoverlapping, i.e., for different rules $c\vec K \mapsto M$ and $c\vec L \mapsto N$ in C the left-hand sides $c\vec K$ and $c\vec L$ are nonunifiable.
We write $c\vec P \mapsto_{\mathrm{comp}} Q$ to indicate that the rule is in C. The set of constructors appearing in the constructor patterns is denoted by C. All other rules will be called (proper) rewrite rules, written $c\vec K \mapsto_{\mathrm{rew}} N$.
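The side conditions on computation rules (left-linearity and nonoverlapping left-hand sides) can be checked mechanically. This is a sketch under stated assumptions: the first-order encoding is hypothetical, and the unifiability test is a shortcut that is only valid for linear patterns whose variables are renamed apart, since then every variable can be bound independently; general patterns would need full unification with an occurs check.

```python
# Hypothetical first-order encoding: ("var", x) and ("const", c, args).
def V(name):
    return ("var", name)

def C(name, *args):
    return ("const", name, tuple(args))

def variables(p):
    """All variable occurrences in a pattern, with repetitions."""
    if p[0] == "var":
        return [p[1]]
    return [v for arg in p[2] for v in variables(arg)]

def is_left_linear(lhs):
    vs = variables(lhs)
    return len(vs) == len(set(vs))

def unifiable_linear(p, q):
    """Unifiability of two linear patterns with disjoint variables."""
    if p[0] == "var" or q[0] == "var":
        return True  # a variable occurring once can be bound to the other side
    return (p[1] == q[1] and len(p[2]) == len(q[2])
            and all(unifiable_linear(a, b) for a, b in zip(p[2], q[2])))

def check_computation_rules(lhss):
    """lhss: left-hand sides c P of candidate computation rules."""
    if not all(is_left_linear(lhs) for lhs in lhss):
        return False
    return not any(unifiable_linear(lhss[i], lhss[j])
                   for i in range(len(lhss)) for j in range(i + 1, len(lhss)))
```

For example, the left-hand sides plus x 0 and plus x (S y) pass the check, whereas adding plus 0 y creates an overlap with plus x 0, so such a rule would have to be treated as a proper rewrite rule.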
In our reduction strategy below, computation rules will always be applied first, and since they are nonoverlapping, this part of the reduction is unique. However, since we allowed almost arbitrary rewrite rules, it may happen that, in case no computation rule applies, a term may be rewritten by different rules not in C. In order to obtain a deterministic procedure we assume that for every constant $c^{\vec\rho\to\sigma}$ we are given a function $\mathrm{select}_c$ computing from $\vec M$ either a rule $c\vec K \mapsto_{\mathrm{rew}} N$, in which case $\vec M$ is an instance of $\vec K$, i.e., $\vec M = \vec K_{\vec x}[\vec L]$, or else the message “none”, in which case $\vec M$ does not match any rewrite rule, i.e., there is no rule $c\vec K \mapsto_{\mathrm{rew}} N$ such that $\vec M$ is an instance of $\vec K$. Clearly $\mathrm{select}_c$ should be compatible with α-equality and should satisfy an obvious uniformity property, i.e., whenever $\vec M$ and $\vec M'$ are variants (i.e., can be obtained from each other by an invertible substitution), then $\mathrm{select}_c(\vec M) = \mathrm{select}_c(\vec M')$.
Often the rewrite rules will be left-linear (i.e., no variable occurs twice in the left-hand side of a rule); then it is reasonable to require that every select function $\mathrm{select}_c$ is strongly uniform in the sense that for all instances $\vec M_{\vec x}[\vec z]$ of $\vec M$ (with not necessarily distinct variables $\vec z$) we have $\mathrm{select}_c(\vec M_{\vec x}[\vec z]) = \mathrm{select}_c(\vec M)$.
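One simple way to obtain a deterministic select function is to try the proper rewrite rules of c in a fixed order and return the first one whose left-hand side matches. This is only a hypothetical choice for illustration (the text merely assumes that some select function is given); the name `select`, the first-match policy and the first-order encoding are all assumptions. Because matching is purely structural, renaming variables in the arguments does not change the selected rule.

```python
# Hypothetical first-order encoding: ("var", x) and ("const", c, args).
def V(name):
    return ("var", name)

def C(name, *args):
    return ("const", name, tuple(args))

def match(pattern, term, subst):
    """Structural matching; returns an extended substitution or None."""
    if pattern[0] == "var":
        bound = subst.get(pattern[1])
        if bound is not None and bound != term:
            return None
        return {**subst, pattern[1]: term}
    if term[0] != "const" or term[1] != pattern[1] or len(term[2]) != len(pattern[2]):
        return None
    for p, t in zip(pattern[2], term[2]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def select(rules, args):
    """rules: list of (lhs_args, rhs) for one constant c, tried in a fixed order.
    Returns (rule, substitution), or None in place of the message "none"."""
    for rule in rules:
        lhs_args, _rhs = rule
        if len(lhs_args) != len(args):
            continue
        subst = {}
        for p, t in zip(lhs_args, args):
            subst = match(p, t, subst)
            if subst is None:
                break
        if subst is not None:
            return rule, subst
    return None
```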
2.3. Examples
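(a) Usually we have the ground type ι of natural numbers available, with constructors $0^\iota$, $S^{\iota\to\iota}$ and recursion operators $R_\rho^{\iota\to\rho\to(\iota\to\rho\to\rho)\to\rho}$. The rewrite rules for R are
$$R_\rho 0 \mapsto \lambda yz.y, \qquad R_\rho(Sx) \mapsto \lambda yz.zx(R_\rho xyz).$$
(b) We can also deal with infinitely branching trees such as the Brouwer ordinals of type […]. There are constructors […] and […] and recursion constants […].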
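Example (a) can be made concrete by reading the recursion rules for R as computation in a model. The sketch assumes the standard recursion equations R 0 y z = y and R (S x) y z = z x (R x y z), a hypothetical tuple encoding of numerals, and uncurried arguments to R for brevity. Note that Python evaluates the recursive call eagerly, whereas the computation rules in the text are applied call-by-name.

```python
# Hypothetical encoding of the ground type iota: ("0",) and ("S", n).
ZERO = ("0",)

def succ(n):
    return ("S", n)

def rec(n, y, z):
    """Recursor R: R 0 y z = y and R (S x) y z = z x (R x y z); z is curried, z(x)(r)."""
    if n == ZERO:
        return y
    x = n[1]
    return z(x)(rec(x, y, z))

def numeral(k):
    n = ZERO
    for _ in range(k):
        n = succ(n)
    return n

def to_int(n):
    k = 0
    while n != ZERO:
        n, k = n[1], k + 1
    return k

def add(m, n):
    # add m n = R m n (lambda x r. S r)
    return rec(m, n, lambda _x: lambda r: succ(r))

def mul(m, n):
    # mul m n = R m 0 (lambda x r. add n r)
    return rec(m, ZERO, lambda _x: lambda r: add(n, r))
```

For instance, `to_int(add(numeral(2), numeral(3)))` evaluates to 5.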
(c) It is well known that by the Curry–Howard correspondence natural deduction proofs can be written as λ-terms with formulas as types. To use normalization by evaluation for normalizing proofs we may also introduce a ground type ∃xA with constructors and destructors $\exists^+$ and $\exists^-$. The rule for $\exists^-$ is
$$\exists^-(\exists^+x_0x_1) \mapsto \lambda y.yx_0x_1.$$
Here $\exists^+ : \forall x(A \to \exists xA)$. One could define $\exists^+_{\rho_0,\rho_1}$ and $\exists^-_{\rho_0,\rho_1,\sigma}$ by the terms $\lambda x_0\lambda x_1\langle x_0,x_1\rangle$ and $\lambda z\lambda f(f\pi_0(z)\pi_1(z))$, respectively. However, the latter term does not correspond to a derivation in first-order logic, since it is impossible to pass from an arbitrary derivation d (possibly with free assumptions) of ∃xA to a term $\pi_0(d)$ and a derivation $\pi_1(d)$ of $A_x[\pi_0(d)]$. One can easily formulate rules for permutative conversions, which permute an application of an ∃-elimination rule with other elimination rules, e.g.,
$$\exists^-_{\rho_0,\rho_1,\sigma_0\to\sigma_1}p \mapsto \lambda zv.\,\exists^-_{\rho_0,\rho_1,\sigma_1}p(\lambda xy.(zxyv)).$$
2.4. Normalizable terms and their normal forms
We inductively define a relation M → Q for terms M,Q. The intended meaning of M → Q is that M is normalizable with (long) normal form Q. However, it is necessary to split up → into two relations: a “weak” one →w intended to unwrap the outer constructor form, followed by a “strong” one →s, where we assume that it is applied to terms M irreducible w.r.t. →w.
Looking at the form of a term we will embark on the following strategy:
• β-redexes $(\lambda xM)N$ and instances $c\vec M\vec N$ of computation rules are reduced promptly, i.e., we use call-by-name here.
• If no rule applies to $c\vec M\vec N$, one first tries to find out whether $\vec M$ can be reduced to $\vec P$ such that $c\vec P$ matches a computation rule. This does not require reducing each $M_i$ to normal form; it suffices to find out the outer pattern of $M_i$ (let us call it for now “constructor normal form”). The reductions for doing so will be called “weak” and we write $\to_w$ for them.
• If in $c\vec M\vec N$ all $\vec M$ are already in constructor normal form and no computation rule applies, then in a second step one reduces all $\vec M$ and $\vec N$ to normal form (if it exists) and tries to apply a proper rewrite rule, i.e., we use call-by-value at this point.
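The strategy above can be illustrated on a small first-order fragment without λ-abstraction (so no β-reduction and no η-expansion). This is only a sketch of the idea, not the paper's relation →: the encoding, the rule tables `COMP` and `REW`, and the sample rules for `plus` are hypothetical. Weak reduction applies computation rules at the head and reduces a subterm only where a pattern requires its outer constructor; the strong step then normalizes all arguments (call-by-value) and tries the proper rewrite rules in a fixed order. With nonterminating rules the sketch may loop.

```python
# Hypothetical first-order fragment: ("var", x) and ("const", c, args).
def V(x): return ("var", x)
def C(c, *args): return ("const", c, tuple(args))

# Computation rules (left-linear constructor patterns, nonoverlapping).
COMP = {"plus": [((V("x"), C("0")), V("x")),
                 ((V("x"), C("S", V("y"))), C("S", C("plus", V("x"), V("y"))))]}
# Proper rewrite rules, tried in order (a simple select function).
REW = {"plus": [((C("0"), V("y")), V("y"))]}

def subst(t, s):
    if t[0] == "var":
        return s.get(t[1], t)
    return ("const", t[1], tuple(subst(a, s) for a in t[2]))

def match(p, t, s):
    """Match a linear pattern, weakly reducing t only where p has a constructor."""
    if p[0] == "var":
        return {**s, p[1]: t}
    t = weak(t)
    if t[0] != "const" or t[1] != p[1] or len(t[2]) != len(p[2]):
        return None
    return match_all(p[2], t[2], s)

def match_all(ps, ts, s=None):
    s = {} if s is None else s
    if len(ps) != len(ts):
        return None
    for p, t in zip(ps, ts):
        s = match(p, t, s)
        if s is None:
            return None
    return s

def weak(t):
    """Weak reduction: apply computation rules at the head (call-by-name)."""
    while t[0] == "const":
        for lhs, rhs in COMP.get(t[1], []):
            s = match_all(lhs, t[2])
            if s is not None:
                t = subst(rhs, s)
                break
        else:
            return t  # no computation rule applies at the head
    return t

def normalize(t):
    """Strong step: normalize the arguments, then try the proper rewrite rules."""
    t = weak(t)
    if t[0] == "var":
        return t
    args = tuple(normalize(a) for a in t[2])  # call-by-value on the arguments
    for lhs, rhs in REW.get(t[1], []):
        s = match_all(lhs, args)
        if s is not None:
            return normalize(subst(rhs, s))
    return ("const", t[1], args)
```

Here plus n (S 0) is handled by the computation rules alone and yields S n, while plus 0 n is blocked for the computation rules (n is a variable) and is only reduced to n by the proper rewrite rule.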
Let $\vec M \to \vec M'$ abbreviate $M_1 \to M_1', \dots, M_n \to M_n'$ and similarly for other relations, and let $\to_w^*$ be the reflexive and transitive closure of $\to_w$.
Definition 1. […]
$\langle M_0,M_1\rangle i\vec P \to_w M_i\vec P$ for $i \in \{0,1\}$.
[…] $\mapsto_{\mathrm{comp}} Q$.
For readability we will often write R[…] in the following form, assuming that […] is the selected rule.
R[…].
For the definition above to make sense we prove the following.
Lemma 2. If $M \to_s M'$ and $M'$ is an instance of a constructor pattern P, then also M is an instance of P.
Proof. By induction on P. If P is a variable the claim is trivial, so let $P = c\vec P$. Then $M' = c\vec K'$ and $\vec K'$ is an instance of $\vec P$. Moreover, the only possibility to infer $M \to_s M' = c\vec K'$ is by rule $\mathrm{P}_{\mathrm{A}}$. Thus $M = c\vec K$, $\vec K \to_s \vec K'$, and by the induction hypothesis (IH) $\vec K$ is an instance of $\vec P$. Since P is linear we eventually get that $c\vec K$ is an instance of $c\vec P$. □
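Definition 3. The set LNF of terms in long normal form is defined as follows. $\lambda xM$, $\langle M,N\rangle$, $(x\vec M)^\tau$, and $(c\vec M\vec N)^\tau$ are in LNF if M, N, $\vec M$, $\vec N$ are, provided that $c\vec M$ is not an instance of any computation or rewrite rule. For example, the η-expansion η(x) of a variable x is in long normal form; it is defined using induction on types by (e.g., for pure →-types)
$$\eta(x^\tau) = x^\tau, \qquad \eta(x^{\rho_1\to\dots\to\rho_n\to\tau}) = \lambda y_1\dots y_n.\,x\,\eta(y_1)\dots\eta(y_n) \quad (\vec y \text{ new}).$$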
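The η-expansion of a variable can be computed by peeling off the argument types one at a time. This is a minimal sketch for pure →-types with a single ground type, using a hypothetical tuple encoding and a hypothetical helper `fresh_names` that supplies new variable names.

```python
import itertools

# Hypothetical encoding: types "o" or ("->", rho, sigma);
# terms ("var", x), ("lam", x, rho, body), ("app", m, n).

def fresh_names(prefix="y"):
    """An endless supply of new variable names y0, y1, ..."""
    return (f"{prefix}{i}" for i in itertools.count())

def eta(x, ty, fresh):
    """eta(x^o) = x; eta(x^(rho1 -> ... -> rhon -> o)) = lambda y1..yn. x eta(y1) ... eta(yn)."""
    binders, args = [], []
    while ty != "o":
        _, rho, ty = ty  # peel off one argument type
        y = next(fresh)  # a new variable of type rho
        binders.append((y, rho))
        args.append(eta(y, rho, fresh))
    body = ("var", x)
    for arg in args:
        body = ("app", body, arg)
    for y, rho in reversed(binders):
        body = ("lam", y, rho, body)
    return body
```

For x of type (o → o) → o this produces λy0. x (λy1. y0 y1), where the inner argument is itself η-expanded.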
Lemma 4. If $M \to Q$ or $M \to_s Q$, then Q is in long normal form.
Proof. By simultaneous induction on $M \to Q$ and $M \to_s Q$. The only interesting case is rule $\mathrm{P}_{\mathrm{A}}$, where we have to show that $c\vec M'$ is not an instance of a computation rule. But if $c\vec M'$ were such an instance, then by the previous lemma $c\vec M$ would also be one, contradicting the assumption. □
Furthermore it can be shown easily that if M → Q, M →w Q, or M →s Q, then M reduces to Q in the usual sense w.r.t. β-reduction, η-expansion, and the computation and rewrite rules for the constants. However, the converse is not true in general. For a counterexample, consider the nonterminating rewrite rules
and
. Then 0 is a normal form of
0, but we cannot have
for any Q. To see this, note that we cannot have
→sN for any N (since
; hence we also cannot have
0 →s Q for any Q. Since
, 0 are →w*-reducible only to themselves, the claim follows. But under the hypothesis that M is strongly normalizable the converse is true.
Lemma 5. If M is strongly normalizable w.r.t. these reductions (i.e., every reduction sequence terminates), then M → Q for some Q.
Proof. For simplicity we consider pure →-types only; the extension to product types is immediate. We prove the claim by induction on $h_M$ and side induction on the height of M, where $h_M$ denotes the height of the reduction tree for M. Note that if $M \to_w Q$ then M reduces to Q in at least one step; hence $h_M > h_Q$. Case $\lambda yM$. We have $(\lambda yM)y \to_w M \to Q$ by B and the side induction hypothesis (SIH); hence $\lambda yM \to \lambda yQ$ by E. Case M has a type ρ → σ but is not an abstraction. Then M η-expands to $\lambda y.My$, where y is a new variable of type ρ; hence $h_M > h_{\lambda y.My} \ge h_{My}$. Therefore $My \to Q$ by IH. Hence $M \to \lambda yQ$ by E.
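It remains to consider terms of ground type. Case $x\vec M$. Obvious, using the SIH and rule $\mathrm{V}_{\mathrm{A}}$. Case $(\lambda xM)N\vec P$. Then $(\lambda xM)N\vec P \to_w M_x[N]\vec P \to Q$ by B and the IH. Case $c\vec P_{\vec x}[\vec L]\vec N$ with $c\vec P \mapsto_{\mathrm{comp}} Q$. Then $c\vec P_{\vec x}[\vec L]\vec N \to_w Q_{\vec x}[\vec L]\vec N \to Q_1$ by C and the IH. Case $c\vec M\vec N$ with $c\vec M$ not an instance of a computation rule. By SIH $\vec M \to \vec M'$. If at least one $M_i$ is $\to_w$-reduced, the claim follows from the IH and A. Otherwise we have $\vec M \to_s \vec M'$. Now if $\mathrm{select}_c(\vec M') = c\vec K \mapsto_{\mathrm{rew}} Q$ and $\vec M' = \vec K_{\vec x}[\vec L]$, the claim follows from the IH for $Q_{\vec x}[\vec L]\vec N$. If, however, $\mathrm{select}_c(\vec M') =$ “none”, then proceed as in case $x\vec M$, using $\mathrm{P}_{\mathrm{A}}$ instead of $\mathrm{V}_{\mathrm{A}}$. □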
Moreover, the relation M → Q clearly is not closed under substitution. However, it is closed under substitution of variables, provided the result is a variant of M.
Lemma 6. Let






