
Algebra-Based Reasoning for Loop Synthesis

Published: 21 July 2022


Abstract

Providing provably correct software is one of the key challenges of our software-driven society. Program synthesis—the task of constructing a program satisfying a given specification—is one strategy for achieving this. The result of this task is then a program that is correct by design. As in the domain of program verification, handling loops is one of the main ingredients of a successful synthesis procedure.

We present an algorithm for synthesizing loops satisfying a given polynomial loop invariant. The class of loops we consider can be modeled by a system of algebraic recurrence equations with constant coefficients, thus encoding program loops with affine operations among program variables. We turn the task of loop synthesis into a polynomial constraint problem by precisely characterizing the set of all loops satisfying the given invariant. We prove soundness of our approach, as well as its completeness with respect to an a priori fixed upper bound on the number of program variables. Our work has applications in program verification, namely synthesizing loops that are partially correct with respect to a given polynomial loop invariant, as well as in generating number sequences from algebraic relations. To assess the viability of the methodology and the heuristics needed for synthesizing loops, we implement and evaluate the method in our tool Absynth.


1 INTRODUCTION

The two most rigorous approaches for providing correct software are formal program verification and program synthesis [33]. The task of formal verification is to prove correctness of a given program with respect to a given (logical) specification. Program synthesis, in contrast, aims at generating programs that adhere to a given specification; the result of a synthesis problem is therefore a program that is correct by construction. Whereas formal verification has received considerable attention with impressive results, such as in ensuring safety of device drivers [4] and security of web services [5], program synthesis turns out to be an algorithmically much more difficult challenge [25].

The classical setting of program synthesis has been to synthesize programs from proofs of logical specifications that relate the inputs and the outputs—in short, inputs-outputs—of the program [27]. Thanks to recent successful trends in formal verification based on automated reasoning [9, 10, 24], this traditional view of program synthesis has been refined to the setting of syntax-guided synthesis (SyGuS) [2]. In addition to logical specifications, SyGuS approaches consider further constraints on the program templates to be synthesized, thus limiting the search space of possible solutions. A wide range of efficient applications of SyGuS have so far emerged, such as programming by examples [13], and component-based synthesis [18] with learning [11] and sketching [29].

Yet in both verification and synthesis, one of the main challenges is to verify and/or synthesize loops and recursion. In verification, solving this challenge requires generating additional program assertions in the form of loop invariants [17, 22, 31]. Intuitively, a loop invariant is a formal description of the behavior of the loop, expressing loop properties that hold before and after each loop iteration. In synthesis, the challenge of reasoning about loops amounts to answering whether there exists a loop satisfying a given loop invariant and, if so, constructively synthesizing such a loop. We refer to this synthesis task as loop synthesis. Note that loop synthesis can be considered the reverse problem of loop invariant generation: rather than generating invariants summarizing a given loop, we synthesize loops whose summaries are captured by a given invariant property.

In this article, we present an algorithmic approach to loop synthesis by relying on the theory of algebraic recurrence equations [20]. We synthesize and optimize loops whose functional behavior is captured by a given invariant \( {p(\boldsymbol {x})=0} \) such that the synthesized loops use only affine computations among \( \boldsymbol {x} \). We define our loop synthesis task as follows:

LOOP SYNTHESIS. Given a polynomial property \( p(\boldsymbol {x}) \) over a set \( \boldsymbol {x} \) of variables, generate a loop \( \mathcal {L} \) with affine operations over variables \( \boldsymbol {x} \) such that \( p(\boldsymbol {x})=0 \) is an invariant of the loop.

Our loop synthesis problem is formulated such that only one loop satisfying a given polynomial invariant \( p(\boldsymbol {x})=0 \) is required; however, we note that in our work, we first characterize all loops for which \( p(\boldsymbol {x})=0 \) is an invariant (Section 4) and then extract one loop by optimizing the set of all loop solutions (Section 5). One such optimization criterion is, for example, avoiding trivial solutions—that is, avoiding loops whose affine updates do not change the values of loop variables.

We believe our work is the first complete technique for synthesizing loops from (non-linear) polynomial invariants. The key ingredient of our work is the reduction of loop synthesis to the problem of solving algebraic recurrences among loop variables. The inner algorithmic magic of our approach is an SMT-based solution for solving non-linear algebraic constraints, allowing us to test existential properties of bounded-degree polynomials (invariants) to derive universal (loop) relations.

Motivating example. Let us first motivate loop synthesis using Figure 1(a). The loop is based on an online tutorial1 of the Dafny verification framework [26]: Figure 1(a) uses only affine updates among its variables, and the task is to revise/repair Figure 1(a) into a partially correct program with respect to the precondition \( N\ge 0 \) and post-condition \( c=N^3 \) such that the polynomial loop invariant \( n \le N \wedge c=n^3 \wedge k=3n^2+3n+1 \wedge m=6n+6 \) holds. We note that our focus is on partially correct programs [14], and thus we do not consider total correctness—that is, loop termination.


Fig. 1. Examples of loop synthesis. Parts (b) and (c) are revised versions of part (a) such that the expression \( {c=n^3 \wedge k=3n^2+3n+1 \wedge m=6n+6} \) is an invariant of parts (b) and (c).

In this article, we introduce an algorithmic approach to loop synthesis by relying on algebraic recurrence equations and constraint solving over polynomials. In particular, we automatically synthesize Figure 1(b) and (c) by using the given non-linear polynomial equalities \( c=n^3 \wedge k=3n^2+3n+1 \wedge m=6n+6 \) as input invariant to our loop synthesis task. Both synthesized programs, with the loop guard \( {n\lt N} \) as in Figure 1(a), are partially correct programs with respect to the given requirements. Moreover, Figure 1(b) and (c) precisely capture the solution space of \( c=n^3 \wedge k=3n^2+3n+1 \wedge m=6n+6 \) by implementing only affine operations.

Algebra-based loop synthesis. Inspired by SyGuS [2], we consider our loop synthesis task with additional requirements on the loop to be synthesized: we impose syntactic requirements on the form of loop expressions and guards to be synthesized. The imposed requirements allow us to reduce the loop synthesis task to the problem of generating linear/affine recurrences with constant coefficients, called C-finite recurrences [20]. As such, loop synthesis provides an algorithmic solution to the following loop reasoning challenge: Given a polynomial \( p(\boldsymbol {x}) \) over loop variables \( \boldsymbol {x} \), how can the entire solution space of \( {p(\boldsymbol {x})=0} \) be iteratively computed using only affine operations inducing C-finite number sequences among \( \boldsymbol {x} \)?

Our approach to synthesis is, however, conceptually different from other SyGuS-based methods (e.g., [11, 13, 29]): rather than iteratively refining both the input and the solution space of synthesized programs, we take polynomial relations describing a potentially infinite set of input values and precisely capture not just one loop but the set of all loops (i) whose invariant is given by our input polynomial and (ii) whose variables induce C-finite number sequences. Any instance of this set therefore yields a loop that is partially correct by construction and only implements affine computations. Figure 1(b) and (c) depict two solutions of our loop synthesis task for the invariant \( {c=n^3 \wedge k=3n^2+3n+1 \wedge m=6n+6} \).

The main steps of our approach are as follows. First, let \( p(\boldsymbol {x}) \) be a polynomial over variables \( \boldsymbol {x} \), and let \( {s\ge 0} \) be an upper bound on the number of program variables to be used in the loop. If not specified, s is taken to be the number of variables in \( \boldsymbol {x} \). Second, we use syntactic constraints over the loop body to be synthesized and define a loop template, as given by our programming model (8). Our programming model imposes that the functional behavior of the synthesized loops can be modeled by a system of C-finite recurrences (Section 3). Third, by using the invariant property \( p(\boldsymbol {x})=0 \) for the loops to be synthesized, we construct a polynomial constraint problem (PCP) characterizing the set of all loops satisfying the constraints of (8) for which \( {p(\boldsymbol {x}) = 0} \) is a loop invariant (Section 4). Our approach combines symbolic computation techniques over algebraic recurrence equations with polynomial constraint solving. We prove that our approach to loop synthesis is both sound and complete. By completeness, we mean that if there is a loop \( \mathcal {L} \) with at most s variables satisfying the invariant \( {p(\boldsymbol {x})=0} \) such that the loop body meets our C-finite/affine syntactic requirements, then \( \mathcal {L} \) is synthesized by our method (Theorem 4.2). Moving beyond s—that is, deriving an upper bound on the number of program variables from the invariant—is interesting further work, with connections to the inverse problem of difference Galois theory [34].

We finally note that our work is not restricted to specifications given by a single polynomial equality invariant. Rather, the invariant given as input to our synthesis approach can be conjunctions of polynomial equalities—as also shown in Figure 1.

Beyond loop synthesis. Our work has applications beyond loop synthesis—such as in generating number sequences from algebraic relations and program/compiler optimizations:

  • Generating number sequences: Our approach provides a partial solution to an open mathematical problem: given a polynomial relation among number sequences, for example, \( \begin{equation} f(n)^4 + 2f(n)^3f(n+1) - f(n)^2f(n+1)^2 - 2f(n)f(n+1)^3 + f(n+1)^4 = 1, \end{equation} \) synthesize algebraic recurrences defining these sequences. There exists no complete method for solving this challenge, but we give a complete approach in the C-finite setting parameterized by an a priori bound s on the order of the recurrences. For the given relation (1) among \( f(n) \) and \( f(n+1) \), our work generates the C-finite recurrence equation \( f(n+2)=f(n+1)+f(n), \) which induces the Fibonacci sequence.

  • Program optimizations: Given a polynomial invariant, our approach generates a PCP such that any solution to this PCP yields a loop satisfying the given invariant. By using additional constraints encoding a cost function on the loops to be synthesized, our method can be extended to synthesize loops that are optimal with respect to the considered costs, such as synthesizing loops that use only addition in variable updates. Consider, for example, Figure 1(b) and (c): the loop body of Figure 1(b) uses only addition, whereas Figure 1(c) implements also multiplications by constants.

  • Compiler optimizations: To reduce execution time spent within loops, compiler optimization techniques, such as strength reduction, aim at replacing expensive loop operations with semantically equivalent but less expensive operations [6]. One such optimization within strength reduction replaces “strong” loop multiplications by additions among program variables. The burden of strength reductions comes, however, with identifying inductive loop variables and invariants to be used for loop optimization. Our loop synthesis method therefore serves as a foundation for strength reduction optimization by converting polynomial (loop) expressions into incremental affine (loop) computations.

  • Invariant generation and loop equivalence: Our approach may improve the state of the art in polynomial invariant generation, as follows. Given a loop L, let I denote the polynomial loop invariant of L generated by existing invariant generation approaches (e.g., [15, 17, 22, 31]). By using I as our input, our technique synthesizes a loop \( L^{\prime } \) that is relationally equivalent to L in the sense that both L and \( L^{\prime } \) have the same invariant I. We showcase such a use case of our synthesis approach in Section 7.1. We further note that our loop synthesis task can also be used to check soundness/completeness of existing invariant generation techniques: apply invariant generation approaches on \( L^{\prime } \) with the goal of deriving the invariant I.

  • Teaching formal methods: Finally, our work can also be used in solving formal verification challenges, as follows. By considering examples that are not partially correct with respect to a given invariant, the task is to revise these programs into correct ones. By using the given invariants as inputs to our approach, we automatically infer partially correct programs, thus allowing users/teachers/practitioners of formal methods to repair programs in a fully automated manner. This use case of loop synthesis has been deployed in our “Formal Methods in Computer Science” master course at the TU Wien, particularly in generating and correcting course assignments on deductive program verification, as discussed in Section 7.2.

Contributions. This article brings integrated approaches to formal modeling and analysis of software by combining symbolic computation, program analysis, and SMT reasoning. In summary, we make the following contributions:

  • We propose an automated procedure for synthesizing loops that are partially correct with respect to a given polynomial loop invariant (Section 4). By exploiting properties of C-finite sequences, we construct a PCP that precisely captures all solutions of our loop synthesis task. We are not aware of previous approaches synthesizing loops from (non-linear) polynomial invariants.

  • We also synthesize the initial values of the loop variables—that is, the values before executing the loop. We first consider loops with concrete initial values, so-called non-parameterized loops (Section 4.2). We then refine our technique toward the synthesis of parameterized loops—that is, loops with symbolic initial values (Section 4.3).

  • We prove that our approach to loop synthesis is sound and complete (Theorem 4.2). In other words, if there is a loop whose invariant is captured by our given specification, our approach synthesizes this loop. To this end, we consider completeness modulo an a priori fixed upper bound s on the number of loop variables.

  • We extend our task of loop synthesis with additional constraints for optimizing the solution space of our PCP (Section 5). These optimizations are essential in automating loop synthesis.

  • We implemented our approach in the new open source framework Absynth. We first evaluated our work on a number of academic examples on loop analysis as well as on generating number sequences in algorithmic combinatorics (Section 6).

  • We further used our work in the context of loop equivalence and report on our experience in using loop synthesis as a workhorse for teaching formal methods at the TU Wien (Section 7).

Relation to our previous work [16]. This article extends our previous work [16] in several ways, as summarized in the following. In Section 2, we provide additional algebraic results and insights needed for translating the task of loop synthesis into a recurrence solving problem in Section 4.3. Soundness and completeness proofs of our key results are given in Section 4, together with illustrative examples showcasing the main steps of loop synthesis. Further and most importantly, Section 4.3 extends our previous work [16] with loop synthesis for parameterized loops. Section 6.3 presents illustrative examples from our loop synthesis experiments, and Section 7 reports on using loop synthesis for loop equivalence and teaching formal methods.


2 PRELIMINARIES

Let \( \mathbb {K} \) be a computable field with characteristic zero. We also assume \( \mathbb {K} \) to be algebraically closed—that is, every non-constant polynomial in \( \mathbb {K}[x] \) has at least one root in \( \mathbb {K} \). The algebraic closure \( \bar{\mathbb {Q}} \) of the field of rational numbers \( \mathbb {Q} \) is such a field; \( \bar{\mathbb {Q}} \) is called the field of algebraic numbers.

We denote by \( \mathbb {K}[x_1,\dots ,x_n] \) the multivariate polynomial ring with indeterminates \( x_1,\dots ,x_n \). For a list \( x_1,\dots ,x_n \), we write \( \boldsymbol {x} \) if the number of variables is known from the context or irrelevant. As \( \mathbb {K} \) is algebraically closed, every univariate polynomial \( p\in \mathbb {K}[x] \) of degree r has exactly r roots, counted with multiplicity. Therefore, the following theorem follows immediately.

Theorem 2.1.

The zero polynomial is the only polynomial in \( \mathbb {K}[x] \) having infinitely many roots.

2.1 Polynomial Constraint Problem

A polynomial constraint F is of the form \( p \bowtie 0, \) where p is a polynomial in \( \mathbb {K}[\boldsymbol {x}] \) and \( \mathord {\bowtie } \in \lbrace \lt ,\le ,=,\ne ,\ge ,\gt \rbrace \). A clause is then a disjunction \( {C = F_1 \vee \dots \vee F_m} \) of polynomial constraints. A unit clause is a special clause consisting of a single disjunct (i.e., \( m=1 \)). A polynomial constraint problem (PCP) is then given by a set of clauses \( \mathcal {C} \). We say that a variable assignment \( \sigma : \lbrace x_1,\dots ,x_n\rbrace \rightarrow \mathbb {K} \) satisfies a polynomial constraint \( p\bowtie 0 \) if \( p(\sigma (x_1),\dots ,\sigma (x_n)) \bowtie 0 \) holds. Furthermore, \( \sigma \) satisfies a clause \( F_1 \vee \cdots \vee F_m \) if for some i, \( F_i \) is satisfied by \( \sigma \). Finally, \( \sigma \) satisfies a clause set—and is therefore a solution of the PCP—if every clause within the set is satisfied by \( \sigma \). We write \( \mathcal {C}\sqsubset \mathbb {K}[\boldsymbol {x}] \) to indicate that all polynomials in the clause set \( \mathcal {C} \) are contained in \( \mathbb {K}[\boldsymbol {x}] \). For a matrix M with entries \( m_1,\dots ,m_s, \) we define the clause set \( \operatorname{cstr}(M) \) to be \( {\lbrace m_1=0,\dots ,m_s=0\rbrace } \).
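
For concreteness, the following small Python sketch (using sympy; representing clauses as lists of (polynomial, relation) pairs is our own illustrative choice, not a fixed format of the paper) shows how a PCP and the satisfaction check above can be encoded.

```python
import operator
import sympy as sp

x, y = sp.symbols('x y')
REL = {'<': operator.lt, '<=': operator.le, '=': operator.eq,
       '!=': operator.ne, '>=': operator.ge, '>': operator.gt}

# The PCP {x - 2y = 0, x != 0}: two unit clauses, each a list of disjuncts.
pcp = [[(x - 2*y, '=')], [(x, '!=')]]

def satisfies(pcp, sigma):
    # sigma satisfies the PCP if, in every clause, some disjunct p ⋈ 0 holds.
    return all(any(REL[rel](p.subs(sigma), 0) for p, rel in clause)
               for clause in pcp)

print(satisfies(pcp, {x: 4, y: 2}))   # True
print(satisfies(pcp, {x: 0, y: 0}))   # False: the unit clause x != 0 fails
```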

2.2 Number Sequences and Recurrence Relations

A number sequence \( \left({x(n)}\right)_{n=0}^\infty \) is called C-finite if it satisfies a linear recurrence with constant coefficients, also known as a C-finite recurrence [20]. Let \( c_0,\dots ,c_{r-1} \in \mathbb {K} \) with \( c_0\ne 0 \). Then (2) \( \begin{equation} x(n+r) + c_{r-1}x(n+r-1) + \cdots + c_1x(n+1) + c_0x(n) = 0 \end{equation} \) is a C-finite recurrence of order r. The order of a sequence is defined by the order of the recurrence it satisfies. We refer to a recurrence of order r also as an r-order recurrence—for example, as a first-order recurrence when \( r=1 \) or a second-order recurrence when \( r=2 \). A recurrence of order r together with r initial values defines a sequence, and different initial values lead to different sequences. For simplicity, we write \( \left({x(n)}\right)_{n=0}^\infty = 0 \) for \( \left({x(n)}\right)_{n=0}^\infty = \left({0}\right)_{n=0}^\infty \).
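
As a quick illustration (a sketch, not part of the formal development), a C-finite recurrence of the form (2) together with its r initial values can be unrolled as follows.

```python
def cfinite_terms(coeffs, init, count):
    """Unroll x(n+r) = -(c_{r-1} x(n+r-1) + ... + c_0 x(n)) as in (2).
    coeffs = [c_0, ..., c_{r-1}], init = [x(0), ..., x(r-1)]."""
    xs = list(init)
    while len(xs) < count:
        xs.append(-sum(c * v for c, v in zip(coeffs, xs[-len(coeffs):])))
    return xs[:count]

# Fibonacci: x(n+2) - x(n+1) - x(n) = 0, i.e., c_0 = c_1 = -1, x(0)=0, x(1)=1.
print(cfinite_terms([-1, -1], [0, 1], 10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```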

Example 2.1.

Let \( a\in \mathbb {K} \). The constant sequence \( \left({a}\right)_{n=0}^\infty \) satisfies a first-order recurrence equation \( {x(n+1) = x(n)} \) with \( {x(0)=a} \). The geometric sequence \( \left({a^n}\right)_{n=0}^\infty \) satisfies \( {x(n+1) = a x(n)} \) with \( {x(0)=1} \). The sequence \( \left({n}\right)_{n=0}^\infty \) satisfies a second-order recurrence \( {x(n+2) = 2 x(n+1) - x(n)} \) with \( {x(0)=0} \) and \( {x(1)=1} \).

From the closure properties of C-finite sequences [20], the product and the sum of C-finite sequences are also C-finite. Moreover, we also have the following properties.

Theorem 2.2 ([20]).

Let \( \left({u(n)}\right)_{n=0}^\infty \) and \( \left({v(n)}\right)_{n=0}^\infty \) be C-finite sequences of order r and s, respectively. Then,

(1)

\( \left({u(n) + v(n)}\right)_{n=0}^\infty \) is C-finite of order at most \( r+s \), and

(2)

\( \left({u(n)\cdot v(n)}\right)_{n=0}^\infty \) is C-finite of order at most rs.

Theorem 2.3 ([20]).

Let \( \omega _1,\dots ,\omega _t\in \mathbb {K} \) be pairwise distinct and \( p_1,\dots ,p_t\in \mathbb {K}[x] \). The number sequence \( \left({p_1(n)\omega _1^n + \cdots + p_t(n)\omega _t^n}\right)_{n=0}^\infty \) is the zero sequence if and only if the sequences \( \left({p_1(n)}\right)_{n=0}^\infty ,\dots ,\left({p_t(n)}\right)_{n=0}^\infty \) are zero.

Theorem 2.4 ([20]).

Let \( p = c_0 + c_1 x + \cdots + c_kx^k \in \mathbb {K}[x] \). Then, \( {\left({p(n)}\right)_{n=0}^\infty = 0} \) if and only if \( c_0 = \cdots = c_k = 0 \).

Theorem 2.5 ([20]).

Let \( \left({u(n)}\right)_{n=0}^\infty \) be a sequence satisfying a C-finite recurrence of order r. Then, \( u(n) = 0 \) for all \( n\in \mathbb {N} \) if and only if \( u(n) = 0 \) for \( n\in \lbrace 0,\dots ,r-1\rbrace \).

We define a system of C-finite recurrences of order r and size s to be of the form \( \begin{equation*} X_{n+r} + C_{r-1}X_{n+r-1} + \cdots + C_1X_{n+1} + C_0X_n = 0, \end{equation*} \) where \( X_n = (x_1(n) \quad \cdots \quad x_s(n))^\intercal \) and \( C_i\in \mathbb {K}^{s\times s} \). Every C-finite recurrence system can be transformed into a first-order system of recurrences by increasing the size such that we get (3) \( \begin{equation} X_{n+1} = B X_{n} \qquad \text{where $B$ is invertible.} \end{equation} \)
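
One standard way to perform this transformation for a single recurrence of order r is the companion-matrix construction sketched below (with \( X_n = (x(n), \dots, x(n+r-1))^\intercal \)); the construction for systems of several recurrences is analogous.

```python
import sympy as sp

def companion(coeffs):
    """Companion matrix B of x(n+r) + c_{r-1} x(n+r-1) + ... + c_0 x(n) = 0,
    so that X_{n+1} = B X_n; coeffs = [c_0, ..., c_{r-1}].
    B is invertible if and only if c_0 != 0."""
    r = len(coeffs)
    B = sp.zeros(r, r)
    for i in range(r - 1):
        B[i, i + 1] = 1              # shift rows: x(n+i) -> x(n+i+1)
    for j in range(r):
        B[r - 1, j] = -coeffs[j]     # last row encodes the recurrence itself
    return B

print(companion([-1, -1]))           # Fibonacci: Matrix([[0, 1], [1, 1]])
```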

The closed form solution of a C-finite recurrence system (3) is determined by the roots \( \omega _1,\dots , \omega _t \) of the characteristic polynomial of B, or equivalently by the eigenvalues \( \omega _1,\dots , \omega _t \) of B. We recall that the characteristic polynomial \( \chi _B \) of the matrix B is defined as (4) \( \begin{equation} \chi _B(\omega) = \det (\omega I - B), \end{equation} \) where \( \det \) denotes the (matrix) determinant and I the identity matrix. Let \( m_1,\dots ,m_t \) respectively denote the multiplicities of the roots \( \omega _1,\dots , \omega _t \) of \( \chi _B \). The closed form solution of (3) is then of the form (5) \( \begin{equation} X_n = \sum _{i=1}^t \sum _{j=1}^{m_i} C_{ij} \omega _i^n n^{j-1} \qquad \text{with $C_{ij} \in \mathbb {K}^{s\times 1}$.} \end{equation} \)

However, not every choice of the \( C_{ij} \) gives rise to a solution. For obtaining a solution, we substitute the general form (5) into the original system (3) and compare coefficients. The following example illustrates the procedure for computing closed form solutions.

Example 2.2.

The most well-known C-finite sequence is the Fibonacci sequence satisfying a recurrence of order 2, which corresponds to the following first-order recurrence system: (6) \( \begin{equation} \begin{pmatrix} f(n+1) \\ g(n+1) \end{pmatrix}= \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} f(n) \\ g(n) \end{pmatrix}. \end{equation} \)

The eigenvalues of B are given by \( \omega _{1,2}=\frac{1}{2}(1 \pm \sqrt {5}) \) with multiplicities \( m_1 = m_2 = 1 \). Therefore, the general solution for the recurrence system is of the form (7) \( \begin{equation} \begin{pmatrix} f(n) \\ g(n) \end{pmatrix}= \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}\omega _1^n + \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}\omega _2^n. \end{equation} \)

By substituting (7) into (6), we get the following constraints over the coefficients: \( \begin{equation*} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}\omega _1^{n+1} + \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}\omega _2^{n+1} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \left(\begin{pmatrix} c_1 \\ c_2 \end{pmatrix}\omega _1^n + \begin{pmatrix} d_1 \\ d_2 \end{pmatrix}\omega _2^n \right). \end{equation*} \) Bringing everything to one side yields \( \begin{equation*} \begin{pmatrix} c_1\omega _1 - c_1 - c_2 \\ c_2\omega _1 - c_1 \end{pmatrix}\omega _1^{n} + \begin{pmatrix} d_1\omega _2 - d_1 - d_2\\ d_2\omega _2 - d_1 \end{pmatrix}\omega _2^{n} = 0. \end{equation*} \) For the preceding equation to hold, the coefficients of the \( \omega _i^n \) have to be 0. In other words, the following linear system determines \( c_1,c_2 \) and \( d_1,d_2 \): \( \begin{equation*} \begin{pmatrix} \omega _1 - 1 & -1 & 0 & 0 \\ -1 & \omega _1 & 0 & 0 \\ 0 & 0 & \omega _2 - 1 & -1 \\ 0 & 0 & -1 & \omega _2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ d_1 \\ d_2 \end{pmatrix} = 0. \end{equation*} \) The solution space is generated by \( (\omega _1,1,0,0) \) and \( (0,0,\omega _2,1) \). The solution space of the C-finite recurrence system hence consists of linear combinations of \( \begin{equation*} \begin{pmatrix} \omega _1 \\ 1 \end{pmatrix}\omega _1^n \quad \text{and}\quad \begin{pmatrix} \omega _2 \\ 1 \end{pmatrix}\omega _2^n. \end{equation*} \) In other words, by solving the linear system \( \begin{align*} \begin{pmatrix} f(0) \\ g(0) \end{pmatrix}&= E \begin{pmatrix} \omega _1 \\ 1 \end{pmatrix}\omega _1^0 + F \begin{pmatrix} \omega _2 \\ 1 \end{pmatrix}\omega _2^0\\ \begin{pmatrix} f(1) \\ g(1) \end{pmatrix}= \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} f(0) \\ g(0) \end{pmatrix}&= E \begin{pmatrix} \omega _1 \\ 1 \end{pmatrix}\omega _1^1 + F \begin{pmatrix} \omega _2 \\ 1 \end{pmatrix}\omega _2^1 \end{align*} \) for scalars \( {E, F\in \mathbb {K}} \) with \( f(0)=1 \) and \( g(0)=0 \), we get closed forms for (6): \( \begin{equation*} f(n) = \frac{5+\sqrt {5}}{5(1+\sqrt {5})}\omega _1^{n+1} - \frac{1}{\sqrt {5}}\omega _2^{n+1} ~\text{and}~ g(n) = \frac{1}{\sqrt {5}}\omega _1^{n} - \frac{1}{\sqrt {5}}\omega _2^{n}. \end{equation*} \)

Then, \( f(n) \) represents the Fibonacci sequence starting at 1, and \( g(n) \) the Fibonacci sequence starting at 0. Solving for E and F with symbolic \( f(0) \) and \( g(0) \) yields a parameterized closed form, where E and F are linear functions in the symbolic initial values.
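
The closed forms derived above can be cross-checked numerically; the following sketch compares them against unrolling the system (6) (a sanity check only, using floating-point evaluation).

```python
import sympy as sp

w1, w2 = (1 + sp.sqrt(5)) / 2, (1 - sp.sqrt(5)) / 2
f = lambda n: (5 + sp.sqrt(5)) / (5 * (1 + sp.sqrt(5))) * w1**(n + 1) - w2**(n + 1) / sp.sqrt(5)
g = lambda n: (w1**n - w2**n) / sp.sqrt(5)

fn, gn = 1, 0                          # f(0) = 1, g(0) = 0
for n in range(12):
    assert abs(sp.N(f(n)) - fn) < 1e-9 and abs(sp.N(g(n)) - gn) < 1e-9
    fn, gn = fn + gn, fn               # one step of the system (6)
```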


3 OUR PROGRAMMING MODEL

Given a polynomial relation \( {p(x_1,\dots ,x_s)=0} \), our loop synthesis procedure generates a first-order C-finite/affine recurrence system (3) with \( {X_n = (x_1(n) \quad \cdots \quad x_s(n))^\intercal } \) such that \( {p(x_1(n),\dots ,x_s(n))=0} \) holds for all \( {n\in \mathbb {N}} \). It is not hard to argue that every first-order C-finite recurrence system corresponds to a loop with simultaneous variable assignments of the following form: (8) \( \begin{equation} (x_1,\dots,x_s) \gets (a_1,\dots,a_s)\\ {\bf {while}}~true~{\bf {do}}\\ \quad(x_1,\dots,x_s) \gets (p_1(x_1,\dots,x_s),\dots,p_s(x_1,\dots,x_s))\\ {\bf {end}}, \end{equation} \) where the program variables \( x_1,\dots ,x_s \) are numeric, \( a_1,\dots ,a_s \) are (symbolic) constants in \( \mathbb {K}, \) and \( p_1,\dots ,p_s \) are polynomials in \( \mathbb {K}[x_1,\dots ,x_s] \). For a loop variable \( x_i \), we denote by \( x_i(n) \) the value of \( x_i \) at the nth loop iteration. In other words, we view loop variables \( x_i \) as sequences \( \left({x_i(n)}\right)_{n=0}^\infty \). We call a loop of the form (8) parameterized if at least one of \( a_1,\dots ,a_s \) is symbolic and non-parameterized otherwise.
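
For illustration, a loop of the form (8) can be executed as a simultaneous update of a state tuple, as in the following sketch. The concrete assignments of Figure 1(b) are not reproduced in the text, so the updates below are merely one plausible affine instance maintaining the invariant \( c=n^3 \wedge k=3n^2+3n+1 \wedge m=6n+6 \).

```python
def run_loop(init, updates, iterations):
    """init: initial values a_1,...,a_s; updates: one function per variable,
    mapping the current state to that variable's next value (all updates are
    evaluated simultaneously, as in the loop body of (8))."""
    state = tuple(init)
    for _ in range(iterations):
        state = tuple(u(*state) for u in updates)
    return state

# One possible affine loop for the invariant c = n^3, k = 3n^2+3n+1, m = 6n+6.
init = (0, 0, 1, 6)                                  # n, c, k, m
updates = (lambda n, c, k, m: n + 1,
           lambda n, c, k, m: c + k,
           lambda n, c, k, m: k + m,
           lambda n, c, k, m: m + 6)
n, c, k, m = run_loop(init, updates, 10)
assert c == n**3 and k == 3*n**2 + 3*n + 1 and m == 6*n + 6
```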

Remark 3.1.

Our synthesized loops of the form (8) are non-deterministic, with loop guards being true. We synthesize loops such that the given invariant holds for an arbitrary/unbounded number of loop iterations—for example, also for loop guards \( n\lt N \) as in Figure 1.

Remark 3.2.

Although the output of our synthesis procedure is basically an affine program, note that C-finite recurrences capture a larger class of programs. For example, the program \( \begin{equation*} (x,y) \leftarrow (0,0);~{\mathbf {while}}~true~{\mathbf {do}}~(x,y) \leftarrow (x+y^2,y+1)~{\mathbf {end}} \end{equation*} \) can be modeled by a C-finite recurrence system of order 4, which can be turned into an equivalent first-order system of size 6. Thus, to synthesize loops inducing the sequences \( \left({x(n)}\right)_{n=0}^\infty \) and \( \left({y(n)}\right)_{n=0}^\infty \), we have to consider recurrence systems of size 6.
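
This claim is easy to check empirically; the following sketch unrolls the loop from the remark and verifies that \( x(n) \) satisfies the stated order-4 C-finite recurrence.

```python
xs, x, y = [], 0, 0
for _ in range(20):
    xs.append(x)
    x, y = x + y**2, y + 1              # simultaneous update from Remark 3.2
for n in range(len(xs) - 4):
    # x(n+4) - 4x(n+3) + 6x(n+2) - 4x(n+1) + x(n) = 0
    assert xs[n+4] - 4*xs[n+3] + 6*xs[n+2] - 4*xs[n+1] + xs[n] == 0
```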

Example 3.1.

The Fibonacci recurrence system (6) in Example 2.2 corresponds to the following loop: \( \begin{equation*} (f,g) \leftarrow (1,0);~{\mathbf {while}}~true~{\mathbf {do}}~(f,g) \leftarrow (f+g,f)~{\mathbf {end}}. \end{equation*} \)

Algebraic relations and loop invariants. Let p be a polynomial in \( \mathbb {K}[z_1,\dots ,z_s], \) and let \( (x_1(n))^\infty _{n=0},\dots ,(x_s(n))^\infty _{n=0} \) be number sequences. We call p an algebraic relation for the given sequences if \( {p(x_1(n),\dots ,x_s(n)) = 0} \) for all \( {n\in \mathbb {N}} \). Moreover, p is an algebraic relation for a system of recurrences if it is an algebraic relation for the corresponding sequences. It is immediate that for every algebraic relation p of a recurrence system, \( {p=0} \) is a loop invariant for the corresponding loop of the form (8)—that is, \( {p=0} \) holds before and after every loop iteration.


4 ALGEBRA-BASED LOOP SYNTHESIS

We now present our approach for synthesizing loops satisfying a given polynomial property (invariant) by using affine loop assignments. We transform the loop synthesis problem into a PCP as described in Section 4.1. In Section 4.2, we introduce the clause sets of our PCP that precisely describe the solutions of the synthesis problem for non-parameterized loops. We extend this approach in Section 4.3 to parameterized loops.

4.1 Setting and Overview of Our Method

Given a constraint \( {p=0} \) with \( p\in \mathbb {K}[x_1,\dots ,x_s,y_1,\dots ,y_s] \), we aim to synthesize a system of C-finite recurrences such that p is an algebraic relation thereof. Intuitively, the values of loop variables \( x_1,\dots ,x_s \) are described by the sequences \( x_1(n),\dots ,x_s(n) \) for arbitrary n, and \( y_1,\dots ,y_s \) correspond to the initial values \( x_1(0),\dots ,x_s(0) \). In other words, we have a polynomial relation p among loop variables \( x_i \) and their initial values \( y_i \), for which we synthesize a loop of the form (8) such that \( {p=0} \) is a loop invariant of a loop defined in (8).

Remark 4.1.

Our approach is not limited to invariants describing relationships among program variables within a single loop iteration. Instead, it naturally extends to relations among different loop iterations. For instance, by considering the relation in Equation (1), we synthesize a loop computing the Fibonacci sequence.
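
As a sanity check (illustrative only), relation (1) indeed holds between consecutive Fibonacci numbers, so a loop computing the Fibonacci sequence is one valid solution for this invariant:

```python
a, b = 0, 1                                  # f(0), f(1)
for _ in range(25):
    # f(n)^4 + 2f(n)^3 f(n+1) - f(n)^2 f(n+1)^2 - 2f(n) f(n+1)^3 + f(n+1)^4 = 1
    assert a**4 + 2*a**3*b - a**2*b**2 - 2*a*b**3 + b**4 == 1
    a, b = b, a + b
```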

The key step in our work comes with precisely capturing the solution space for our loop synthesis problem as a PCP. Our PCP is divided into the clause sets \( \mathcal {C}_\mathsf {roots} \), \( \mathcal {C}_\mathsf {coeff} \), \( \mathcal {C}_\mathsf {init} \), and \( \mathcal {C}_\mathsf {alg} \), as illustrated in Figure 2 and explained next. Our PCP implicitly describes a first-order C-finite recurrence system and its corresponding closed form system. The one-to-one correspondence between these two systems is captured by the clause sets \( \mathcal {C}_\mathsf {roots} \), \( \mathcal {C}_\mathsf {coeff} \), and \( \mathcal {C}_\mathsf {init} \). Intuitively, these constraints mimic the procedure for computing the closed form of a recurrence system (see [20]). The clause set \( \mathcal {C}_\mathsf {alg} \) links the closed form system with the polynomial constraint \( {p=0} \) and ensures that p is an algebraic relation of the system. Furthermore, the recurrence system is represented by the matrix B and the vector A of initial values, both of which consist of symbolic entries. A solution of our PCP—which assigns values to those symbolic entries—then yields a desired synthesized loop.


Fig. 2. Overview of the PCP describing loop synthesis.

In what follows, we only consider a unit constraint \( p=0 \) as input to our loop synthesis procedure. However, our approach naturally extends to conjunctions of polynomial equality constraints.

4.2 Synthesizing Non-Parameterized Loops

We now present our work for synthesizing loops, particularly non-parameterized loops of the form (8). In other words, we aim at computing concrete initial values for all program variables. Our implicit representation of the recurrence system is thus of the form (9) \( \begin{equation} X_{n+1} = BX_n, \qquad X_0 = A, \end{equation} \) where \( B\in \mathbb {K}^{s\times s} \) is invertible and \( A\in \mathbb {K}^{s\times 1} \), both containing symbolic entries.

As described in Section 2.2, the closed form of (9) is determined by the roots of the characteristic polynomial \( \chi _B(\omega) \) defined in (4). Using the closed form representation (5), we conclude that the eigenvalues of the coefficient matrix B in (9) determine the closed form of (9), thus implying the need of synthesizing the (symbolic) eigenvalues of B. Note that B may contain both symbolic and concrete values (where concrete values may come from user-given initial values or additional user-given values for the symbolic parameters of the loop given in Equation (8)). Let us denote the symbolic entries of B by \( \boldsymbol {b} \). Since \( \mathbb {K} \) is algebraically closed, we know that B has s (not necessarily distinct) eigenvalues. We therefore fix a set of distinct symbolic eigenvalues \( \omega _1,\dots ,\omega _t \) together with their multiplicities \( m_1,\dots ,m_t \) with \( m_i\gt 0 \) for \( i=1,\dots ,t \) such that \( \sum _{i=1}^t m_i = s \). We call \( m_1,\dots ,m_t \) an integer partition of s. We next define the clause sets of our PCP.
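
The integer partitions of s can be enumerated recursively; the following small sketch (our own helper, later referred to as \( \mathsf {IntPartitions} \) in Algorithm 1) lists them in non-increasing order.

```python
def int_partitions(s, largest=None):
    """Yield all integer partitions m_1 >= m_2 >= ... >= 1 with sum s."""
    if s == 0:
        yield ()
        return
    largest = s if largest is None else largest
    for first in range(min(s, largest), 0, -1):
        for rest in int_partitions(s - first, first):
            yield (first,) + rest

print(list(int_partitions(4)))
# [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
```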

Root constraints \( \mathcal {C}_\mathsf {roots} \). The clause set \( \mathcal {C}_\mathsf {roots} \) ensures that B is invertible and that \( \omega _1,\dots ,\omega _t \) are distinct symbolic eigenvalues with multiplicities \( m_1,\dots ,m_t \). Note that B is invertible if and only if all eigenvalues \( \omega _i \) are non-zero. Furthermore, since \( \mathbb {K} \) is algebraically closed, every polynomial \( f(z) \) can be written as the product of linear factors of the form \( z - \omega \), with \( \omega \in \mathbb {K} \), such that \( f(\omega)=0 \). Let us recall that the characteristic polynomial \( \chi _B \) of the matrix B is defined as \( {\chi _B(\omega) = \det (\omega I - B)} \), where \( \det \) denotes the (matrix) determinant and I the identity matrix. Then, the equation (10) \( \begin{equation} \chi _B(z) = (z - \omega _1)^{m_1} \cdots (z - \omega _t)^{m_t} \end{equation} \) holds for all \( z\in \mathbb {K} \), where \( \chi _B(z)\in \mathbb {K}[\boldsymbol {\omega },\boldsymbol {b},z] \). Bringing everything to one side and regrouping terms based on the monomials in z, we get \( \begin{equation*} q_0 + q_1z + \cdots + q_d z^d = 0, \end{equation*} \) implying that the \( {q_i\in \mathbb {K}[\boldsymbol {\omega },\boldsymbol {b}]} \) have to be zero. Note that \( q_i \) are the coefficients of the monomials in z resulting from Equation (10); as such, \( q_i \) are polynomials in \( \boldsymbol {\omega },\boldsymbol {b} \). The clause set characterizing the eigenvalues \( \omega _i \) of B is then \( \begin{align*} \mathcal {C}_\mathsf {roots}= \lbrace q_0=0,\dots ,q_d=0 \rbrace \cup \bigcup _{\substack{i,j=1,\dots ,t\\ i\ne j}} \lbrace \omega _i \ne \omega _j\rbrace \cup \bigcup _{i=1,\dots ,t} \lbrace \omega _i \ne 0\rbrace . \end{align*} \)
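
The construction of \( \mathcal {C}_\mathsf {roots} \) can be mechanized with a computer algebra system; the following sympy sketch does so for a \( 2\times 2 \) symbolic B and a single symbolic eigenvalue of multiplicity 2 (the case revisited in Example 4.1).

```python
import sympy as sp

z, w1 = sp.symbols('z w1')
b = sp.symbols('b11 b12 b21 b22')
B = sp.Matrix(2, 2, list(b))

# Coefficients q_i of chi_B(z) - (z - w1)^2, viewed as a polynomial in z.
diff = sp.expand((z * sp.eye(2) - B).det() - (z - w1)**2)
q = sp.Poly(diff, z).all_coeffs()
C_roots = [sp.Eq(qi, 0) for qi in q] + [sp.Ne(w1, 0)]
print(C_roots)   # two polynomial equations over w1 and the b_ij, plus w1 != 0
```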

Coefficient constraints \( \mathcal {C}_\mathsf {coeff} \). The fixed symbolic roots/eigenvalues \( \omega _1,\dots ,\omega _t \) with multiplicities \( m_1,\dots ,m_t \) induce the general closed form solution (11) \( \begin{equation} X_n = \sum _{i=1}^t \sum _{j=1}^{m_i} C_{ij} \omega _i^n n^{j-1}, \end{equation} \) where the \( {C_{ij}\in \mathbb {K}^{s\times 1}} \) are column vectors containing symbolic entries. As stated in Section 2.2, not every choice of the \( C_{ij} \) gives rise to a valid solution. Instead, \( C_{ij} \) have to obey certain conditions that are determined by substituting into the original recurrence system of (9): \( \begin{align*} X_{n+1} &= \sum _{i=1}^t \sum _{j=1}^{m_i} C_{ij} \omega _i^{n+1}(n+1)^{j-1} = \sum _{i=1}^t \sum _{j=1}^{m_i} \left(\sum _{k=j}^{m_i} \binom{k-1}{j-1} C_{ik} \omega _i\right) \omega _i^{n}n^{j-1} \\ &= B \left(\sum _{i=1}^t \sum _{j=1}^{m_i} C_{ij} \omega _i^n n^{j-1}\right) = BX_n. \end{align*} \) Bringing everything to one side yields \( {X_{n+1} - BX_n = 0,} \) and thus (12) \( \begin{equation} \sum _{i=1}^t \sum _{j=1}^{m_i} \underbrace{\left(\left(\sum _{k=j}^{m_i} \binom{k-1}{j-1} C_{ik} \omega _i\right) - BC_{ij}\right)}_{D_{ij}} \omega _i^{n}n^{j-1} = 0. \end{equation} \) Equation (12) holds for all \( n\in \mathbb {N} \). By Theorem 2.4, we then have \( D_{ij} = 0 \) for all \( i,j \) and define \( \begin{equation*} \mathcal {C}_\mathsf {coeff}= \bigcup _{i=1}^t \bigcup _{j=1}^{m_i} \operatorname{cstr}(D_{ij}). \end{equation*} \)

Initial values constraints \( \mathcal {C}_\mathsf {init} \). The constraints \( \mathcal {C}_\mathsf {init} \) describe properties of initial values \( x_1(0),\dots ,x_s(0) \). We enforce that (11) equals \( B^nX_0 \), for \( n=0,\dots ,d-1 \), where d is the degree of the characteristic polynomial \( \chi _B \) of B, by \( \begin{equation*} \mathcal {C}_\mathsf {init}= \operatorname{cstr}(M_0) \cup \cdots \cup \operatorname{cstr}(M_{d-1}), \end{equation*} \) where \( M_i = X_i - B^iX_0 \), with \( {X_0=A} \) as in (9) and \( X_i \) being the right-hand side of (11) where n is replaced by i.

Algebraic relation constraints \( \mathcal {C}_\mathsf {alg} \). The constraints \( \mathcal {C}_\mathsf {alg} \) are defined to ensure that p is an algebraic relation among the \( x_i(n) \). Using (11), the closed forms of the \( x_i(n) \) are expressed as \( \begin{equation*} x_i(n) = p_{i,1} \omega _1^n + \cdots + p_{i,t} \omega _t^n, \end{equation*} \) where the \( p_{i,j} \) are polynomials in \( \mathbb {K}[n,\boldsymbol {c}] \). By substituting the closed forms and the initial values into the polynomial p, we get (13) \( \begin{equation} \begin{aligned}p^{\prime } = p(x_1(n),\dots ,x_s(n),x_1(0),\dots ,x_s(0)) = q_0 + nq_1 + n^2q_2 + \cdots + n^kq_k, \end{aligned} \end{equation} \)

where the \( q_i \) are of the form (14) \( \begin{equation} w_{i,1}^nu_{i,1} + \cdots + w_{i,{\ell }}^n u_{i,{\ell }} \end{equation} \) with \( u_{i,1},\dots ,u_{i,{\ell }}\in \mathbb {K}[\boldsymbol {a},\boldsymbol {c}] \) and \( w_{i,1},\dots ,w_{i,{\ell }} \) being monomials in \( \mathbb {K}[\boldsymbol {\omega }] \).

Proposition 4.1.

Let p be of the form (13). Then, \( {\left({p(n)}\right)_{n=0}^\infty =0} \) if and only if \( {\left({q_i(n)}\right)_{n=0}^\infty = 0} \) for \( {i=0,\dots ,k} \).

Proof.

One direction is obvious, and for the other assume \( {p(n)=0} \) for all \( n\in \mathbb {N} \). By rearranging \( p, \) we get \( p_1(n) w_1^n + \cdots + p_\ell (n) w_\ell ^n \). Let \( \tilde{\omega }_1,\dots ,\tilde{\omega }_t\in \mathbb {K} \) be such that \( \tilde{p} = p_1(n) \tilde{w}_1^n + \cdots + p_\ell (n) \tilde{w}_\ell ^n = 0 \) with \( \tilde{w}_i = w_i(\boldsymbol {\tilde{\omega }}) \). Note that the \( \tilde{w}_i \) are not necessarily distinct. However, consider \( v_1,\dots ,v_r \) to be the pairwise distinct elements of the \( \tilde{w}_i \). Then we can write \( \tilde{p} \) as \( \sum _{i=1}^r v_i^n (p_{i,0} + n p_{i,1} + \cdots + n^k p_{i,k}) \). By Theorems 2.3 and 2.4, we get that the \( p_{i,j} \) have to be 0. Therefore, also \( v_i^n p_{i,j} = 0 \) for all \( i,j \). Then, for each \( j=0,\dots ,k \), we have \( {v_1^np_{1,j} + \cdots + v_r^np_{r,j} = 0 = q_j} \).□

As p is an algebraic relation, we have that \( p^{\prime } \) should be 0 for all \( n\in \mathbb {N} \). Proposition 4.1 then implies that the \( q_i \) have to be 0 for all \( n\in \mathbb {N} \).

Lemma 4.1.

Let q be of the form (14). Then, \( {q=0} \) for all \( {n\in \mathbb {N}} \) if and only if \( {q=0} \) for \( {n\in \lbrace 0,\dots ,\ell -1\rbrace } \).

Proof.

The proof follows from Theorem 2.5 and from the fact that q satisfies a C-finite recurrence of order \( \ell \). To be more precise, the \( u_{i,j} \) and \( w_{i,j}^n \) satisfy a first-order C-finite recurrence: as \( u_{i,j} \) is constant, it satisfies a recurrence of the form \( x(n+1)=x(n) \), and \( w_{i,j}^n \) satisfies \( x(n+1) = w_{i,j} x(n) \). Then, by Theorem 2.2, we get that \( w_{i,j}^n u_{i,j} \) is C-finite of order at most 1, and q is C-finite of order at most \( \ell \).□

Even though the \( q_i \) contain exponential terms in n, it follows from Lemma 4.1 that the solutions for the \( q_i \) being 0 for all \( n\in \mathbb {N} \) can be described as a finite set of polynomial equality constraints: let \( Q_i^j \) be the polynomial constraint \( {w_{i,1}^ju_{i,1} + \cdots + w_{i,\ell }^j u_{i,\ell } = 0} \) for \( q_i \) of the form (14), and let \( {\mathcal {C}_i = \lbrace Q_i^0,\dots ,Q_i^{\ell -1}\rbrace } \) be the associated clause set. Then, the clause set ensuring that p is indeed an algebraic relation is given by \( \begin{equation*} \mathcal {C}_\mathsf {alg}= \mathcal {C}_0 \cup \cdots \cup \mathcal {C}_k. \end{equation*} \)

Remark 4.2.

Observe that Theorem 2.5 can be applied to Formula (13) directly, as \( p^{\prime } \) satisfies a C-finite recurrence. Then, by the closure properties of C-finite recurrences, the upper bound on the order of the recurrence that \( p^{\prime } \) satisfies is given by \( r = \sum _{i=0}^k 2^i \ell \). In other words, by Theorem 2.5, we would need to consider \( p^{\prime } \) with \( n=0,\dots ,r-1 \), which yields a non-linear system with a degree of at least \( r-1 \). Note that r depends on \( 2^i \), which stems from the fact that \( \left({n}\right)_{n=0}^\infty \) satisfies a recurrence of order 2, and \( n^i \) therefore satisfies a recurrence of order at most \( 2^i \). Thankfully, Proposition 4.1 allows us to only consider the coefficients of the \( n^i \) and therefore lower the size of our constraints.

Having defined the clause sets \( \mathcal {C}_\mathsf {roots} \), \( \mathcal {C}_\mathsf {coeff} \), \( \mathcal {C}_\mathsf {init} \), and \( \mathcal {C}_\mathsf {alg} \), we define our PCP as the union of these four clause sets. Note that the matrix B, the vector A, the polynomial \( p, \) and the multiplicities of the symbolic roots \( \boldsymbol {m} = m_1,\dots ,m_t \) uniquely define the clauses discussed previously. We define our PCP to be the clause set \( \mathcal {C}_{AB}^p(\boldsymbol {m}) \) as follows: (15) \( \begin{equation} \mathcal {C}_{AB}^p(\boldsymbol {m}) = \mathcal {C}_\mathsf {roots}\cup \mathcal {C}_\mathsf {init}\cup \mathcal {C}_\mathsf {coeff}\cup \mathcal {C}_\mathsf {alg}. \end{equation} \)

Recall that \( \boldsymbol {a} \) and \( \boldsymbol {b} \) are the symbolic entries in the matrices A and B given in (9), \( \boldsymbol {c} \) are the symbolic entries in the \( C_{ij} \) in the general form (11), and \( \boldsymbol {\omega } \) are the symbolic eigenvalues of B. We then have \( {\mathcal {C}_{AB}^p(\boldsymbol {m})\sqsubset \mathbb {K}[\boldsymbol {\omega },\boldsymbol {a},\boldsymbol {b},\boldsymbol {c}]} \).

It is not difficult to see that the constraints in \( \mathcal {C}_\mathsf {alg} \) determine the size of our PCP. As such, the degree and the number of terms in the invariant have a direct impact on the size and the maximum degree of the polynomials in our PCP. What might be less obvious is that the number of distinct symbolic roots also influences the size and the maximum degree of our PCP: the more distinct roots are considered, the higher the number of terms in (14), and therefore the more instances of (14) have to be added to our PCP.

Consider \( {p\in \mathbb {K}[x_1,\dots ,x_s,y_1,\dots ,y_s]} \), \( {B\in \mathbb {K}^{s\times s}} \) and \( {A\in \mathbb {K}^{s\times 1}} \), and let \( {m_1,\dots ,m_t} \) be an integer partition of \( \deg _\omega (\chi _B(\omega)) \). We then get the following theorem.

Theorem 4.1.

The mapping \( {\sigma :\lbrace \boldsymbol {\omega },\boldsymbol {a},\boldsymbol {b},\boldsymbol {c}\rbrace \rightarrow \mathbb {K}} \) is a solution of \( {\mathcal {C}_{AB}^p(\boldsymbol {m})} \) if and only if \( p(\boldsymbol {x},x_1(0),\dots ,x_s(0)) \) is an algebraic relation for \( {X_{n+1} = \sigma (B) X_n} \) with \( {X_0 = \sigma (A)} \), and the eigenvalues of \( \sigma (B) \) are \( \sigma (\omega _1),\dots ,\sigma (\omega _t) \) with multiplicities \( m_1,\dots ,m_t \).

From Theorem 4.1, we then get Algorithm 1 for synthesizing the C-finite recurrence representation of a non-parameterized loop of the form (8): \( \mathsf {IntPartitions}(s) \) returns the set of all integer partitions of an integer s, and \( \mathsf {Solve}(\mathcal {C}) \) returns whether the clause set \( \mathcal {C} \) is satisfiable, together with a model \( \sigma \) if so. To this end, the clause set \( \mathcal {C}^p_{AB} \) in Algorithm 1 refers to our PCP defined in Equation (15). We further note that the growth of the number of integer partitions is subexponential, and so is the complexity of Algorithm 1. A more precise complexity analysis of Algorithm 1 is the subject of future investigations.
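
The overall structure of Algorithm 1 can be summarized by the following sketch (function and parameter names are illustrative, not the Absynth API; the PCP construction and the solver are assumed to be provided).

```python
def synthesize(p, s, partitions, build_pcp, solve):
    """partitions: the integer partitions of s (cf. int_partitions above);
    build_pcp(p, s, m): assumed to build the clause set C_AB^p(m) of (15);
    solve(clauses): assumed to return a model (dict) or None if unsatisfiable."""
    for m in partitions:
        model = solve(build_pcp(p, s, m))
        if model is not None:
            return model      # values for the symbolic entries of B and A
    return None               # no C-finite loop with at most s variables found
```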

Finally, based on Theorem 4.1 and on the property that the number of integer partitions of a given integer is finite, we obtain the following result.

Theorem 4.2.

Algorithm 1 is sound, and complete with respect to recurrence systems of size s.

The precise characterization of non-parameterized loops by non-parameterized C-finite recurrence systems implies soundness and completeness of our approach for non-parameterized loops from Theorem 4.2. In fact, a recurrence system of size s generated by Algorithm 1 gives rise to a non-parameterized loop with s variables and at most \( s-1 \) auxiliary variables where the auxiliary variables capture the values of the program variables of previous loop iterations.

Example 4.1.

We showcase Algorithm 1 by synthesizing a loop from the loop invariant \( {x=2y} \). In other words, the polynomial is given by \( {p = x-2y \in \mathbb {K}[x,y]} \), and we want to find a recurrence system of the following form (16) \( \begin{equation} \begin{pmatrix} x(n+1) \\ y(n+1) \end{pmatrix}= \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} \begin{pmatrix} x(n) \\ y(n) \end{pmatrix} \qquad \qquad \begin{pmatrix} x(0) \\ y(0) \end{pmatrix}= \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}. \end{equation} \)

The characteristic polynomial of B is then given by \( \begin{equation*} \chi _B(\omega) = \omega ^2 - b_{11}\omega - b_{22}\omega - b_{12}b_{21} + b_{11}b_{22}, \end{equation*} \) where its roots define the closed form system. Since we cannot determine the actual roots of \( \chi _B(\omega) \), we have to fix a set of symbolic roots. The characteristic polynomial has two—not necessarily distinct—roots: either \( \chi _B(\omega) \) has two distinct roots \( \omega _1,\omega _2 \) with multiplicities \( {m_1=m_2=1} \) or a single root \( \omega _1 \) with multiplicity \( {m_1=2} \). Let us consider the latter case. The first clause set we define is \( \mathcal {C}_\mathsf {roots} \) for ensuring that B is invertible (i.e., \( \omega _1 \) is non-zero), and that \( \omega _1 \) is indeed a root of the characteristic polynomial with multiplicity 2. In other words, \( \chi _B(\omega) = (\omega - \omega _1)^2 \) has to hold for all \( \omega \in \mathbb {K} \), and bringing everything to one side yields \( \begin{equation*} (b_{11} + b_{22} - 2\omega _1) \omega + b_{12}b_{21} - b_{11}b_{22} + \omega _1^2 = 0. \end{equation*} \) We then get the following clause set: \( \begin{equation*} \mathcal {C}_\mathsf {roots}= \lbrace b_{11} + b_{22} - 2\omega _1 = 0, b_{12}b_{21} - b_{11}b_{22} + \omega _1^2 = 0, \omega _1\ne 0 \rbrace . \end{equation*} \) As we fixed the symbolic roots, the general closed form system is of the form (17) \( \begin{equation} \begin{pmatrix} x(n) \\ y(n) \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \omega _1^n + \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} \omega _1^n n. \end{equation} \)

By substituting into the recurrence system, we get \( \begin{equation*} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \omega _1^{n+1} + \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} \omega _1^{n+1} (n+1) = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} \left(\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \omega _1^n {+} \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} \omega _1^n n \right). \end{equation*} \)

By further simplifications and re-ordering of terms, we then obtain \( \begin{align*} 0 = \begin{pmatrix} c_1\omega _1+d_1\omega _1 - b_{11}c_1-b_{12}c_2 \\ c_2\omega _1+d_2\omega _1 - b_{21}c_1-b_{22}c_2 \end{pmatrix} \omega _1^{n} + \begin{pmatrix} d_1\omega _1 - b_{11}d_1-b_{12}d_2 \\ d_2\omega _1 - b_{21}d_1-b_{22}d_2 \end{pmatrix} \omega _1^{n} n. \end{align*} \)

Since this equation has to hold for \( n\in \mathbb {N}, \) we get the following clause set: \( \begin{align*} \mathcal {C}_\mathsf {coeff}= \lbrace &c_1\omega _1+d_1\omega _1 - b_{11}c_1-b_{12}c_2=0, c_2\omega _1+d_2\omega _1 - b_{21}c_1-b_{22}c_2=0, \\ &d_1\omega _1 - b_{11}d_1-b_{12}d_2=0, d_2\omega _1 - b_{21}d_1-b_{22}d_2=0 \rbrace . \end{align*} \)

For defining the relationship between the closed forms and the initial values, we set Formula (17) with \( n=i \) to be equal to the i-th unrolling of Formula (16) for \( i=0,1 \): \( \begin{equation*} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \qquad \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \omega _1 + \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} \omega _1 = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}. \end{equation*} \)

The resulting constraints for defining the initial values are then given by \( \begin{align*} \mathcal {C}_\mathsf {init}= \lbrace &c_1-a_1=0, c_1\omega _1 + d_1\omega _1 - b_{11}a_1 - b_{12}a_2 = 0,\\ &c_2-a_2=0, c_2\omega _1 + d_2\omega _1 - b_{21}a_1 - b_{22}a_2 = 0 \rbrace . \end{align*} \)

Eventually, we want to restrict the solutions such that \( x-2y=0 \) is an algebraic relation for our recurrence system. In other words, by substituting the closed forms into the expression \( {x(n)-2y(n)=0,} \) we get \( \begin{align*} 0 &= x(n) - 2y(n) = c_1\omega _1^n + d_1\omega _1^nn - 2(c_2\omega _1^n + d_2\omega _1^nn)\\ &= \underbrace{\left(c_1-2c_2\right)\omega _1^n}_{q_0} + \underbrace{\left(\left(d_1-2d_2 \right)\omega _1^n\right)}_{q_1} n, \end{align*} \)

where \( q_0 \) and \( q_1 \) have to be 0 since the preceding equation has to hold for all \( n\in \mathbb {N} \). Then, by applying Lemma 4.1 to \( q_0 \) and \( q_1 \), we get the following clauses: \( \begin{equation*} \mathcal {C}_\mathsf {alg}= \lbrace c_1-2c_2=0, d_1-2d_2=0 \rbrace . \end{equation*} \)

Our PCP is then the union of \( \mathcal {C}_\mathsf {roots} \), \( \mathcal {C}_\mathsf {coeff} \), \( \mathcal {C}_\mathsf {init} \), and \( \mathcal {C}_\mathsf {alg} \). Two possible solutions for our PCP, and therefore of the synthesis problem, are given by the following loops: \( \begin{equation*} (x, y) \gets (2, 1)\\ {\bf {while}}~true~{\bf {do}}\\ \quad(x, y) \gets (x+2, y+1)\\ {\bf {end}}\\ \qquad\quad (x, y) \gets (2, 1)\\ {\bf {while}}~true~{\bf {do}}\\ \quad(x, y) \gets (2x, 2y)\\ {\bf {end}}. \end{equation*} \)

Note that both of the preceding loops have mutually independent affine updates. Yet the second one induces geometric sequences and requires handling the exponential term \( 2^n \).
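
Both solutions are easy to validate by direct execution, as in the following check:

```python
# Check that both synthesized loops preserve the invariant x - 2y = 0.
for step in (lambda x, y: (x + 2, y + 1),    # first loop: additions only
             lambda x, y: (2 * x, 2 * y)):   # second loop: scaling by 2
    x, y = 2, 1
    for _ in range(15):
        assert x - 2 * y == 0
        x, y = step(x, y)
```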

The completeness in Theorem 4.2 is relative to systems of size \( s, \) which is a consequence of the fact that we synthesize first-order recurrence systems. That is, there may exist a system of recurrence equations of order \( {}\gt 1 \) and size s with an algebraic relation \( p\in \mathbb {K}[x_1,\dots ,x_s] \), although no first-order system of size s has p as an algebraic relation. Therefore, to be able to synthesize a system satisfying p as an algebraic relation, we have to look for recurrence systems of size \( \gt s \). Example 4.2 discusses an instance of such a higher-order recurrence system. To gain completeness in Theorem 4.2 independent of the size s of the recurrence system, we would need to be able to derive an upper bound on s. Obtaining such a bound from p seems to be highly non-trivial, with connections to the inverse problem of difference Galois theory [34], and is a subject of future work.

Example 4.2.

Consider the loop given in Remark 3.2. This loop has \( {2y^3-3y^2+y-6x=0} \) as a loop invariant, as inferred for example by our previous work [16].

Even though the loop assignment for x in the example from Remark 3.2 is non-linear, the number sequences induced by x and y respectively satisfy the C-finite recurrence equations \( \begin{align*} x(n+4) - 4x(n+3) + 6x(n+2) - 4x(n+1) + x(n) = 0\\ y(n+2) - 2y(n+1) + y(n) = 0 \end{align*} \) with the following initial values of the loop variables x and y: \( \begin{align*} x(0) &= 0 & y(0) &= 0\\ x(1) &= x(0) + y(0)^2 = 0 & y(1) &= y(0) + 1 = 1\\ x(2) &= x(1) + y(1)^2 = 1\\ x(3) &= x(2) + y(2)^2 = 5. \end{align*} \) We therefore have a C-finite recurrence system of order 4 that is equivalent to the first-order system of recurrence equations \( \begin{equation*} \begin{pmatrix} u_0(n+1) \\ u_1(n+1) \\ u_2(n+1) \\ u_3(n+1) \end{pmatrix}= \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 4 & -6 & 4 \end{pmatrix} \begin{pmatrix} u_0(n) \\ u_1(n) \\ u_2(n) \\ u_3(n) \end{pmatrix} \qquad \qquad \begin{pmatrix} v_0(n+1) \\ v_1(n+1) \end{pmatrix}= \begin{pmatrix} 0 & 1 \\ -1 & 2 \end{pmatrix} \begin{pmatrix} v_0(n) \\ v_1(n) \end{pmatrix} \end{equation*} \) with initial values \( {u_i(0)=x(i)} \) and \( {v_i(0) = y(i)} \). Then, \( u_0 \) and \( v_0 \) induce the sequences x and \( y, \) respectively. However, if we want to synthesize a loop that induces these number sequences x and y and has \( {2y^3-3y^2+y-6x=0} \) as a loop invariant, we have to consider first-order recurrence systems of size 6 and not only of order 4.
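
The two first-order systems above can again be validated by unrolling them and comparing against the loop from Remark 3.2 (a sketch using sympy matrices):

```python
import sympy as sp

U = sp.Matrix([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-1, 4, -6, 4]])
V = sp.Matrix([[0, 1], [-1, 2]])
u, v = sp.Matrix([0, 0, 1, 5]), sp.Matrix([0, 1])   # u_i(0)=x(i), v_i(0)=y(i)
x, y = 0, 0
for _ in range(15):
    assert u[0] == x and v[0] == y                  # u_0, v_0 reproduce x, y
    u, v = U * u, V * v
    x, y = x + y**2, y + 1                          # the loop from Remark 3.2
```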

4.3 Synthesizing Parameterized Loops

We now extend the loop synthesis approach from Section 4.2 to an algorithmic approach synthesizing parameterized loops—that is, loops that satisfy a loop invariant for arbitrary input values. Let us first consider the following example motivating the synthesis problem of parameterized loops.

Example 4.3.

We are interested in synthesizing a loop implementing Euclidean division over \( x,y\in \mathbb {K} \). Following the problem specification of Knuth [23],2 a synthesized loop performing Euclidean division satisfies the polynomial invariant \( {p = \bar{x} - \bar{y}q - r = 0} \), where \( \bar{x} \) and \( \bar{y} \) denote the initial values of x and y before the loop. It is clear that the synthesized loop should be parameterized with respect to \( \bar{x} \) and \( \bar{y} \). In this setting, the input to our synthesis approach is the invariant \( {p = \bar{x} - \bar{y}q - r = 0} \). A recurrence system performing Euclidean division and therefore satisfying the algebraic relation \( \bar{x} - \bar{y}q - r \) is then given by \( X_{n+1} = BX_n \) and \( X_0 = A \) with a corresponding closed form system \( X_n = A + Cn, \) where \( \begin{equation*} X_n = \begin{pmatrix}x(n) \\ r(n) \\ q(n) \\ y(n) \\ t(n) \end{pmatrix} \quad A = \begin{pmatrix}\bar{x} \\ \bar{x} \\ 0 \\ \bar{y} \\ 1 \end{pmatrix} \quad B = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} \quad C = \begin{pmatrix} 0 \\ -\bar{y} \\ 1 \\ 0 \\ 0 \end{pmatrix}. \end{equation*} \)

Here, the auxiliary variable t plays the role of the constant 1, and x and y induce constant sequences. When compared to non-parameterized C-finite systems/loops, note that the coefficients in the preceding closed forms, as well as the initial values of variables, are functions in the parameters \( \bar{x} \) and \( \bar{y} \).

Example 4.3 illustrates that the parameterization has the effect that we have to consider parameterized closed forms and initial values. For non-parameterized loops, we have that the coefficients in the closed forms are constants, whereas for parameterized systems, we have that the coefficients are functions in the parameters—the symbolic initial values of the sequences. In fact, we have linear functions since the coefficients are obtained by solving a linear system (see Example 2.2).

As already mentioned, the parameters are a subset of the symbolic initial values of the sequences. Therefore, let \( {I = \lbrace k_1,\dots ,k_r\rbrace } \) be a subset of the indices \( \lbrace 1,\dots ,s\rbrace \). We then define \( \bar{X} = \begin{pmatrix} \bar{x}_{k_1} & \cdots & \bar{x}_{k_r} & 1 \end{pmatrix}^\intercal , \) where \( \bar{x}_{k_1},\dots ,\bar{x}_{k_r} \) denote the parameters. Then, instead of (9), we get (18) \( \begin{equation} X_{n+1} = BX_n \qquad X_0 = A\bar{X} \end{equation} \) as the implicit representation of our recurrence system where the entries of \( {A\in \mathbb {K}^{s\times r+1}} \) are defined as \( \begin{equation*} a_{ij} = {\left\lbrace \begin{array}{ll} 1 & i = k_j \\ a_{ij}~\text{symbolic} & i \notin I \\ 0 & \text{otherwise} \end{array}\right.} \end{equation*} \)

and, as before, we have \( B\in \mathbb {K}^{s\times s} \). Intuitively, the complex-looking construction of A makes sure that we have \( x_i(0)=\bar{x}_i \) for \( i\in I \).

Example 4.4.

For the vector \( {X_0 = (x_1(0) \quad x_2(0) \quad x_3(0))^\intercal } \), the set \( {I=\lbrace 1,3\rbrace } \) and therefore \( {\bar{X} = (\bar{x}_1 \quad \bar{x}_3 \quad 1)^\intercal } \), we get the following matrix: \( \begin{equation*} A = \begin{pmatrix} 1 & 0 & 0 \\ a_{21} & a_{22} & a_{23} \\ 0 & 1 & 0 \end{pmatrix}. \end{equation*} \)

Thus, \( x_1(0) \) and \( x_3(0) \) are set to \( \bar{x}_1 \) and \( \bar{x}_3 \), respectively, and \( x_2(0) \) is a linear function in \( \bar{x}_1 \) and \( \bar{x}_3 \).

In addition to the change in the representation of the initial values, we also have a change in the closed forms. In other words, instead of (11), we get \( \begin{equation*} X_n = \sum _{i=1}^t \sum _{j=1}^{m_i} C_{ij}\bar{X} \omega _i^n n^{j-1} \end{equation*} \) as the general form for the closed form system with \( C_{ij}\in \mathbb {K}^{s\times (r+1)} \). Then, \( \mathcal {C}_\mathsf {roots} \), \( \mathcal {C}_\mathsf {init} \), \( \mathcal {C}_\mathsf {coeff} \), and \( \mathcal {C}_\mathsf {alg} \) are defined analogously to Section 4.2, and similar to the non-parameterized case, we define \( \mathcal {C}^p_{AB}(\boldsymbol {m},\boldsymbol {\bar{x}}) \) as the union of those clause sets. The polynomials in \( \mathcal {C}^p_{AB}(\boldsymbol {m},\boldsymbol {\bar{x}}) \) are then in \( \mathbb {K}[\boldsymbol {\omega },\boldsymbol {a},\boldsymbol {b},\boldsymbol {c},\boldsymbol {\bar{x}}] \). Then, every choice of \( \boldsymbol {\omega },\boldsymbol {a},\boldsymbol {b},\boldsymbol {c} \in \mathbb {K} \) satisfying the clause set for all \( \boldsymbol {\bar{x}}\in \mathbb {K} \) gives rise to the desired parameterized loop—that is, we have to solve an \( \exists \forall \) problem. However, since all constraints containing \( \boldsymbol {\bar{x}} \) are polynomial equality constraints, we (repeatedly) apply Theorem 2.1 and get the following: let \( {p\in \mathbb {K}[\boldsymbol {\omega },\boldsymbol {a},\boldsymbol {b},\boldsymbol {c},\boldsymbol {\bar{x}}]} \) be a polynomial such that \( {p= p_1 q_1 + \dots + p_k q_k} \) with \( p_i \) monomials in \( \mathbb {K}[\boldsymbol {\bar{x}}] \) and \( {q_i\in \mathbb {K}[\boldsymbol {\omega },\boldsymbol {a},\boldsymbol {b},\boldsymbol {c}]} \). Then, Theorem 2.1 implies that the \( q_i \) have to be 0.

We therefore define the following operator \( \operatorname{split}_{\boldsymbol {x}}(p) \) for collecting the coefficients of all monomials in \( \boldsymbol {x} \) in the polynomial p: let p be of the form \( q_0 + q_1x + \cdots + q_kx^k \), P a clause, and let \( \mathcal {C} \) be a clause set, then \( \begin{align*} \operatorname{split}_{\boldsymbol {y},x}(p) &= {\left\lbrace \begin{array}{ll} \lbrace q_0=0,\dots ,q_k=0\rbrace &\text{if $\boldsymbol {y}$ is empty} \\ \operatorname{split}_{\boldsymbol {y}}(q_0)\cup \cdots \cup \operatorname{split}_{\boldsymbol {y}}(q_k) &\text{otherwise} \end{array}\right.}\\ \operatorname{split}_{\boldsymbol {y}}(P) &= {\left\lbrace \begin{array}{ll} \operatorname{split}_{\boldsymbol {y}}(p) &\text{if $P$ is a unit clause $p=0$} \\ \lbrace P\rbrace &\text{otherwise} \end{array}\right.}\\ \operatorname{split}_{\boldsymbol {y}}(\mathcal {C}) &= \bigcup _{P\in \mathcal {C}} \operatorname{split}_{\boldsymbol {y}}(P). \end{align*} \)

Example 4.5.

Applying the operator \( \operatorname{split}_x \) to the clause set \( \begin{align*} \mathcal {C}= \big \lbrace & yzx^3+y^3x^2+(z^2-3)x=0,\\ &x-1=0 \vee y-1=0 \vee z-1=0,\\ & z\ne y \big \rbrace \end{align*} \) yields \( \begin{align*} \operatorname{split}_x(\mathcal {C}) = \big \lbrace &yz=0, y^3=0, z^2-3=0,\\ &x-1=0 \vee y-1=0 \vee z-1=0,\\ &z\ne y \big \rbrace . \end{align*} \)
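For illustration, a minimal Python sketch of the split operator for unit clauses, using sympy to collect the coefficients of all monomials in the chosen variables (this is not the Absynth implementation; disjunctions and disequalities, as in the clause set above, would simply be passed through unchanged):

```python
import sympy as sp

# Collect the coefficients of all monomials of p in the variables `params`;
# each returned coefficient then yields a constraint "coefficient = 0".
# Zero coefficients are omitted, as they give trivially true constraints.
def split(p, params):
    return sp.Poly(p, *params).coeffs()

x, y, z = sp.symbols("x y z")
p = y*z*x**3 + y**3*x**2 + (z**2 - 3)*x
print(split(p, [x]))   # [y*z, y**3, z**2 - 3], i.e., yz = y^3 = z^2 - 3 = 0
```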

Note that the polynomials occurring in the constraint problem \( \operatorname{split}_{\boldsymbol {\bar{x}}}(\mathcal {C}^p_{AB}(\boldsymbol {m},\boldsymbol {\bar{x}})) \) are elements of \( \mathbb {K}[\boldsymbol {\omega },\boldsymbol {a},\boldsymbol {b},\boldsymbol {c}] \). In other words, \( {\operatorname{split}_{\boldsymbol {\bar{x}}}(\mathcal {C}^p_{AB}(\boldsymbol {m},\boldsymbol {\bar{x}})) \sqsubset \mathbb {K}[\boldsymbol {\omega },\boldsymbol {a},\boldsymbol {b},\boldsymbol {c}]} \). Moreover, for \( {p\in \mathbb {K}[x_1,\dots ,x_s,y_1,\dots ,y_s]} \), matrices \( A,B \) and \( \bar{X} \) as in (18), and an integer partition \( {m_1,\dots ,m_t} \) of \( \deg _\omega (\chi _B(\omega)), \) we get the following theorem.

Theorem 4.3.

The map \( {\sigma :\lbrace \boldsymbol {\omega },\boldsymbol {a},\boldsymbol {b},\boldsymbol {c}\rbrace \rightarrow \mathbb {K}} \) is a solution of \( {\operatorname{split}_{\boldsymbol {\bar{x}}}(\mathcal {C}^p_{AB}(\boldsymbol {m},\boldsymbol {\bar{x}}))} \) if and only if \( p(\boldsymbol {x},x_1(0),\dots ,x_s(0)) \) is an algebraic relation for \( X_{n+1} = \sigma (B) X_n \) with \( X_0 = \sigma (A)\bar{X} \), and \( \sigma (\omega _1),\dots ,\sigma (\omega _t) \) are the eigenvalues of \( \sigma (B) \) with multiplicities \( m_1,\dots ,m_t \).

Theorem 4.3 gives rise to an algorithm analogous to Algorithm 1. Furthermore, we get an analogous soundness and completeness result as in Theorem 4.2 that implies soundness and completeness for parameterized loops.

Example 4.6.

We illustrate the construction of the constraint problem for Example 4.3. For reasons of brevity, we consider a simplified system where the variables r and x are merged. The new invariant is then \( \bar{r} = \bar{y}q + r \) and the parameters are given by \( \bar{r} \) and \( \bar{y} \). In other words, we consider a recurrence system of size 4 with sequences y, \( q, \) and r, and t for the constant 1. As a consequence, the characteristic polynomial of B is of degree 4, and we fix the symbolic root \( \omega _1 \) with multiplicity 4. For simplicity, we only show how to construct the clause set \( \mathcal {C}_\mathsf {alg} \).

With the symbolic roots fixed, we get the following template for the system of closed form solutions: let \( \begin{equation*} X_n = (r(n) \quad q(n) \quad y(n) \quad t(n))^\intercal \qquad \text{and}\qquad V = (\bar{r} \quad \bar{y} \quad 1)^\intercal , \end{equation*} \) and let \( {C,D,E,F\in \mathbb {K}^{4\times 3}} \) be symbolic matrices. Then, the closed form is given by \( \begin{align*} X_n &= (CV + DVn + EVn^2 + FVn^3) \omega _1^n, \end{align*} \) and for the initial values, we get \( \begin{align*} X_0 &= \begin{pmatrix} 1 & 0 & 0 \\ a_{21} & a_{22} & a_{23} \\ 0 & 1 & 0 \\ a_{41} & a_{42} & a_{43} \end{pmatrix}V. \end{align*} \)

By substituting the closed forms into the invariant \( {r(0) - y(0) q(n) - r(n) = 0} \) and rearranging, we get \( \begin{align*} 0 = \bar{r} &- \left(c_{21}\bar{r}\bar{y} - c_{22}\bar{y}^2 - c_{23}\bar{y} - c_{11}\bar{r} - c_{12}\bar{y} - c_{13}\right) \omega _1^n \\ &- \left(d_{21}\bar{r}\bar{y} + d_{22}\bar{y}^2 + d_{23}\bar{y} - d_{11}\bar{r} - d_{12}\bar{y} - d_{13}\right) \omega _1^n n \\ &- \left(e_{21}\bar{r}\bar{y} + e_{22}\bar{y}^2 + e_{23}\bar{y} - e_{11}\bar{r} - e_{12}\bar{y} - e_{13}\right) \omega _1^n n^2 \\ &- \left(f_{21}\bar{r}\bar{y} + f_{22}\bar{y}^2 + f_{23}\bar{y} - f_{11}\bar{r} - f_{12}\bar{y} - f_{13}\right) \omega _1^n n^3. \end{align*} \) Since the preceding equation should hold for all \( n\in \mathbb {N}, \) we get \( \begin{align*} \left(\bar{r}\right) 1^n - \left(c_{21}\bar{r}\bar{y} - c_{22}\bar{y}^2 - c_{23}\bar{y} - c_{11}\bar{r} - c_{12}\bar{y} - c_{13}\right) \omega _1^n &= 0 \\ \left(d_{21}\bar{r}\bar{y} + d_{22}\bar{y}^2 + d_{23}\bar{y} - d_{11}\bar{r} - d_{12}\bar{y} - d_{13}\right) \omega _1^n &= 0 \\ \left(e_{21}\bar{r}\bar{y} + e_{22}\bar{y}^2 + e_{23}\bar{y} - e_{11}\bar{r} - e_{12}\bar{y} - e_{13}\right) \omega _1^n &= 0\\ \left(f_{21}\bar{r}\bar{y} + f_{22}\bar{y}^2 + f_{23}\bar{y} - f_{11}\bar{r} - f_{12}\bar{y} - f_{13}\right) \omega _1^n &= 0. \end{align*} \)

Then, by applying Lemma 4.1, we get \( \begin{align*} \bar{r} - \left(c_{21}\bar{r}\bar{y} - c_{22}\bar{y}^2 - c_{23}\bar{y} - c_{11}\bar{r} - c_{12}\bar{y} - c_{13}\right) &= 0 \\ \bar{r} - \left(c_{21}\bar{r}\bar{y} - c_{22}\bar{y}^2 - c_{23}\bar{y} - c_{11}\bar{r} - c_{12}\bar{y} - c_{13}\right) \omega _1 &= 0 \\ d_{21}\bar{r}\bar{y} + d_{22}\bar{y}^2 + d_{23}\bar{y} - d_{11}\bar{r} - d_{12}\bar{y} - d_{13} &= 0\\ e_{21}\bar{r}\bar{y} + e_{22}\bar{y}^2 + e_{23}\bar{y} - e_{11}\bar{r} - e_{12}\bar{y} - e_{13} &= 0\\ f_{21}\bar{r}\bar{y} + f_{22}\bar{y}^2 + f_{23}\bar{y} - f_{11}\bar{r} - f_{12}\bar{y} - f_{13} &= 0. \end{align*} \)

Finally, by applying the operator \( \operatorname{split}_{\bar{y},\bar{r}} \), we get the following constraints for \( \mathcal {C}_\mathsf {alg} \): \( \begin{align*} c_{21} &={ }& 1-c_{11} &={ }& c_{22} &={ }& c_{23} + c_{12} &={ }& c_{13} &= 0 \\ \omega _1c_{21} &={ }& 1-\omega _1c_{11} &={ }& \omega _1c_{22} &={ }& \omega _1\left(c_{23} + c_{12}\right) &={ }& \omega _1c_{13} &= 0 \\ d_{21} &={ }& d_{11} &={ }& d_{22} &={ }& d_{23} + d_{12} &={ }& d_{13} &= 0 \\ e_{21} &={ }& e_{11} &={ }& e_{22} &={ }& e_{23} + e_{12} &={ }& e_{13} &= 0 \\ f_{21} &={ }& f_{11} &={ }& f_{22} &={ }& f_{23} + f_{12} &={ }& f_{13} &= 0. \end{align*} \)
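For concreteness, the first two rows of these constraints can be handed to an SMT solver over the reals. The following z3py sketch (an illustration, not the Absynth encoding itself) yields a model with, for example, \( \omega_1 = 1 \) and \( c_{11} = 1 \); the rows for the d, e, and f coefficients are analogous:

```python
from z3 import Reals, Solver, sat

# Feed the first two constraint rows of C_alg from Example 4.6 to Z3 over
# the reals and read off a model for the closed-form coefficients and the
# symbolic root w1.
w1, c11, c12, c13, c21, c22, c23 = Reals("w1 c11 c12 c13 c21 c22 c23")

s = Solver()
s.add(c21 == 0, 1 - c11 == 0, c22 == 0, c23 + c12 == 0, c13 == 0)
s.add(w1*c21 == 0, 1 - w1*c11 == 0, w1*c22 == 0, w1*(c23 + c12) == 0, w1*c13 == 0)

assert s.check() == sat
print(s.model())   # e.g., w1 = 1, c11 = 1, and c21 = c22 = c13 = 0
```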


5 AUTOMATING ALGEBRA-BASED LOOP SYNTHESIS

For automating Algorithm 1 for loop synthesis, the challenging task is to find solutions for our PCPs describing large systems of polynomial constraints with many variables and high polynomial degrees (see the PCP problem given in (15)). We propose the following (partial) solutions for optimizing and exploring the PCP solution space. For handling recurrence templates of large size, we utilize the flexibility of our approach by trying to find loops of a specific shape first and then generalize if needed. Moreover, for dealing with PCPs containing polynomials of high degree, we leverage properties of the constraints generated by our procedure. We will see in our experimental evaluation in Section 6 that the former is useful in loop synthesis and the latter is necessary for the synthesis of number sequences in a mathematical setting.

Handling large recurrence templates. The more program variables the loop to be synthesized has, the more variables appear in the PCP of Algorithm 1. To address this increase in complexity, we implemented an iterative search for PCP solutions in the sense that we preset certain values of the coefficient matrix B in (9). In particular, we start by looking for PCP solutions where the coefficient matrix B is unit upper triangular. If no such solution is found, we consider B to be an upper triangular matrix, and finally a fully symbolic matrix without preset values. This way, we first construct simpler PCPs (in terms of the number of variables) and generalize step by step, if needed. This iterative approach can also be used to search only for integer PCP solutions, by presetting B to contain only integer values.
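The following Python sketch with z3 illustrates this iterative template search. It is a hypothetical helper rather than the Absynth API: the callback build_constraints stands for the PCP construction of Algorithm 1, and the variable permutations discussed next are not handled here.

```python
from z3 import Real, RealVal, Solver, sat

# Iterative template search: try a unit upper triangular B first, then an
# upper triangular B, and finally a fully symbolic B.
def make_B(size, shape):
    B = [[Real(f"b_{i}_{j}") for j in range(size)] for i in range(size)]
    for i in range(size):
        for j in range(size):
            if shape in ("unit", "upper") and i > j:
                B[i][j] = RealVal(0)        # preset: zero below the diagonal
            if shape == "unit" and i == j:
                B[i][j] = RealVal(1)        # preset: unit diagonal
    return B

def synthesize(size, build_constraints):
    for shape in ("unit", "upper", "full"): # simplest template first, then generalize
        s = Solver()
        s.add(build_constraints(make_B(size, shape)))
        if s.check() == sat:
            return shape, s.model()
    return None
```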

Synthesizing a (unit) upper triangular coefficient matrix B yields a loop whose loop variables are not mutually dependent on each other. We note that such a pattern is a very common programming paradigm—all benchmarks from Table 1 satisfy such a pattern. Yet as a consequence of restricting the shape of B, the order of the variables in the recurrence system matters. In other words, we have to consider all possible variable permutations for ensuring completeness with respect to (unit) upper triangular matrices.

Table 1.
Instance   s   i   d   c   Yices (un/up/fu)   Z3 (un/up/fu)   Z3* (un/up/fu)   Z3* + Alg2 (un/up/fu)
add1*515173932921-117--22726-72416-
add2*515173959861-115--22109-72323-
cubes53694---116114-1849657587--
double131429114112388211311111313216335120
double23132411010616651151061151318402521
eucliddiv*515185213537-114115-1973-102554-
intcbrt*5212262---117116-228346989--
intsqrt142653---1131081141519-3581-
intsqrt2*4161041051164-11311111515273739-
petter131429112116-11411311315183215323629
square31429112112-1121141171317261029592
dblsquare31430109105-1051051101217261431-
sum142653617--10811211317249939250-
sum253682---220112-20516-60--
  • s, Size of the recurrence system;

    i, number of polynomial invariants;

    d, maximum monomial degree of constraints;

    c, number of constraints;

  • *, parameterized system;

    -, timeout (60 seconds).

Table 1. Absynth Results for Loop Synthesis (Results in Milliseconds) by Reverse Engineering Examples from the State of the Art in Polynomial Invariant Generation


Handling large polynomial degrees. The main source of polynomials with high degrees in the PCP of Algorithm 1 stems from the clause set \( \mathcal {C}_\mathsf {alg} \)—that is, constraints of the form (14) for \( n\in \lbrace 0,\dots ,\ell -1\rbrace \). For any PCP solution \( \sigma \) in line 4 of Algorithm 1, we have \( {\sigma (w_{1})^n\sigma (u_{1}) + \cdots + \sigma (w_{{\ell }})^n \sigma (u_{{\ell }})=0} \) for \( {n\in \lbrace 0,\dots ,\ell -1\rbrace } \), which yields the following system of linear equations: (19) \( \begin{equation} W\boldsymbol {u}= \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ \sigma (w_1) & \sigma (w_2) & \sigma (w_3) & \cdots & \sigma (w_\ell) \\ \sigma (w_1)^2 & \sigma (w_2)^2 & \sigma (w_3)^2 & \cdots & \sigma (w_\ell)^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \sigma (w_1)^{\ell -1} & \sigma (w_2)^{\ell -1} & \sigma (w_3)^{\ell -1} & \cdots & \sigma (w_\ell)^{\ell -1} \\ \end{bmatrix} \begin{bmatrix} \sigma (u_1) \\ \sigma (u_2) \\ \sigma (u_3) \\ \vdots \\ \sigma (u_\ell) \end{bmatrix}=0, \end{equation} \) where \( {W\in \mathbb {K}^{\ell \times \ell }} \) is a Vandermonde matrix and \( {\boldsymbol {u}\in \mathbb {K}^\ell } \). Suppose our assignment \( \sigma \) in line 4 of Algorithm 1 is such that \( \sigma (w_i)\ne \sigma (w_j) \) for \( i\ne j \); if this is not the case, we can always create a smaller system of the form (19) by collecting terms. As \( \sigma (w_i)\ne \sigma (w_j) \) for \( i\ne j \), we derive that W is invertible. Then, it follows by Cramer’s rule that \( \sigma (u_i)=0 \) for all \( i\in \lbrace 1,\dots ,\ell \rbrace \). Based on this observation, we propose Algorithm 2 for solving constraints of the form (14). For simplicity, we only present the case where we have a single constraint of the form (14); Algorithm 2, however, naturally extends to multiple such constraints.
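As a small numeric illustration of this argument (using numpy, purely for illustration):

```python
import numpy as np

# For pairwise distinct values w_i, the Vandermonde matrix W of (19) is
# invertible, so W u = 0 forces every u_i to be 0.
w = np.array([2.0, -1.0, 0.5, 3.0])       # some distinct sample values
W = np.vander(w, increasing=True).T       # row n holds w_1^n, ..., w_l^n
assert abs(np.linalg.det(W)) > 1e-9       # invertible since the w_i are distinct
u = np.linalg.solve(W, np.zeros(len(w)))  # the only solution of W u = 0
assert np.allclose(u, 0.0)
```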

Intuitively, step (1) of Algorithm 2 finds a model \( \sigma \) such that each \( u_i \) becomes zero, which makes the values of the \( w_i \) irrelevant. To this end, we compute maximum satisfiability of our constraints C using the MaxSAT approach of Narodytska and Bacchus [28]. If this is not possible, we continue with a partition \( {\mathcal {I} = \lbrace I_1,\dots ,I_\ell \rbrace } \) of the set of indices \( \lbrace 1,\dots ,\ell \rbrace \). Then, \( \mathcal {I} \) induces a system of linear equations of the form (19) of size \( \ell \) that is specified in step (2) of the algorithm. If the PCP is satisfiable (step (3) of Algorithm 2), then we have found an assignment \( \sigma \) that satisfies \( \mathcal {P} \) and the system of the form (19) for C. If the given PCP is unsatisfiable, then we learn a new partition by making use of the unsatisfiable core and go back to step (2) of Algorithm 2.
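Step (1) can be prototyped, for example, with z3's optimization interface. The sketch below uses z3's soft constraints as a stand-in for the core-guided MaxSAT procedure of Narodytska and Bacchus [28] that we actually use; hard_constraints and us are placeholders for the remaining PCP clauses and the terms \( u_1,\dots,u_\ell \) of (14).

```python
from z3 import Optimize, sat

# Prefer models in which every u_i is 0, so that the values of the w_i
# become irrelevant for constraint (14).
def solve_preferring_zero(hard_constraints, us):
    opt = Optimize()
    opt.add(hard_constraints)       # the remaining PCP constraints
    for u in us:
        opt.add_soft(u == 0)        # soft constraint: u_i = 0
    return opt.model() if opt.check() == sat else None
```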


6 IMPLEMENTATION AND EXPERIMENTS

6.1 Absynth: Algebra-Based Loop Synthesis

Our approach to algebra-based loop synthesis is implemented in the tool Absynth, which consists of about 1,800 lines of Julia code and is available at https://github.com/ahumenberger/Absynth.jl. Inputs to Absynth are conjunctions of polynomial equality constraints, representing a loop invariant. As a default result, Absynth derives a program that is partially correct with respect to the given invariant (e.g., see Table 1). In addition, Absynth can also be used to derive number sequences for which the given invariant is an algebraic relation (Table 2).

Table 2.
Instance     o   Yices   Z3   Z3*   Z3* + Alg2
fibonacci1   2   -       -    -     324
fibonacci2   2   -       -    -     22
example28    2   -       -    -     41
ex1          2   -       -    -     27
ex2          2   -       -    -     20
ex3          2   -       -    -     451
  • o, order of recurrence;

    -, timeout (60 seconds).

Table 2. Absynth Results for Recurrence Synthesis (Results in Milliseconds)


As described in Section 4, loop synthesis in Absynth is reduced to solving PCPs. These PCPs are currently expressed in the quantifier-free fragment of non-linear real arithmetic (QF_NRA). We used Absynth in conjunction with the SMT solvers Yices [10] (version 2.6.1) and Z3 [9] (version 4.8.6) for solving the PCPs and therefore synthesizing loops. For instance, Figure 1(b) and (c) and Example 4.1 are synthesized automatically using Absynth.

As PCPs in Absynth are restricted to QF_NRA, the implementation of Algorithm 1 within Absynth does not yet find solutions containing non-real algebraic numbers. In our loop synthesis experiments, we did not encounter instances where non-real algebraic numbers are necessary. The synthesis of recurrences, however, often requires reasoning about non-real algebraic numbers such as the so-called Perrin numbers \( p(n) \) defined via \( p(n+3)=p(n+1)+p(n) \) and satisfying the relation \( p(n)^3-3p(n)p(2n)+2p(3n) = 6 \). Going beyond the QF_NRA fragment and considering finite domains (bitvectors/bounded integers) within Absynth is a next step to investigate.
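To see that the stated relation indeed holds for the Perrin numbers, the following Python snippet checks it on a prefix of the sequence, assuming the standard initial values p(0) = 3, p(1) = 0, p(2) = 2, which are not spelled out in the text; synthesizing the recurrence from the relation, in contrast, requires the non-real roots of \( x^3 = x + 1 \) and hence reasoning beyond QF_NRA.

```python
# Numeric check of p(n)^3 - 3*p(n)*p(2n) + 2*p(3n) = 6 for the Perrin numbers,
# assuming the standard initial values p(0)=3, p(1)=0, p(2)=2.
def perrin(n):
    p = [3, 0, 2]
    while len(p) <= n:
        p.append(p[-2] + p[-3])     # p(n+3) = p(n+1) + p(n)
    return p[n]

for n in range(20):
    assert perrin(n)**3 - 3*perrin(n)*perrin(2*n) + 2*perrin(3*n) == 6
```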

6.2 Experimental Results with Synthesizing Loops and Recurrences

Our experiments in this article were performed on a machine with a 2.9-GHz Intel Core i5 and 16 GB of LPDDR3 RAM, and for each benchmark, a timeout of 60 seconds was set.

Tables 1 and 2 summarize our initial experiments using academic benchmarks from the invariant generation literature [16] and recurrence solving [20]. The columns Yices and Z3 correspond to the results where the respective solver is called as an external program with an SMT-LIB 2.0 file as input; as such, Yices and Z3 are used as black-box solvers, with additional external calls on the PCP problems generated by Absynth as SMT-LIB constraints. To reduce the overhead of external calls for solving PCP constraints, as well as to implement our PCP optimization measures from Section 5 as guiding techniques for SMT solving over PCPs, we extended Absynth with SMT solving approaches tailored toward PCP reasoning. To this end, we implemented the optimizations from Section 5 as part of Absynth by extending and integrating Z3 within Absynth. Column Z3* refers to the use of Absynth with Z3 integrated directly via the (C++ API) interface of Z3. Finally, column Z3* + Alg2 depicts the results for Algorithm 2 with Z3* as backend solver. The results in Tables 1 and 2 are given in milliseconds and only include the time needed for solving the constraint problem, as the time needed for constructing the constraints is negligible.

Loop synthesis. Our first benchmark set for loop synthesis consists of invariants for loops from the invariant generation literature [16] and is reported in Table 1. Note that the benchmarks cubes and double2 in Table 1 are those from Figure 1 and Example 4.1, respectively. A further presentation of a selected set of our benchmarks from Table 1 is given in Section 6.3 using the Absynth input language.

The columns un and up in Table 1 show the results where the coefficient matrix B is restricted to be unit upper triangular and upper triangular, respectively. fu indicates that no restriction on B was set. Note that the running time of Algorithm 1 heavily depends on the order in which the integer partitions and the variable permutations are traversed. Therefore, to get comparable results, we fixed the integer partition and the variable permutation. In other words, for each instance, we enforced that B in formula (9) has just a single eigenvalue, and we fixed a variable ordering where we know that there exists a solution with a unit upper triangular matrix B. Hence, there exists at least one solution that all cases—un, up, and fu—have in common. Furthermore, for each instance, we added constraints for avoiding trivial solutions (i.e., loops inducing constant sequences) and used Algorithm 2 to further reduce our search space. Table 1 shows that with these considerations on top of our optimizations from Section 5, Z3* outperforms Yices. In other words, our tailored use of Z3 directly integrated within Absynth brings the best performance in loop synthesis. A similar experimental conclusion can also be drawn from our experiments in using Absynth for synthesizing number sequences, as detailed next.

Recurrence synthesis. In addition to loop synthesis, we also conducted experiments with respect to synthesizing recurrence equations (Table 2). We took algebraic relations from Kauers and Zimmermann [21] and synthesized recurrence equations satisfying the given relations. None of the instances could be solved by Yices or Z3, but only by Z3* + Alg2. Based on our results from Tables 1 and 2, we conclude that using Z3 as an integrated backend solver of Absynth—that is, Z3* in Absynth—is the right approach to take for further applications of synthesizing loops with non-linear polynomial invariants. We discuss further use cases of loop synthesis based on Absynth with Z3* in Section 7.

In contrast to loop synthesis, we note that the synthesis of recurrence equations often requires reasoning about non-real algebraic numbers, which does not fall within the fragment of non-linear real arithmetic. Hence, for synthesizing recurrence equations, we plan to further extend Absynth with a dedicated solver to reason about the whole set of algebraic numbers; such reasoning is not yet supported by Z3*.

6.3 Examples of Synthesized Loops by Absynth

In Figure 3, we show a few illustrative examples used in our experiments from Table 1 using the Absynth input language. As mentioned in Section 6.2, we considered loops annotated with their invariants from the invariant generation literature [17, 31]. For each example in Figure 3, we first list such a loop (Original loop). We then give the first loop synthesized by our work (Absynth loop) in combination with Yices and Z3, respectively.3 In other words, the synthesized loops of Figure 3 are generated by Absynth from the loop invariants of Humenberger et al. [17] and Rodriguez-Carbonell and Kapur [31]. Observe that in most cases, Absynth was able to derive the same loop as in the preceding works [17, 31].

Fig. 3.

Fig. 3. Loops synthesized by Absynth by reverse engineering examples from the state of the art in polynomial invariant generation.


7 BEYOND LOOP SYNTHESIS

As argued in Section 1, our work has potential applications in other areas related to synthesis. In this section, we report on our experience in using loop synthesis (i) in the context of deriving loops that are equivalent based on some metric and (ii) in the setting of teaching formal methods. We note that the former use case (i) has applications toward program/compiler optimization, as discussed in the following. Our experimental results from these use cases are summarized in Table 3 and detailed next. Based on our initial experiments from Section 6 showcasing the clear benefits of using Z3* in Absynth, in this section, as well as in further applications of loop synthesis, we only deployed Absynth with Z3 as an integrated backend solver.

Table 3.
Instance   s   i   d   c   Z3* (un/up/fu)   Z3* + Alg2 (un/up/fu)
cube_conj5239011--174--
squared_varied14124713--1906--
square_conj42254822-216168-
cube_square4136010--2316--
sum_of_square5129246--65--
squared_varied2412503169-295155-
fmi13123271322161921267
fmi2422541019-2186-
fmi3422551219-1840--
fmi43123281622133484
fmi531232714197217-
  • s, size of the recurrence system;

    i, number of polynomial invariants;

    d, maximum monomial degree of constraints;

    c, number of constraints;

  • -, timeout (60 seconds).

Table 3. Absynth Results for Using Loop Synthesis for (i) the Loop Equivalence Benchmarks of Figure 6 and in (ii) the Teaching Efforts of Formal Methods from Figure 5 (Results in Milliseconds)


7.1 Synthesizing Equivalent Loops Modulo Invariants

Given a (potentially arbitrary) loop L with loop variables \( \boldsymbol {x} \), our work can be used to simplify L while maintaining the polynomial invariants of L. In other words, for a loop L with a polynomial relation \( p(\boldsymbol {x})=0 \) as its invariant, our approach can be used to synthesize a loop \( L^{\prime } \) such that \( L^{\prime } \) implements only affine operations over \( \boldsymbol {x} \) and \( p(\boldsymbol {x})=0 \) is an invariant of \( L^{\prime } \). In this case, we say that L and \( L^{\prime } \) are equivalent modulo the invariant \( p(\boldsymbol {x})=0 \). More generally, we define this form of equivalence between program loops as follows.

Definition 7.1

(Loop Equivalence Modulo Invariants).

Let \( {L} \) and \( {L^{\prime }} \) be two program loops with variable sets respectively denoted by \( \boldsymbol {x}_{L} \) and \( \boldsymbol {x}_{L^{\prime }} \). Let \( I(\boldsymbol {x}_{L}) \) be an invariant property of L. We say that the loops L and \( L^{\prime } \) are equivalent modulo I if \( I(f(\boldsymbol {x}_{L})) \) is an invariant property of \( L^{\prime } \) for some bijective function \( f :\boldsymbol {x}_{L} \rightarrow \boldsymbol {x}_{L^{\prime }} \).

We write \( L \equiv _{I} {L^{\prime }} \) to denote that L and \( L^{\prime } \) are equivalent modulo I.

Note that we define loop equivalence modulo an invariant I for arbitrary loops and arbitrary invariants. In other words, Definition 7.1 is not restricted to polynomial invariants, nor to program loops defined by our programming model (8).

We next argue that our loop synthesis approach can be used to generate loops that are equivalent modulo an invariant I as follows:

(1) Given a loop L with variables \( \boldsymbol {x}_L \) and a polynomial relation \( p(\boldsymbol {x}_L)=0 \) as an invariant \( I(\boldsymbol {x}_L) \) of L;

(2) Synthesize a loop \( L^{\prime } \) with variables \( \boldsymbol {x}_{L^{\prime }} \) such that \( I(\boldsymbol {x}_{L^{\prime }}) \) is an invariant of \( L^{\prime } \).

Note that in our approach to infer \( L^{\prime } \) as an equivalent loop of L modulo I, (i) we restrict our invariants I to be polynomial relations among loop variables, but (ii) we do not impose our programming model constraints (8) over L. In other words, our loop synthesis approach can be used to synthesize loops \( L^{\prime } \) that are “simpler” than an arbitrary (not necessarily single-path affine) loop L. For example, although a loop L with invariant I may contain arbitrary polynomial assignments and nested conditionals, our loop synthesis approach can be used to derive a single-path loop \( L^{\prime } \) with only affine updates such that \( L^{\prime } \) is equivalent to L modulo I. Example 7.1 showcases such a use case of our loop synthesis approach: we simplify a multi-path loop into a single-path loop while maintaining loop equivalence modulo an invariant. Such a simplification can be seen as an instance of program/compiler optimizations, such as in the generic setting of strength reduction, as argued in Section 1.

Example 7.1.

Consider the multi-path loop of Figure 4 (a), for which \( z - x - y = 0 \) is an invariant. We write \( I(x, y, z) \) to denote the polynomial invariant \( z - x - y = 0 \) of Figure 4(a). We choose the bijective function \( f = \lbrace (z, c), (x, a), (y, b) \rbrace \) and use our loop synthesis approach to derive the single-path loop of Figure 4(b) from the invariant \( I(f(x),f(y),f(z)) \). By soundness of our loop synthesis approach (Theorem 4.2), the invariant \( I(f(x), f(y), f(z)) \) denoting \( c - a - b=0 \) is an invariant of Figure 4(b). Hence, the loops of Figure 4(a) and (b) are equivalent modulo \( {z - x - y=0} \).

Fig. 4.

Fig. 4. Examples of equivalent loops modulo invariants.

In Figure 5, we give further examples of loops synthesized by our work such that the resulting loops are equivalent modulo an invariant. The examples in Figure 5 have been constructed by us to showcase the non-linear arithmetic reasoning features of Absynth. All examples in Figure 5 are automatically synthesized by Absynth using Z3. Further, the loops in Figure 5 are listed using the Absynth input language, documenting further experiments made with Absynth. We note that our approach to loop equivalence also exploits the setting of parameterized loop synthesis (Section 4.3): the initial values of loop variables of the respective equivalent loops may differ, as illustrated in the examples of Figure 5. Although our loop synthesis generates only affine loops of the form (8), Figure 5 illustrates that such loops can implement a wide range of affine updates over loop variables, including dependencies among updates, while maintaining loop equivalence modulo an invariant. We finally note that, similarly to loop synthesis, loop equivalence modulo an invariant can be used in Absynth with invariants representing arbitrary Boolean combinations of polynomial invariants—for example, conjunctive polynomial invariants as given in Figure 5(a).

Fig. 5.

Fig. 5. Examples of loops generated by Absynth that are equivalent modulo an invariant. For each example, polynomial constraints have been solved in Absynth using Z3.

7.2 Loop Synthesis in Teaching Formal Methods

By further exploiting loop synthesis and its applications toward loop equivalence modulo invariants, we now detail our setting for using loop synthesis in automating some of our efforts in teaching deductive verification at TU Wien. Amid online lecturing and examinations during the COVID-19 pandemic, coming up with best practices to assess course performance was far from trivial. We faced this challenge, for example, during our “Formal Methods in Computer Science—FMI” course in the computer science master curriculum at TU Wien. In particular, we had to provide plausible solutions toward online examinations on topics of deductive verification, where students are asked to formally prove partial/total correctness assertions using Hoare logic. As such, our online exam sheets were designed by requiring solutions on (i) generating loop invariants and variants, and (ii) using invariants and variants to conduct inductive proofs about program correctness. Yet to minimize collaborative approaches toward solving the online exams, we also had to design an online exam setting that (i) provided individual exam sheets for each student enrolled in our course (about 350 enrolled students per study semester) while (ii) required the same solving workload of each participant during our online exam. By taking into account all of these constraints, we used our loop synthesis approach to generate online exam problems on proving partial correctness as follows:

(1) We fixed a small set \( \boldsymbol {x} \) of program variables, typically using at most three variables.

(2) We fixed a few polynomial equalities among \( \boldsymbol {x} \) by considering polynomials of degree at most 3. In particular, we used at most three such polynomial relations—that is, \( p_1(\boldsymbol {x})=0 \), \( p_2(\boldsymbol {x})=0 \), and \( p_3(\boldsymbol {x})=0 \), with each \( p_i \) of degree at most 3.

(3) We constructed a polynomial property \( I(\boldsymbol {x}) \) as a conjunctive formula among (some of) the fixed polynomial relations \( p_i(\boldsymbol {x})=0 \).

(4) We used loop synthesis in Absynth to generate different loops L that are equivalent modulo I. In the process of generating these loops, we only considered loops whose coefficients are given by relatively small integer/rational numbers.

(5) We formulated exam problems by reverse engineering instances of our loop synthesis task using the loops generated by Absynth. In other words, our exam problems on proving partial correctness were generated using the following simplified template: “Given the loop L, prove that I is an inductive invariant of \( L, \)” where we set L to be a loop generated by Absynth using the invariant I.

Figure 6 showcases some loops generated by Absynth that have been used in the deductive verification exam problems of our FMI master course during 2020/2021.

Fig. 6.

Fig. 6. Loops synthesized by Absynth for online exam problems within the “Formal Methods in Computer Science—FMI” master course at TU Wien.

We note that depending on the invariants I we deployed for loop synthesis, as well as on the loop bodies of the loops L generated by Absynth, we also considered exam problems where (i) we used simple linear inequalities as loop conditions bounding the number of loop iterations by a symbolic constant N (as shown in our motivating examples from Figure 1); (ii) stated pre-conditions A and post-conditions B on L; and (iii) asked for proving partial correctness of the Hoare triple \( \lbrace A\rbrace ~L~\lbrace B\rbrace \), by requiring students to also generate the necessary invariants I.

Based on the students’ performances during our online exams, we believe that the exam problems generated by Absynth were fair challenges for students, reducing collusion among them. Moreover, despite the fact that our FMI exams have been conducted online since 2020, the overall FMI grade distributions on solved online exam sheets were very similar to those of previous in-class examinations.


8 RELATED WORK

We now compare our loop synthesis work in relation to the state of the art in program synthesis and algebraic reasoning about program loops.

Synthesis. To the best of our knowledge, existing synthesis approaches are restricted to linear invariants (e.g., see [33]), whereas our work supports loop synthesis from non-linear polynomial properties. Existing approaches to SyGuS-based synthesis [3] differ in the logical expressiveness of input specifications, restrictions of the solution space of programs to be synthesized, and search strategies for solutions during synthesis. The key difference between our work and the state of the art in synthesis is threefold: (i) we consider logical specifications as (non-linear) polynomial invariant relations, (ii) synthesize loops with non-linear behavior, and (iii) precisely characterize the set of all such loops to be synthesized. To the best of our knowledge, no other approach can automatically synthesize loops from polynomial invariants. In what follows, we discuss the approaches that are most closely related to our work, focusing only on works handling loops and/or recursion.

Counterexample-guided inductive synthesis (CEGIS) [3, 11, 29, 32] refines the deductive synthesis approach of Manna and Waldinger [27] and restricts the search space of programs by using templates and/or sketches [32]. CEGIS provides an automated approach for synthesizing programs P from a given specification S [3, 29, 32]. For doing so, CEGIS uses input-output examples satisfying a specification S to synthesize a candidate program P that is consistent with the given inputs. Correctness of the candidate program P with respect to S is then checked using verification approaches, particularly using SMT-based reasoning. If verification fails, a counterexample is generated as an input to P that violates S. This counterexample is then used in conjunction with the previous set of input-outputs to revise synthesis and generate a new candidate program P. CEGIS-based approaches have been extended with machine/active learning to control the selection of (counter-)examples satisfying S and guide the synthesis of P [11]. Unlike these methods, input specifications to our approach are relational (invariant) properties describing the potentially infinite set of input-output examples of interest. Hence, we do not rely on interactive refinement of our input but work with a precise characterization of the set of input-output values of the program to be synthesized. Similarly to sketches [29, 32], we consider loop templates restricting the search for solutions to synthesis. Yet our templates support non-linear arithmetic (and hence multiplication), which is not yet the case in other works [11, 29]. We rely on fixed-point reasoning and properties of algebraic recurrences to encode the synthesis problem as a constraint solving task in the theory of non-linear real arithmetic. We precisely characterize the set of all programs satisfying our input specification, and as such, our approach does not exploit learning to refine program candidates. However, our programming model is more restricted than the work of Feng et al. [11] and Nye et al. [29] in various aspects: we only handle simple loops and only consider numeric data types and operations.

The programming by example approach of Gulwani [12] learns programs from input-output examples and relies on lightweight interaction to refine the specification of programs to be synthesized. The approach has further been extended in the work of Kalyan et al. [19] with machine learning, making it possible to learn programs from just one (or even no) input-output example by using a simple supervised learning setup. Program synthesis from input-output examples is shown to be successful for recursive programs [1], yet synthesizing loops and handling non-linear arithmetic is not yet supported by this line of research. Our work does not learn programs from observed input-output examples but uses loop invariants to fully characterize the intended behavior of the program to be synthesized. We precisely characterize the solution space of loops to be synthesized by a system of algebraic recurrences without using statistical models supporting machine learning. Thanks to its precision, our method is, however, computationally more expensive and does not yet scale to large programs/datasets.

A related approach to our work is given by Darulova et al. [8], where a fixed-point implementation for an approximated real-valued polynomial specification is presented. For doing so, the method of Darulova et al. [8] uses genetic programming [30] to guide the search for synthesized programs and combines heuristic search with abstract interpretation [7] to estimate and refine the (floating-point) error bound of the inferred fixed-point implementation. Although the underlying abstract interpreter is precise for linear expressions, precision of the synthesis is lost for non-linear arithmetic. Unlike Darulova et al. [8], we consider polynomial specifications in the abstract algebra of real-closed fields and do not address challenges arising from machine reals. As a consequence, we precisely synthesize loops from polynomial invariants by inherently relying on non-linear reasoning.

Algebraic reasoning. Compared to works on invariant generation [15, 17, 22, 31], the only common aspect between these works and our synthesis method is the use of linear recurrences to capture the functional behavior of program loops. Yet our work is conceptually different from other works [15, 17, 22, 31], as we reverse engineer invariant generation and do not rely on the ideal structure/Zariski closure of polynomial invariants. We use neither ideal theory nor Gröbner basis computations to generate invariants from loops; rather, we generate loops from invariants by formulating and solving PCPs.


9 CONCLUSION

We proposed a procedure for synthesizing loops over affine assignments from polynomial invariants. We consider loop templates and use reasoning over recurrence equations modeling the loop behavior. The key ingredient of our work comes with translating the loop synthesis problem into a PCP and showing that this constraint problem precisely captures all solutions to the loop synthesis problem. Additional heuristics for solving our constraints have also been implemented in our new tool Absynth for loop synthesis.

Directions for future work include a complexity analysis of our algorithm, further investigating the properties of our constraint problems for improving the scalability of our procedure, generalizing our approach to multi-path loops and inequality invariants, restricting the solution space to integers or bounded domains, extending Absynth with reasoning support for arbitrary algebraic numbers, and understanding and encoding the best optimization measures for loop synthesis in the context of program/compiler optimization approaches.


ACKNOWLEDGMENTS

We thank Sumit Gulwani and Manuel Kauers for valuable discussions on ideas leading to this work.

Footnotes

  1. https://rise4fun.com/Dafny/.

  2. For \( x,y\in \mathbb {K}, \) we want to compute \( q,r\in \mathbb {K} \) such that \( x = yq + r \) holds.

  3. In Figure 3(e), Absynth with Yices failed to synthesize a loop, as indicated in Table 1.

REFERENCES

  1. [1] Albarghouthi Aws, Gulwani Sumit, and Kincaid Zachary. 2013. Recursive program synthesis. In Proceedings of CAV 2013. 934–950.
  2. [2] Alur Rajeev, Bodík Rastislav, Dallal Eric, Fisman Dana, Garg Pranav, Juniwal Garvit, Kress-Gazit Hadas, et al. 2015. Syntax-guided synthesis. In Dependable Software Systems Engineering. Vol. 40. IOS Press, 1–25.
  3. [3] Alur Rajeev, Singh Rishabh, Fisman Dana, and Solar-Lezama Armando. 2018. Search-based program synthesis. Commun. ACM 61, 12 (2018), 84–93.
  4. [4] Ball Thomas, Levin Vladimir, and Rajamani Sriram K.. 2011. A decade of software model checking with SLAM. Commun. ACM 54, 7 (2011), 68–76.
  5. [5] Cook Byron. 2018. Formal reasoning about the security of Amazon Web Services. In Computer Aided Verification. Lecture Notes in Computer Science, Vol. 10981. Springer, 38–47.
  6. [6] Cooper Keith D., Simpson L. Taylor, and Vick Christopher A.. 2001. Operator strength reduction. ACM Trans. Program. Lang. Syst. 23, 5 (2001), 603–625.
  7. [7] Cousot Patrick and Cousot Radhia. 1977. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of POPL 1977. 238–252.
  8. [8] Darulova Eva, Kuncak Viktor, Majumdar Rupak, and Saha Indranil. 2013. Synthesis of fixed-point programs. In Proceedings of EMSOFT 2013. Article 22, 10 pages.
  9. [9] Moura Leonardo De and Bjørner Nikolaj. 2008. Z3: An efficient SMT solver. In Proceedings of TACAS 2008. 337–340.
  10. [10] Dutertre Bruno. 2014. Yices 2.2. In Proceedings of CAV 2014. 737–744.
  11. [11] Feng Yu, Martins Ruben, Bastani Osbert, and Dillig Isil. 2018. Program synthesis using conflict-driven learning. In Proceedings of PLDI 2018. 420–435.
  12. [12] Gulwani Sumit. 2011. Automating string processing in spreadsheets using input-output examples. In Proceedings of POPL 2011. 317–330.
  13. [13] Gulwani Sumit. 2016. Programming by examples: Applications, algorithms, and ambiguity resolution. In Proceedings of IJCAR 2016. 9–14.
  14. [14] Hoare C. A. R.. 1969. An axiomatic basis for computer programming. Commun. ACM 12, 10 (1969), 576–580.
  15. [15] Hrushovski Ehud, Ouaknine Joël, Pouly Amaury, and Worrell James. 2018. Polynomial invariants for affine programs. In Proceedings of LICS 2018. ACM, New York, NY, 530–539.
  16. [16] Humenberger Andreas, Bjørner Nikolaj, and Kovács Laura. 2020. Algebra-based loop synthesis. In Integrated Formal Methods. Lecture Notes in Computer Science, Vol. 12546. Springer, 440–459.
  17. [17] Humenberger Andreas, Jaroschek Maximilian, and Kovács Laura. 2017. Automated generation of non-linear loop invariants utilizing hypergeometric sequences. In Proceedings of ISSAC 2017. ACM, New York, NY, 221–228.
  18. [18] Jha Susmit, Gulwani Sumit, Seshia Sanjit A., and Tiwari Ashish. 2010. Oracle-guided component-based program synthesis. In Proceedings of ICSE 2010. 215–224.
  19. [19] Kalyan Ashwin, Mohta Abhishek, Polozov Oleksandr, Batra Dhruv, Jain Prateek, and Gulwani Sumit. 2018. Neural-guided deductive search for real-time program synthesis from examples. In Proceedings of ICLR 2018.
  20. [20] Kauers Manuel and Paule Peter. 2011. The Concrete Tetrahedron—Symbolic Sums, Recurrence Equations, Generating Functions, Asymptotic Estimates. Springer.
  21. [21] Kauers Manuel and Zimmermann Burkhard. 2008. Computing the algebraic relations of C-finite sequences and multisequences. J. Symb. Comput. 43, 11 (2008), 787–803.
  22. [22] Kincaid Zachary, Cyphert John, Breck Jason, and Reps Thomas W.. 2018. Non-linear reasoning for invariant synthesis. Proc. ACM Program. Lang. 2, POPL (2018), Article 54, 33 pages.
  23. [23] Knuth Donald Ervin. 1997. The Art of Computer Programming. Addison-Wesley.
  24. [24] Kovács Laura and Voronkov Andrei. 2013. First-order theorem proving and Vampire. In Computer Aided Verification. Lecture Notes in Computer Science, Vol. 8044. Springer, 1–35.
  25. [25] Kuncak Viktor, Mayer Mikaël, Piskac Ruzica, and Suter Philippe. 2012. Software synthesis procedures. Commun. ACM 55, 2 (2012), 103–111.
  26. [26] Leino K. Rustan M.. 2017. Accessible software verification with Dafny. IEEE Softw. 34, 6 (2017), 94–97.
  27. [27] Manna Zohar and Waldinger Richard J.. 1980. A deductive approach to program synthesis. ACM Trans. Program. Lang. Syst. 2, 1 (1980), 90–121.
  28. [28] Narodytska Nina and Bacchus Fahiem. 2014. Maximum satisfiability using core-guided MaxSAT resolution. In Proceedings of AAAI 2014. 2717–2723.
  29. [29] Nye Maxwell, Hewitt Luke, Tenenbaum Joshua, and Solar-Lezama Armando. 2019. Learning to infer program sketches. In Proceedings of ICML 2019. 4861–4870. http://proceedings.mlr.press/v97/nye19a.html.
  30. [30] Poli Riccardo, Langdon William B., and McPhee Nicholas Freitag. 2008. A Field Guide to Genetic Programming. Lulu Enterprises, UK Ltd. http://www.gp-field-guide.org.uk/.
  31. [31] Rodríguez-Carbonell Enric and Kapur Deepak. 2007. Generating all polynomial invariants in simple loops. J. Symb. Comput. 42, 4 (2007), 443–476.
  32. [32] Solar-Lezama Armando. 2009. The sketching approach to program synthesis. In Proceedings of APLAS 2009. 4–13.
  33. [33] Srivastava Saurabh, Gulwani Sumit, and Foster Jeffrey S.. 2010. From program verification to program synthesis. In Proceedings of POPL 2010. 313–326.
  34. [34] Put Marius Van der and Singer Michael F.. 1997. Galois Theory of Difference Equations. Springer.
