1 Introduction

Collision resistance and second-preimage resistance are fundamental properties of hash functions, and are the basis of security for hash-based signature schemes [4, 7, 10, 11], which are a promising approach for post-quantum security.

We give a new way to reason about and characterize the collision resistance and second-preimage resistance of a large, natural class of programs, in the random oracle model. Specifically, we characterize these properties for the class of Linicrypt programs, introduced by Carmer and Rosulek [5]. Roughly speaking, a Linicrypt program is one where all intermediate values are field elements, and the only operations possible are fixed linear combinations, sampling uniformly from the field, and calling a random oracle (whose outputs are field elements). Many of the most practical cryptographic constructions are captured by this model: hash-based signatures and block cipher modes, to name a few.

Carmer and Rosulek showed that such programs admit an algebraic representation that is amenable to reasoning about programs’ cryptographic properties. Specifically, they showed a polynomial-time algorithm for deciding whether two Linicrypt programs induce computationally indistinguishable distributions. They also demonstrated the feasibility of using a SAT solver to automatically synthesize Linicrypt programs that satisfy given correctness and security constraints, by successfully synthesizing secure Linicrypt constructions of garbled circuits.

Our work follows a similar path, showing that collision properties can also be characterized cleanly in terms of the algebraic representation for Linicrypt programs. Our characterization holds for programs in which distinct oracle queries have the form \(H(t_1;\cdot ), H(t_2; \cdot ), \ldots \) for distinct nonces \(t_i\).

We introduce an algebraic property of Linicrypt programs called a collision structure, which completely characterizes both second-preimage resistance and collision resistance. The presence of a collision structure in a program \(\mathcal {P} \) can be detected in polynomial time (in the size of \(\mathcal {P} \)’s algebraic representation).

Theorem 1

(Main Theorem). Let \(\mathcal {P}\) be a deterministic Linicrypt program with distinct nonces, making n oracle queries. Let \(\mathbb {F} \) be the underlying field (and range of the random oracle). Then the following are equivalent:

  1. There is an adversary \(\mathcal {A} \) making q oracle queries that finds collisions with probability more than \((q/n)^{2n}/|\mathbb {F} |\).

  2. There is an adversary \(\mathcal {A} \) making q oracle queries that finds second preimages with probability more than \((q/n)^{n}/|\mathbb {F} |\).

  3. There is an adversary \(\mathcal {A} \) making at most 2n oracle queries that finds second preimages with probability 1.

  4. \(\mathcal {P} \) either has a collision structure or is degenerate. (See main text for definitions.)

We emphasize that the theorem statement refers to standard security properties (i.e., security against arbitrary, computationally unbounded algorithms that make only a polynomial number of queries to the random oracle) of Linicrypt constructions. We are not in a heuristic model that considers Linicrypt adversaries.

Our results show that second-preimage resistance and collision resistance are equivalent, in an asymptotic sense (i.e., considering only whether a quantity is negligible or not). However, as might be expected, it is quadratically easier to find collisions than second preimages, due to birthday attacks. Our concrete bounds reflect this. In practice, reducing security to second-preimage resistance rather than collision resistance can result in constructions with 50% smaller parameters; e.g., [2, 6, 8].

1.1 Related Work and Comparison

Bellare and Micciancio [1] discuss the collision resistance of the function \(H^*( x_1, \ldots , x_n ) = H(1; x_1) \oplus \cdots \oplus H(n; x_n)\), where H is collision-resistant. Indeed, this function is naturally modeled in Linicrypt over a field \(GF(2^\lambda )\). They show that this function fails to be collision-resistant if n is allowed to vary with the input (in particular, when \(n \ge \lambda +1\)). Our characterization shows that an adversary making q oracle queries breaks collision resistance with probability bounded by \((q/n)^{2n}/2^{\lambda }\) since the function lacks a “collision structure.” These two results are not in conflict, since our bound is meaningless when \(n \ge \lambda +1\). In short, the Linicrypt model is best suited for programs whose only dependence on the security parameter is the choice of field, but where (in particular) the number of inputs and calls to H are fixed constants.

Another related work is that of Wagner [13], who gives an algorithm for a generalized birthday problem. The problem (translated to our notation) is to find \(x_1, \ldots , x_k\) such that \(H(x_1) \oplus \cdots \oplus H(x_k) = 0\). The case of \(k=2\) corresponds to the well-known birthday problem. One can see that by generating a list \(L_i\) of roughly \(2^{\lambda /k}\) candidates for each \(x_i\) (i.e., so \(|L_1 \times \cdots \times L_k| \ge 2^\lambda \)), there is likely to exist some solution to the problem. Wagner’s focus is on the algorithmic aspect of actually identifying the appropriate candidates. In Linicrypt, all adversaries are considered to be computationally unbounded but bounded in the number of queries to the random oracle H. As such, our results do not provide any upper/lower bounds on attack complexity (other than in random oracle query complexity).

Black, Rogaway and Shrimpton [3] categorize 64 ways to construct a compression function (suitable for Merkle-Damgård hashing) from an ideal cipher, building on prior work by Preneel, Govaerts and Vandewalle [12]. These constructions can be thought of as \(GF(2^\lambda )\)-Linicrypt programs that use only XOR (i.e., linear combinations with coefficients of 0 or 1 only). However, their reasoning is tied to the ideal cipher model rather than the random oracle model used in Linicrypt (see Sect. B.3 for more information). We leave it as interesting future work to extend results in Linicrypt to the ideal cipher model, and potentially re-derive the characterization of BRS from a linear-algebraic perspective.

2 Preliminaries

We write scalar field elements as lowercase non-bold letters (e.g., \(v \in \mathbb {F} \)). We write vectors as lowercase bold letters (e.g., \(\varvec{q}\in \mathbb {F} ^n\)). We write matrices as uppercase bold letters (e.g., \(\varvec{M} \in \mathbb {F} ^{n \times m}\)). We write vector inner product as \(\varvec{q}\cdot \varvec{v}\), and matrix-vector multiplication as \(\varvec{M} \times \varvec{v}\) or \(\varvec{M} \varvec{v}\).

2.1 Linicrypt

The Linicrypt model was introduced in [5]. We present a brief summary of the model and its important properties.

A Linicrypt program (over field \(\mathbb {F} \)) is one in which every intermediate value is an element of \(\mathbb {F} \), and the program is a fixed, straight-line sequence of the following kinds of operations:

  • Call a random oracle (whose inputs/outputs are field elements).

  • Sample a random field element.

  • Combine existing values using a fixed linear combination.

The sequence of operations (including choice of arguments to the oracle, coefficients of linear combinations, etc.) is entirely fixed. In particular, these cannot depend on intermediate values in the computation.

The only source of cryptographic power in Linicrypt is the random oracle, whose outputs are \(\mathbb {F} \)-elements. We therefore require the size of the field \(|\mathbb {F} |\) to be exponential in the security parameter \(\lambda \). Since the field depends on the security parameter, we sometimes write \(\mathbb {F} = \mathbb {F} _\lambda \) to make the association explicit.

If the field depends on the security parameter, then the program does too (since it is parameterized by specific coefficients of linear combinations). One can either consider a Linicrypt program to be a non-uniform family of programs (one for each choice of field/security parameter), or one can fix all coefficients in the program from \(\widetilde{\mathbb {F}}\) which is a subfield of every \(\mathbb {F} _\lambda \) (for example, a program that uses only \(\{0,1\}\) coefficients can be instantiated over any field \(GF(2^\lambda )\)). Our treatment of security is concrete (not asymptotic), so these distinctions are not important in this work.

We can reason about Linicrypt programs in the following algebraic way. Let \(\mathcal {P} \) be such a program, and let \(v_1, \ldots , v_n\) denote all of its intermediate variables. Say the first k of them are \(\mathcal {P} \)’s input and the last l of them are \(\mathcal {P} \)’s output. We say that \(v_i\) is a base variable if \(v_i\) is either an input variable, the result of a call to the oracle, or the result of sampling a field element. All variables can therefore be expressed as a fixed linear combination of base variables.

Let \(\varvec{v}_{\textsf {base}}\) denote the vector of all base variables. For each variable \(v_i\), let \(\varvec{r}_i\) denote the vector such that \(v_i = \varvec{r}_i \cdot \varvec{v}_{\textsf {base}}\). For example, for base variables, \(\varvec{r}_i\) is a canonical basis vector (0s everywhere except 1 in one component).

Suppose the output of \(\mathcal {P} \) consists of \(v_{n-l+1}, \ldots , v_n\). Then the output matrix of \(\mathcal {P} \) is defined as: \( \varvec{M} \overset{\text {def}}{=} \begin{bmatrix} \varvec{r}_{n-l+1} \\ \vdots \\ \varvec{r}_{n} \end{bmatrix} \). This matrix captures the fact that \(\mathcal {P} \)’s output can be expressed as \(\varvec{M} \times \varvec{v}_{\textsf {base}}\).

Each oracle query in \(\mathcal {P} \) is of the form “\(v_i := H(t; v_{i_1}, \ldots , v_{i_m})\),” where t is a string (e.g., nonce) and \(i_1, \ldots , i_m < i\) are indices, all fixed as part of \(\mathcal {P} \). For each such query we define an associated oracle constraint \( c = \left( t, \begin{bmatrix} \varvec{r}_{i_1} \\ \vdots \\ \varvec{r}_{i_m} \end{bmatrix}, \varvec{r}_i \right) \). In other words, an oracle constraint \((t, \varvec{Q},\varvec{a})\) captures the fact that if the oracle is queried as \(H(t; \varvec{Q}\times \varvec{v}_{\textsf {base}})\), then the response is \(\varvec{a}\cdot \varvec{v}_{\textsf {base}}\). When t is the empty string, we often omit it from our notation and simply write \(H(\cdot )\) instead of \(H(\epsilon ; \cdot )\).

The algebraic representation of \(\mathcal {P} \) is \(\mathcal {P} = (\varvec{M},\mathcal {C})\), where \(\varvec{M} \) is the output matrix of \(\mathcal {P} \) and \(\mathcal {C} \) is the set of all oracle constraints. Indeed, these two pieces of information completely characterize the behavior of \(\mathcal {P} \) (as established in [5]).

Example. In this work we focus on deterministic Linicrypt programs. One such example is given below. Its base variables are \((v_1, \ldots , v_5, v_7)\).

figure a

Hence, the algebraic representation of \(\mathcal {P} \) is:

figure b
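An algebraic representation of this kind can also be written down as plain data and checked against a direct evaluation of the program. The sketch below is illustrative only. It assumes the example program takes inputs \(v_1, v_2, v_3\), queries the oracle on \(v_1\), on \(v_3\) and on \(v_6 = v_4 + v_5 + v_2\), and outputs \((v_7 + v_1, v_5)\), which is the form described in prose in Sect. 3.2. The prime modulus, the nonce strings "1", "2", "3" and the SHA-256-based stand-in for H are hypothetical choices made only so that the code runs.

```python
import hashlib

P_MOD = 2**61 - 1  # illustrative prime field size (assumption, not from the text)


def H(t, v):
    """Toy stand-in for the random oracle H(t; v), with outputs in GF(P_MOD)."""
    digest = hashlib.sha256(f"{t}|{v}".encode()).digest()
    return int.from_bytes(digest, "big") % P_MOD


def run_program(x, y, z):
    """Evaluate the assumed example directly; return (output, base variables)."""
    v4 = H("1", x)
    v5 = H("2", z)
    v6 = (v4 + v5 + y) % P_MOD  # not a base variable: a linear combination
    v7 = H("3", v6)
    out = ((v7 + x) % P_MOD, v5)
    return out, [x, y, z, v4, v5, v7]


# Rows are written over the base variables, in the order (v1, v2, v3, v4, v5, v7).
M = [
    [1, 0, 0, 0, 0, 1],  # first output:  v7 + v1
    [0, 0, 0, 0, 1, 0],  # second output: v5
]
C = [
    ("1", [1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]),  # v4 = H(1; v1)
    ("2", [0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0]),  # v5 = H(2; v3)
    ("3", [0, 1, 0, 1, 1, 0], [0, 0, 0, 0, 0, 1]),  # v7 = H(3; v2 + v4 + v5)
]


def dot(r, v):
    """Inner product of two vectors over GF(P_MOD)."""
    return sum(a * b for a, b in zip(r, v)) % P_MOD


def consistent(x, y, z):
    """Check that (M, C) describes the same computation as run_program."""
    out, v_base = run_program(x, y, z)
    output_ok = tuple(dot(r, v_base) for r in M) == out
    oracle_ok = all(H(t, dot(q, v_base)) == dot(a, v_base) for t, q, a in C)
    return output_ok and oracle_ok
```

Calling `consistent(3, 5, 7)` checks both halves of the representation: the output equals \(\varvec{M} \times \varvec{v}_{\textsf {base}}\), and each constraint \((t, \varvec{q}, \varvec{a})\) satisfies \(H(t; \varvec{q}\cdot \varvec{v}_{\textsf {base}}) = \varvec{a}\cdot \varvec{v}_{\textsf {base}}\).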

2.2 Security Definitions

The Linicrypt model is meant to capture a special class of constructions, but not adversaries. In this work we characterize standard security definitions, against arbitrary (i.e., not necessarily Linicrypt) adversaries. As in Impagliazzo’s “Minicrypt” [9] we consider computationally unbounded adversaries that are bounded-query: they make at most \(p(\lambda )\) queries to the random oracle, for some polynomial p.

Definition 2

Let \(\mathcal {P} \) be a Linicrypt program over a family of fields \(\mathbb {F} = (\mathbb {F} _\lambda )_\lambda \). Then \(\mathcal {P} \) is \((q,\epsilon )\)-collision-resistant (in the random oracle model) if for all q-query adversaries \(\mathcal {A} \), \(\Pr [\textsf {ColGame}(\mathcal {P},\mathcal {A},\lambda )=1] \le \epsilon \), where:

figure c

Definition 3

Let \(\mathcal {P} \) be as above (with k inputs). \(\mathcal {P} \) is \((q,\epsilon )\)-2nd-preimage-resistant (in the random oracle model) if for all q-query adversaries \(\mathcal {A} \), \(\Pr [\textsf {2PIGame}(\mathcal {P},\mathcal {A},\lambda )=1]\le \epsilon \), where:

figure d
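For concreteness, the two games can be sketched as small test harnesses. This is a sketch under assumptions rather than a transcription of the definitions: it assumes the usual form of the games (in ColGame the adversary outputs both \(\varvec{x}\) and \(\varvec{x}'\); in 2PIGame the challenge \(\varvec{x}\) is sampled uniformly and the adversary outputs \(\varvec{x}'\)), it does not count oracle queries, and the oracle `H`, the modulus `P_MOD` and the toy program are hypothetical stand-ins. The toy program is the degenerate example \(H(x+y)\) discussed in Sect. 3.1.

```python
import hashlib
import secrets

P_MOD = 2**61 - 1  # illustrative prime field size (assumption)


def H(t, v):
    """Toy stand-in for the random oracle H(t; v), with outputs in GF(P_MOD)."""
    digest = hashlib.sha256(f"{t}|{v}".encode()).digest()
    return int.from_bytes(digest, "big") % P_MOD


def col_game(program, adversary):
    """The adversary outputs two inputs; it wins if they differ and collide."""
    x, x2 = adversary(H)
    return x != x2 and program(H, x) == program(H, x2)


def second_preimage_game(program, adversary, k):
    """The adversary receives a uniform k-element input and must output a
    different input with the same image."""
    x = tuple(secrets.randbelow(P_MOD) for _ in range(k))
    x2 = adversary(H, x)
    return x != x2 and program(H, x) == program(H, x2)


def toy_program(oracle, x):
    """P(x, y) = H(x + y), with the empty nonce."""
    a, b = x
    return oracle("", (a + b) % P_MOD)


def toy_second_preimage_adversary(oracle, x):
    """Shift the two inputs in opposite directions; makes no oracle queries."""
    a, b = x
    return ((a + 1) % P_MOD, (b - 1) % P_MOD)


def toy_collision_adversary(oracle):
    """Output two fixed inputs with the same sum."""
    return (0, 1), (1, 0)
```

Both toy adversaries win with probability 1, since the two inputs always lead to the same single oracle query.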

3 Characterizing Collision-Resistance in Linicrypt

We now present our main technical result, which is a characterization of collision-resistance for Linicrypt programs.

In order to simplify the notation, we present the results for the special case of Linicrypt programs that make 1-ary calls to H. That is, every call to H is of the form \(H(t; v)\) for a single \(v \in \mathbb {F} \) (note that Linicrypt supports more general calls of the form \(H(t; v_1, \ldots , v_k)\)). With this simplification, every oracle constraint has the form \((t,\varvec{q},\varvec{a})\) where \(\varvec{q}\) is a simple vector (rather than a matrix as in the most general form).

This special case simplifies the notation required to express our theorems/proofs, but does not gloss over any meaningful complexity. Later in Sect. B.1 we discuss what minor changes are necessary to extend these results to the unrestricted general case.

3.1 Easy Case: Degeneracy

Some Linicrypt programs allow easy collisions. Consider the program \(\mathcal {P} ^H(x,y) = H(x+y)\). An obvious collision in \(\mathcal {P} \) is \(\mathcal {P} ^H(x,y) = \mathcal {P} ^H(x+z,y-z)\) for any \(z \ne 0\). What makes this program particularly easy to attack is that not only do the two computations give the same output, but they query H on exactly the same points. In other words, the input of \(\mathcal {P} \) is not uniquely determined by its sequence of oracle queries along with its outputs.

Definition 4

Let \(\mathcal {P} =(\varvec{M},\mathcal {C})\) be a Linicrypt program with k inputs. In the algebraic representation, \(\mathcal {P} \)’s inputs are associated with canonical basis vectors \(\varvec{e}_1, \ldots , \varvec{e}_k\) (\(\varvec{e}_i\) has 0s everywhere except a 1 in the ith component). We say that \(\mathcal {P} \) is degenerate if

$$ \textsf {span}( \varvec{e}_1, \ldots , \varvec{e}_k ) \not \subseteq \textsf {span}\Big ( \{ \varvec{q}\mid (t,\varvec{q},\varvec{a}) \in \mathcal {C} \} \cup \textsf {rows}(\varvec{M}) \Big ) $$

Lemma 5

If \(\mathcal {P} \) is degenerate, then second preimages can be found with probability 1.

Proof

Given an input \(\varvec{x}\) for \(\mathcal {P} \) in the second preimage game, compute the base variables \(\varvec{v}\) in the computation of \(\mathcal {P} ^H(\varvec{x})\). If \(\mathcal {P} \) is degenerate, there must exist two (actually, at least \(|\mathbb {F} _\lambda |\)) solutions for the input \(\varvec{x}'\) that are consistent with \(\{ \varvec{q}\cdot \varvec{v}\mid (t,\varvec{q},\varvec{a}) \in \mathcal {C} \} \cup \{ \varvec{r}\cdot \varvec{v}\mid \varvec{r}\in \textsf {rows}(\varvec{M}) \}\). Such an \(\varvec{x}'\) will clearly lead \(\mathcal {P} ^H\) to make the same oracle queries and give the same output.
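The span condition of Definition 4 can be tested mechanically with rank computations. The following sketch implements the condition literally as written, over a prime field (an assumption made for simplicity; the helper names `rank_mod_p`, `in_span` and `is_degenerate` are hypothetical).

```python
def rank_mod_p(rows, p):
    """Rank of a list of row vectors over GF(p), via Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def in_span(vec, rows, p):
    """True if vec lies in the span of rows over GF(p)."""
    return rank_mod_p(rows + [vec], p) == rank_mod_p(rows, p)


def is_degenerate(M, C, k, dim, p):
    """Definition 4: some input basis vector e_i is outside the span of the
    query vectors q and the rows of M. C is a list of (t, q, a) triples."""
    generators = [q for (_t, q, _a) in C] + [list(r) for r in M]
    for i in range(k):
        e_i = [1 if j == i else 0 for j in range(dim)]
        if not in_span(e_i, generators, p):
            return True
    return False


P_MOD = 2**61 - 1  # illustrative prime (assumption)

# P(x, y) = H(x + y), base variables (x, y, H(x + y)).
M_SUM = [[0, 0, 1]]
C_SUM = [("", [1, 1, 0], [0, 0, 1])]

# P(x) = H(x), base variables (x, H(x)).
M_ID = [[0, 1]]
C_ID = [("", [1, 0], [0, 1])]
```

With these inputs, `is_degenerate(M_SUM, C_SUM, 2, 3, P_MOD)` holds because neither \(\varvec{e}_1\) nor \(\varvec{e}_2\) lies in the span of \((1,1,0)\) and \((0,0,1)\), while `is_degenerate(M_ID, C_ID, 1, 2, P_MOD)` does not hold.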

3.2 Running Example: An Interesting Second-Preimage Attack

Consider the example program below. In fact, it is the example from Sect. 2.1 but with the nonces omitted and most intermediate variables unnamed:

figure e

Suppose we are given x, y, z and are asked to find a second preimage \(x',y',z'\) with \(\mathcal {P} ^H(x,y,z) = \mathcal {P} ^H(x',y',z')\). Here is how to do it:

  1. The second component of \(\mathcal {P} \)’s output is H(z). Since we cannot hope to find a second preimage directly in H, we must set \(z' = z\).

  2. The key insight is to now set \(w' \ne w\) arbitrarily (hence, why we gave this value a name). We make a promise to choose \(x',y'\) so that \(w' = H(x') + H(z') + y'\).

  3. To have a collision, we must have \(H(w')+x' = H(w) + x\). Importantly, \(x'\) is the only unknown value in this expression, and it is possible to simply solve for \(x'\).

  4. It is time to fulfill the promise that \(w' = H(x')+H(z')+y'\). Since \(w',x',z'\) are already fixed, we can solve for \(y'\).

Note that we are guaranteed that \((x,y,z) \ne (x',y',z')\) since the two computations of \(\mathcal {P} \) lead to different intermediate values \(w \ne w'\) (and \(\mathcal {P} \) is deterministic).

Perspective. This example is representative of how second preimages can be computed in arbitrary Linicrypt programs. Given an input \(\varvec{x}\) for \(\mathcal {P} ^H\), we compute a second preimage \(\varvec{x}'\) by focusing on the oracle queries that \(\mathcal {P} ^H(\varvec{x})\) and \(\mathcal {P} ^H(\varvec{x}')\) will make:

  1. Designate some of the oracle queries to take the same values in both \(\mathcal {P} ^H(\varvec{x})\) and \(\mathcal {P} ^H(\varvec{x}')\). In our example, we decided that the oracle query H(z) would take the same values in both computations.

  2. Identify the first query that we will assign different values in the two computations. Set the input to this query arbitrarily in \(\mathcal {P} ^H(\varvec{x}')\). In our example, we identify the H(w) query to take on different values and set \(w' \ne w\) arbitrarily.

  3. Repeatedly make followup oracle queries as they become possible, while using linear algebra to solve for other intermediate values. In our example, we call \(H(w')\), which allows us to solve for \(x'\), which allows us to call \(H(x')\), which allows us to solve for \(y'\).
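The attack on the running example is short enough to execute. The sketch below assumes the program is \(\mathcal {P} ^H(x,y,z) = (H(w)+x,\ H(z))\) with \(w = H(x)+H(z)+y\), as the four steps above describe. The prime modulus, the SHA-256-based stand-in for H, and the choice \(w' = w+1\) are hypothetical; any \(w' \ne w\) would do.

```python
import hashlib

P_MOD = 2**61 - 1  # illustrative prime field size (assumption)


def H(v):
    """Toy stand-in for the random oracle H (empty nonce), outputs in GF(P_MOD)."""
    digest = hashlib.sha256(str(v).encode()).digest()
    return int.from_bytes(digest, "big") % P_MOD


def program(x, y, z):
    """Running example: w = H(x) + H(z) + y; output (H(w) + x, H(z))."""
    w = (H(x) + H(z) + y) % P_MOD
    return ((H(w) + x) % P_MOD, H(z))


def find_second_preimage(x, y, z):
    """Follow the four steps of the attack, using only field arithmetic and H."""
    z2 = z                               # step 1: keep the H(z) query unchanged
    w = (H(x) + H(z) + y) % P_MOD
    w2 = (w + 1) % P_MOD                 # step 2: any w' != w works
    x2 = (H(w) + x - H(w2)) % P_MOD      # step 3: solve H(w') + x' = H(w) + x
    y2 = (w2 - H(x2) - H(z2)) % P_MOD    # step 4: solve w' = H(x') + H(z') + y'
    return x2, y2, z2
```

The attack evaluates H on \(x, z, w\) (to recompute the original execution) and on \(w', x'\) (new queries), which is within the 2n queries allowed by item 3 of the Main Theorem for n = 3.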

3.3 Collision Structures for Finding Second Preimages

We have given a rough outline of how (we claim) Linicrypt second preimages must be found. The next step is to formalize what is required of \(\mathcal {P} \) in terms of its algebraic representation.

In step 2 above, we identify a query whose input will be chosen arbitrarily. Suppose that query corresponds to constraint \((t,\varvec{q},\varvec{a})\). Since this is the first value that is fixed differently in \(\mathcal {P} ^H(\varvec{x})\) and \(\mathcal {P} ^H(\varvec{x}')\), we must have \(\varvec{q}\) linearly independent of the vectors that are already fixed by step 1. Otherwise it would not be possible to find two consistent values for this query.

In steps 2 and 3 above, we repeatedly query H, and we have written the attack outline to suggest we never get “stuck.” One way we could get stuck is to make some query \(H(x')\) for the first time, when we have already fixed (either directly or indirectly) what \(H(x')\) must be. If this is the case, then we cannot succeed with probability better than \(1/|\mathbb {F} _\lambda |\). To avoid this case, every query we make in steps 2 & 3 of the outline must correspond to a constraint \((t, \varvec{q},\varvec{a})\) where \(\varvec{a}\) is linearly independent of the values that have already been fixed.

The following definition formalizes these algebraic intuitions:

Definition 6

Let \(\mathcal {P} = (\varvec{M},\mathcal {C})\) be a Linicrypt program. A collision structure for \(\mathcal {P} \) is a tuple \((i^*; c_1, \ldots , c_n)\), where:

  1. \(c_1, \ldots , c_n\) is an ordering of \(\mathcal {C} \), and we write \(c_i = (t_i, \varvec{q}_i, \varvec{a}_i)\).

  2. \(\varvec{q}_{i^*} \not \in \textsf {span}\Big ( \{ \varvec{q}_1, \ldots , \varvec{q}_{i^*-1} \} \cup \{ \varvec{a}_1, \ldots , \varvec{a}_{i^*-1} \} \cup \textsf {rows}(\varvec{M}) \Big )\)

  3. For \(j \ge i^*\): \(\varvec{a}_j \not \in \textsf {span}\Big ( \{ \varvec{q}_1, \ldots , \varvec{q}_{j} \} \cup \{ \varvec{a}_1, \ldots , \varvec{a}_{j-1} \} \cup \textsf {rows}(\varvec{M}) \Big )\)

Connecting to the previous intuition, a collision-finding attack will let oracle queries \(c_1, \ldots , c_{i^*-1}\) be the same in both executions \(\mathcal {P} ^H(\varvec{x})\) and \(\mathcal {P} ^H(\varvec{x}')\). Then \(c_{i^*}\) is the first oracle query that the attack fixes differently for the two executions. Property (2) of the definition ensures that it is possible to find 2 query values that are consistent with the previously fixed values. Property (3) captures the fact that from this point forward, no query should be forced to result in an output value that has already been fixed.

Running Example. We now revisit the running example from before, to illustrate a collision structure for it. The base variables of this program are x, y, z, H(x), H(z), H(w). Below is the algebraic representation of this program, with the oracle constraints arranged to show a collision structure (we do not write the empty nonces of the oracle constraints):

figure f

This ordering of queries is indeed a collision structure since:

  • \(\varvec{q}_2\) is linearly independent of all vectors above it in this diagram.

  • \(\varvec{a}_2\) is linearly independent of all vectors above it in this diagram.

  • \(\varvec{a}_3\) is linearly independent of all vectors above it in this diagram.
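These three independence checks can be verified numerically. The sketch below encodes the running example over the base variables \((x, y, z, H(x), H(z), H(w))\) and tests properties (2) and (3) of Definition 6 by rank computations over a prime field. The vectors are reconstructed from the prose description of the example (with \(w = H(x)+H(z)+y\) and output \((H(w)+x, H(z))\)), and the ordering \(H(z), H(w), H(x)\) with \(i^* = 2\) is assumed from the attack steps, so treat both as assumptions. All function names are hypothetical.

```python
def rank_mod_p(rows, p):
    """Rank of a list of row vectors over GF(p), via Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def in_span(vec, rows, p):
    """True if vec lies in the span of rows over GF(p)."""
    return rank_mod_p(rows + [vec], p) == rank_mod_p(rows, p)


def is_collision_structure(i_star, ordered, M, p):
    """Check properties (2) and (3) of Definition 6 for a 1-based index i_star
    and an ordered list of (q, a) pairs (nonces omitted)."""
    qs = [q for q, _a in ordered]
    as_ = [a for _q, a in ordered]
    s = i_star - 1
    if in_span(qs[s], qs[:s] + as_[:s] + M, p):  # property (2)
        return False
    for j in range(s, len(ordered)):              # property (3), for j >= i*
        if in_span(as_[j], qs[:j + 1] + as_[:j] + M, p):
            return False
    return True


P_MOD = 2**61 - 1  # illustrative prime (assumption)
M = [[1, 0, 0, 0, 0, 1],   # H(w) + x
     [0, 0, 0, 0, 1, 0]]   # H(z)
C_Z = ([0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0])  # query z, answer H(z)
C_W = ([0, 1, 0, 1, 1, 0], [0, 0, 0, 0, 0, 1])  # query w, answer H(w)
C_X = ([1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0])  # query x, answer H(x)
ORDERING = [C_Z, C_W, C_X]
```

Under these assumptions, `is_collision_structure(2, ORDERING, M, P_MOD)` holds, whereas choosing \(i^* = 1\) with the same ordering fails property (3), because the answer vector of the \(H(z)\) query is itself a row of \(\varvec{M} \).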

Second-Preimage-Finding Algorithm. In Fig. 1 we give an algorithm that finds second preimages by following the intuitive strategy above, from a given collision structure.

Fig. 1. Method for computing second preimages.

Lemma 7

If a collision structure \((i^*; c_1, \ldots , c_n)\) exists for \(\mathcal {P} \), and \(\mathcal {P} \) is not degenerate, then the second-preimage resistance of \(\mathcal {P} \) is comprehensively broken. Specifically, let \(\mathcal {A} \) refer to \(\textsf {FindSecondPreimage}(\mathcal {P}, (i^*;c_1, \ldots , c_n), \cdot )\). Then:

$$ \Pr \Big [ \textsf {2PIGame}(\mathcal {P}, \mathcal {A}, \lambda ) = 1 \Big ] = 1 $$

Proof

Given \(\varvec{x}\), the goal is to compute a second preimage \(\varvec{x}'\). The computation of \(\mathcal {P} ^H(\varvec{x}')\) has a certain set of base variables \(\varvec{v}'\), and it suffices to compute those instead since \(\varvec{x}' = (\varvec{e}_1\cdot \varvec{v}', \ldots , \varvec{e}_k \cdot \varvec{v}')\). The attack \(\textsf {FindSecondPreimage}\) fixes one linear constraint of \(\varvec{v}'\) at a time, until \(\varvec{v}'\) is completely determined.

It suffices to show the following about the behavior of \(\textsf {FindSecondPreimage}\):

  1. It computes a different set of base variables \(\varvec{v}'\) than those of \(\mathcal {P} ^H(\varvec{x})\).

  2. It never adds incompatible (unsatisfiable) linear constraints on \(\varvec{v}'\).

  3. Values \(\varvec{v}'\) are consistent with H. Namely, if \((t,\varvec{q},\varvec{a}) \in \mathcal {C} \), then \(H(t; \varvec{q}\cdot \varvec{v}') = \varvec{a}\cdot \varvec{v}'\).

  4. By the end of the computation, enough constraints have been added to completely determine \(\varvec{v}'\).

Property 1 holds since \(\varvec{q}_{i^*}\cdot \varvec{v}\ne \varvec{q}_{i^*} \cdot \varvec{v}'\) by design. Regarding property 2:

  • The constraints on \(\varvec{v}'\) that are added for \(\varvec{M} \) and in the first for-loop are self-consistent—by construction they already have a valid solution in \(\varvec{v}\).

  • The constraint involving \(\varvec{q}_{i^*}\) is compatible with the previous constraints since \(\varvec{q}_{i^*}\) is linearly independent of the previous constraint vectors \(\{ \varvec{q}_1, \ldots , \varvec{q}_{i^*-1} \} \cup \{ \varvec{a}_1, \ldots , \varvec{a}_{i^*-1} \} \cup \textsf {rows}(\varvec{M})\), by the collision structure property.

  • Similarly, a constraint involving \(\varvec{q}_i\) for \(i \ge i^*\) (if-statement within last for-loop) is only added in the case that \(\varvec{q}_i\) is linearly independent of the previous constraint vectors.

  • The constraint involving \(\varvec{a}_i\) in the second for-loop is consistent since \(\varvec{a}_i\) is linearly independent of existing constraint vectors, again by the collision structure property.

Regarding property 3: for oracle constraints \(c_i\) with \(i < i^*\), consistency with H is ensured by agreeing with the existing values \(\varvec{v}\). For constraints \(c_i\) with \(i \ge i^*\), consistency is guaranteed since the second for-loop actually calls H to determine the consistent way to constrain \(\varvec{a}_i \cdot \varvec{v}'\).

Property 4 follows from the fact that \(\mathcal {P} \) is not degenerate. We can see that \(\varvec{M} \times \varvec{v}'\) and \(\varvec{q}\cdot \varvec{v}'\) are fixed/determined by the end of the computation, for all \((t,\varvec{q},\varvec{a}) \in \mathcal {C} \). Non-degeneracy implies that the input of \(\mathcal {P} \) (and hence all base variables) is uniquely determined.

Fig. 2. Method for finding collision structures in a Linicrypt program.

3.4 Efficiently Finding Collision Structures

In this section we show that it is possible to efficiently determine whether a Linicrypt program has a collision structure, by analyzing its algebraic representation. The algorithm for finding a collision structure is given in Fig. 2.

Lemma 8

\(\textsf {FindColStruct}(\mathcal {P})\) (Fig. 2) outputs a collision structure for \(\mathcal {P} \) if and only if one exists. Furthermore, the running time of \(\textsf {FindColStruct}\) is polynomial (in the size of \(\mathcal {P} \)’s algebraic representation).

In the interest of space, the proof is deferred to Appendix A.
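For very small programs, the existence of a collision structure can also be checked by exhaustive search directly from Definition 6. The sketch below is explicitly not \(\textsf {FindColStruct}\): it tries every ordering of \(\mathcal {C} \) and every index \(i^*\), so its running time is exponential in n, and it is meant only as a reference check. It works over a prime field, and all names and example vectors are hypothetical reconstructions.

```python
from itertools import permutations


def rank_mod_p(rows, p):
    """Rank of a list of row vectors over GF(p), via Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank


def in_span(vec, rows, p):
    """True if vec lies in the span of rows over GF(p)."""
    return rank_mod_p(rows + [vec], p) == rank_mod_p(rows, p)


def is_collision_structure(i_star, ordered, M, p):
    """Properties (2) and (3) of Definition 6; ordered is a list of (q, a)."""
    qs = [q for q, _a in ordered]
    as_ = [a for _q, a in ordered]
    s = i_star - 1
    if in_span(qs[s], qs[:s] + as_[:s] + M, p):
        return False
    return all(not in_span(as_[j], qs[:j + 1] + as_[:j] + M, p)
               for j in range(s, len(ordered)))


def find_collision_structure_bruteforce(M, C, p):
    """Return (i_star, ordering) if some collision structure exists, else None."""
    for perm in permutations(C):
        ordered = list(perm)
        for i_star in range(1, len(ordered) + 1):
            if is_collision_structure(i_star, ordered, M, p):
                return i_star, ordered
    return None


P_MOD = 2**61 - 1  # illustrative prime (assumption)

# Running example over base variables (x, y, z, H(x), H(z), H(w)).
M_RUN = [[1, 0, 0, 0, 0, 1], [0, 0, 0, 0, 1, 0]]
C_RUN = [([1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]),
         ([0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0]),
         ([0, 1, 0, 1, 1, 0], [0, 0, 0, 0, 0, 1])]

# P(x) = H(x) over base variables (x, H(x)).
M_ID = [[0, 1]]
C_ID = [([1, 0], [0, 1])]
```

On these inputs the search finds a structure for the running example and returns `None` for \(\mathcal {P} ^H(x) = H(x)\), where the only answer vector is already a row of \(\varvec{M} \).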

3.5 Breaking Collision Resistance Implies Collision Structure

So far our discussion has centered around the relationship between collision structures and second-preimage resistance. We now show that if \(\mathcal {P} \) fails to be even collision resistant (in the random oracle model), then it has a collision structure. The main approach is to observe the oracle queries made by an arbitrary attacker (who computes a collision), and “extract” a collision structure from these queries.

The results in this subsection hold only for the following subclass of Linicrypt programs. In Sect. B.2 we discuss specifically why the results are restricted to this subclass.

Definition 9

Let \(\mathcal {P} = (\varvec{M},\mathcal {C})\) be a Linicrypt program, with \(\mathcal {C} = \{ (t_1, \varvec{q}_1, \varvec{a}_1), \ldots , (t_n, \varvec{q}_n, \varvec{a}_n)\}\). If all of \(\{t_1, \ldots , t_n\}\) are distinct then we say that \(\mathcal {P} \) has distinct nonces.

Lemma 10

Let \(\mathcal {P} \) be a deterministic Linicrypt program with distinct nonces that makes n oracle queries. Let \(\mathcal {A} \) be an oracle program that makes at most N oracle queries. If

$$\begin{aligned} \Pr [ \textsf {ColGame}(\mathcal {P},\mathcal {A},\lambda )=1]&> \left( \frac{N}{n}\right) ^{2n}/ |\mathbb {F} _\lambda | \\ \text{ or } \text{ if } \Pr [ \textsf {2PIGame}(\mathcal {P},\mathcal {A},\lambda )=1]&> \left( \frac{N}{n}\right) ^{n}/ |\mathbb {F} _\lambda | \end{aligned}$$

then \(\mathcal {P} \) either has a collision structure or is degenerate.

Proof

Without loss of generality, we can assume the following about \(\mathcal {A} \):

  • Let \((\varvec{x},\varvec{x}')\) be the two preimages from the games (in 2PIGame \(\mathcal {A} \) gets \(\varvec{x}\) as input and gives \(\varvec{x}'\) as output; in ColGame \(\mathcal {A} \) outputs both \(\varvec{x}\) and \(\varvec{x}'\)). We assume that \(\mathcal {A} ^H\) has made the oracle queries that \(\mathcal {P} ^H(\varvec{x})\) and \(\mathcal {P} ^H(\varvec{x}')\) will make. In ColGame this can be achieved by modifying \(\mathcal {A} \) to run these two computations as its last action. In 2PIGame this can be achieved by having \(\mathcal {A} \) run \(\mathcal {P} ^H(\varvec{x})\) as its first action and \(\mathcal {P} ^H(\varvec{x}')\) as its last action.

  • \(\mathcal {A} \) never repeats a query to H. This can be achieved by simple memoization. Note that when \(\mathcal {A} \) runs, say, \(\mathcal {P} ^H(\varvec{x}')\) as its last action, some of those oracle queries may have been made previously.

  • \(\mathcal {A} ^H\) can actually output \((\varvec{v},\varvec{v}')\), where \(\varvec{v}\) is the set of base variables in the computation of \(\mathcal {P} ^H(\varvec{x})\), and \(\varvec{v}'\) the base variables in \(\mathcal {P} ^H(\varvec{x}')\). This is because the base variables are computed during the process of running \(\mathcal {P} ^H(\varvec{x})\) and \(\mathcal {P} ^H(\varvec{x}')\).

Note that the base variables have the following property. Let \(c = (t,\varvec{q},\varvec{a})\) be one of the oracle constraints of \(\mathcal {P} \). Then the computation \(\mathcal {P} ^H(\varvec{x})\) (and hence \(\mathcal {A} ^H\) as well) at some point makes an oracle query \(H(t; \varvec{q}\cdot \varvec{v})\) and gets a response \(\varvec{a}\cdot \varvec{v}\).

From these assumptions, whenever \(\mathcal {A} \) outputs a successful collision there exist well-defined mappings \(T,T' : \mathcal {C} \rightarrow \mathbb {N}\) such that:

  • For every constraint \(c = (t,\varvec{q},\varvec{a}) \in \mathcal {C} \), the T(c)th query made by \(\mathcal {A} ^H\) is the one corresponding to oracle constraint c in the computation of \(\mathcal {P} ^H(\varvec{x})\). In other words, it is the query in which \(\mathcal {A} ^H\) “decided” what \(\varvec{q}\cdot \varvec{v}\) should be (and learned what \(\varvec{a}\cdot \varvec{v}\) was as a result of the query).

  • Similarly, the \(T'(c)\)th query made by \(\mathcal {A} ^H\) is the one corresponding to oracle constraint c in the computation of \(\mathcal {P} ^H(\varvec{x}')\). This is the query in which \(\varvec{q}\cdot \varvec{v}'\) was determined.

How many possible mappings \((T,T')\) are there if \(\mathcal {A} \) makes N oracle queries? Let \(N_i\) be the number of oracle queries that \(\mathcal {A} \) makes which have nonce \(t_i\). Since the nonces are distinct, we have \(\sum _i N_i \le N\). There are only \(N_i\) choices for the value of \(T(c_i)\), and likewise for \(T'(c_i)\). Hence there are at most \( \prod _{i=1}^n N_i^2\) possible \((T,T')\) mappings. However, in the 2PIGame, the mapping T is completely fixed since we assume \(\mathcal {A} \) performs the computation \(\mathcal {P} ^H(\varvec{x})\) as its first action. In that case, there are only \(\prod _{i=1}^n N_i\) choices of the mapping \(T'\). These products are maximized when each \(N_i = N/n\), so we get an upper bound of \((N/n)^{2n}\) possible \((T,T')\) mappings in the ColGame and \((N/n)^n\) mappings in the 2PIGame.
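The maximization step can be justified by the AM-GM inequality. As a worked derivation (a standard fact, spelled out here for convenience), for non-negative \(N_1, \ldots , N_n\) with \(\sum _i N_i \le N\):

$$ \prod _{i=1}^n N_i \le \left( \frac{1}{n} \sum _{i=1}^n N_i \right) ^{n} \le \left( \frac{N}{n}\right) ^{n}, \qquad \text{and hence} \qquad \prod _{i=1}^n N_i^2 = \left( \prod _{i=1}^n N_i \right) ^{2} \le \left( \frac{N}{n}\right) ^{2n}. $$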

Applying the pigeonhole principle and uniting both cases from the statement of the lemma (collision game and second preimage game), there is a specific \((T,T')\) such that:

$$ \Pr [\mathcal {A} ^H \text{ outputs } \text{ a } \text{ valid } \text{ collision } \text{ while } \text{ using } \text{ mappings } (T,T')] > 1/|\mathbb {F} _\lambda | $$
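To spell out the pigeonhole step, write B for the bound on the number of mappings, i.e., \(B = (N/n)^{2n}\) in the ColGame and \(B = (N/n)^{n}\) in the 2PIGame (B is notation introduced only for this remark). Every successful run of \(\mathcal {A} \) uses exactly one mapping \((T,T')\), so the success events for different mappings are disjoint, and the hypothesis of the lemma gives:

$$ \frac{B}{|\mathbb {F} _\lambda |} < \Pr [\mathcal {A} ^H \text{ outputs a valid collision}] = \sum _{(T,T')} \Pr [\mathcal {A} ^H \text{ outputs a valid collision while using } (T,T')] \le B \cdot \max _{(T,T')} \Pr [\mathcal {A} ^H \text{ outputs a valid collision while using } (T,T')] $$

Dividing by B shows that the maximizing mapping satisfies the inequality above.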

For the rest of the proof, we condition on the event that \(\mathcal {A} \) computes a collision while using this specific mapping \((T,T')\). This is without loss of generality by making \(\mathcal {A} \), as its final action, output \(\bot \) if it observes that some different mapping is used. Hence we can view the association between oracle calls of \(\mathcal {P} \) and \(\mathcal {A} \) as fixed a priori. That is, we can know in advance that a particular oracle call of \(\mathcal {A} \) will determine the value of \(\varvec{q}\cdot \varvec{v}\) (or \(\varvec{q}\cdot \varvec{v}'\)) for a specific \(\varvec{q}\).

For some \(c \in \mathcal {C} \), if \(T(c) = T'(c)\), then we call c convergent. In this case, \(\mathcal {P} ^H(\varvec{x})\) and \(\mathcal {P} ^H(\varvec{x}')\) make the same c-query and receive the same output. In other words, under such a mapping \(T,T'\), adversary \(\mathcal {A} ^H\) will choose that \(\varvec{q}\cdot \varvec{v}= \varvec{q}\cdot \varvec{v}'\). If \(T(c) \ne T'(c)\), we call c divergent: \(\mathcal {P} ^H(\varvec{x})\) and \(\mathcal {P} ^H(\varvec{x}')\) make different c-queries, i.e., \(\varvec{q}\cdot \varvec{v}\ne \varvec{q}\cdot \varvec{v}'\).

If all \(c \in \mathcal {C} \) are convergent, then two distinct inputs \(\varvec{x}\) and \(\varvec{x}'\) cause \(\mathcal {P} \) to make identical oracle queries and give identical output. Hence \(\mathcal {P} \) is degenerate, and we are done. We continue assuming that some query is divergent, and will conclude that \(\mathcal {P} \) has a collision structure.

Define \(\textsf {finish}(c) = \max \{ T(c), T'(c) \}\). Note that since \(\mathcal {P} \) has distinct nonces, an oracle query made by \(\mathcal {A} \) cannot be associated with more than one \(c \in \mathcal {C} \). Hence \(\textsf {finish}\) is an injective function.

We obtain a collision structure for \(\mathcal {P} \) as follows. Order the oracle constraints in \(\mathcal {C} \) as \((c_1, \ldots , c_n)\), where all of the convergent queries come first, followed by the divergent queries ordered by increasing finish time. Let \(i^*\) be the index of the divergent query with earliest finish time. Then:

  • \(i^* \le i\) \(\Leftrightarrow \) \(c_i\) is divergent

  • \(i^* \le i < j\) \(\Leftrightarrow \) \(\textsf {finish}(c_i) < \textsf {finish}(c_j)\)
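
The ordering just described can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `order_constraints`, the constraint labels, and the query indices are hypothetical, and the mappings \(T\) and \(T'\) are modelled as dictionaries from constraint labels to the index of the adversary's oracle query.

```python
def order_constraints(T, T_prime):
    """Order constraints: convergent ones first, then divergent by finish time.

    T and T_prime map each constraint label to the index of the adversary's
    oracle query associated with it. Returns the ordering and the 1-based
    index i_star of the first divergent constraint (None if all converge).
    """
    finish = {c: max(T[c], T_prime[c]) for c in T}
    convergent = [c for c in T if T[c] == T_prime[c]]
    divergent = sorted((c for c in T if T[c] != T_prime[c]), key=finish.get)
    order = convergent + divergent
    i_star = len(convergent) + 1 if divergent else None
    return order, i_star


# Hypothetical mappings: "a" is convergent; "b" and "c" are divergent,
# with finish times 5 and 4 respectively.
T = {"a": 1, "b": 2, "c": 3}
T_prime = {"a": 1, "b": 5, "c": 4}
order, i_star = order_constraints(T, T_prime)
print(order, i_star)  # ['a', 'c', 'b'] 2
```

Because \(\textsf {finish}\) is injective, sorting the divergent constraints by finish time has no ties, so the ordering is uniquely determined up to the order of the convergent constraints.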

Claim

\(( i^* ; c_1, \ldots , c_n)\) is a collision structure for \(\mathcal {P} \).

In the following, we write each oracle constraint \(c_i\) as \(c_i = (t_i, \varvec{q}_i, \varvec{a}_i)\).

For \(j < i^*\), the query \(c_j\) is convergent so we have \(\varvec{q}_{j} \cdot \varvec{v}= \varvec{q}_{j} \cdot \varvec{v}'\) and \(\varvec{a}_{j} \cdot \varvec{v}= \varvec{a}_{j} \cdot \varvec{v}'\). Since the outputs of the two executions of \(\mathcal {P} \) are also identical, we also have \(\varvec{M} \varvec{v}= \varvec{M} \varvec{v}'\). Since \(c_{i^*}\) is divergent, we have \(\varvec{q}_{i^*} \cdot \varvec{v}\ne \varvec{q}_{i^*} \cdot \varvec{v}'\). From this we conclude that:

$$\varvec{q}_{i^*} \not \in \textsf {span}\Big ( \{ \varvec{q}_{1}, \ldots , \varvec{q}_{i^*-1} \} \cup \{ \varvec{a}_{1}, \ldots , \varvec{a}_{i^*-1} \} \cup \textsf {rows}(\varvec{M}) \Big ).$$

This is the first property required of a collision structure.

It remains to show that for all \(i \ge i^*\),

$$ \varvec{a}_{i} \not \in \textsf {span}\Big ( \{ \varvec{q}_{1}, \ldots , \varvec{q}_{i} \} \cup \{ \varvec{a}_{1}, \ldots , \varvec{a}_{i-1} \} \cup \textsf {rows}(\varvec{M}) \Big ). $$

Suppose for contradiction that the above is false, and that we actually have:

$$ \varvec{a}_i = \sum _{j\le i} \alpha _j \varvec{q}_j + \sum _{j < i} \beta _j \varvec{a}_j + \varvec{\gamma } \varvec{M} $$

Focus on the moment when \(\mathcal {A} \) has asked its \(\textsf {finish}(c_i)\)th query and is awaiting the response from H. By symmetry, suppose \(\textsf {finish}(c_i) = T'(c_i)\), so that this query is on \(\varvec{q}_i \cdot \varvec{v}'\); the result of the query will be assigned to \(\varvec{a}_i \cdot \varvec{v}'\). At this moment:

  • All queries \(c_j\) for \(i^* \le j < i\) are finished. This means that the oracle queries of \(\mathcal {A} ^H\) have already determined \(\varvec{q}_j \cdot \varvec{v}\), \(\varvec{a}_j \cdot \varvec{v}\), \(\varvec{q}_j \cdot \varvec{v}'\), and \(\varvec{a}_j \cdot \varvec{v}'\). Further, the queries (but not responses) of oracle constraint \(c_i\) have been fixed as well—these values are \(\varvec{q}_i \cdot \varvec{v}\) and \(\varvec{q}_i \cdot \varvec{v}'\).

  • \(\varvec{a}_i \cdot \varvec{v}\) has already been fixed, since this happened at time \(T(c_i) < T'(c_i)\). But \(\varvec{a}_i \cdot \varvec{v}'\) is about to be chosen as a uniform field element.

Now consider the expression \(\varvec{a}_i \cdot (\varvec{v}' - \varvec{v})\):

$$ \varvec{a}_i \cdot (\varvec{v}'-\varvec{v}) = \sum _{j\le i} \alpha _j \varvec{q}_j \cdot (\varvec{v}'-\varvec{v}) + \sum _{j < i} \beta _j \varvec{a}_j\cdot (\varvec{v}'-\varvec{v}) + \varvec{\gamma } \varvec{M} (\varvec{v}'-\varvec{v}) $$

For \(j < i^*\) we know that query \(c_j\) is convergent. This implies that \(\varvec{q}_j \cdot (\varvec{v}' - \varvec{v})=0\) and \(\varvec{a}_j\cdot (\varvec{v}'-\varvec{v}) =0\). We also know that \(\varvec{M} (\varvec{v}'-\varvec{v}) = 0\), in the case that \(\mathcal {A} ^H\) is successful in generating a collision. Cancelling these terms gives:

$$ \varvec{a}_i \cdot (\varvec{v}'-\varvec{v}) = \sum _{j= i^*}^{ i} \alpha _j \varvec{q}_j \cdot (\varvec{v}'-\varvec{v}) + \sum _{j =i^*}^{i-1} \beta _j \varvec{a}_j \cdot (\varvec{v}'-\varvec{v}) $$

Isolating \(\varvec{a}_i \cdot \varvec{v}'\) gives:

$$ \varvec{a}_i \cdot \varvec{v}' = \varvec{a}_i \cdot \varvec{v}+ \sum _{j= i^*}^{ i} \alpha _j \varvec{q}_j \cdot (\varvec{v}'-\varvec{v}) + \sum _{j =i^*}^{i-1} \beta _j \varvec{a}_j \cdot (\varvec{v}'-\varvec{v}) $$

But all terms on the right-hand side have already been fixed, while the term on the left is chosen uniformly in \(\mathbb {F} _\lambda \). So equality holds with probability \(1/|\mathbb {F} _\lambda |\). This contradicts the assumption that \(\mathcal {A} \) succeeds with strictly greater probability.

3.6 Putting Everything Together

Our main characterization shows that second-preimage resistance and collision resistance coincide for this class of Linicrypt programs, in a very strong sense:

Theorem 11

Let \(\mathcal {P}\) be a deterministic Linicrypt program with distinct nonces, making n oracle queries. Then the following are equivalent:

  1. There is an adversary \(\mathcal {A} \) making N oracle queries such that

    $$\Pr [ \textsf {ColGame}(\mathcal {P},\mathcal {A},\lambda )=1] > \left( \frac{N}{n}\right) ^{2n}/ |\mathbb {F} _\lambda |.$$

  2. There is an adversary \(\mathcal {A} \) making N oracle queries such that

    $$\Pr [ \textsf {2PIGame}(\mathcal {P},\mathcal {A},\lambda )=1] > \left( \frac{N}{n}\right) ^{n}/ |\mathbb {F} _\lambda |.$$

  3. There is an adversary \(\mathcal {A} \) making at most 2n oracle queries such that

    $$\Pr [ \textsf {2PIGame}(\mathcal {P},\mathcal {A},\lambda )=1] = 1.$$

  4. \(\mathcal {P} \) either has a collision structure or is degenerate.
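
To get a feel for the thresholds in items 1 and 2, they can be evaluated exactly for concrete parameters. The sketch below is illustrative only: the function names and the parameter values (\(N = 2^{10}\), \(n = 4\), \(|\mathbb {F} _\lambda | = 2^{128}\)) are assumptions, not values taken from the text.

```python
from fractions import Fraction


def col_threshold(N: int, n: int, field_size: int) -> Fraction:
    """Threshold (N/n)^(2n) / |F| from item 1 (collision game)."""
    return Fraction(N, n) ** (2 * n) / field_size


def spi_threshold(N: int, n: int, field_size: int) -> Fraction:
    """Threshold (N/n)^n / |F| from item 2 (second-preimage game)."""
    return Fraction(N, n) ** n / field_size


# Hypothetical parameters: 2^10 adversary queries, 4 oracle queries in P,
# and a field with 2^128 elements.
N, n, field_size = 2**10, 4, 2**128
assert col_threshold(N, n, field_size) == Fraction(1, 2**64)
assert spi_threshold(N, n, field_size) == Fraction(1, 2**96)
```

Exact rational arithmetic is used so that very small thresholds are not rounded to zero by floating point.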

Corollary 12

The collision resistance (equivalently, second-preimage resistance) of deterministic, distinct-nonce Linicrypt programs \(\mathcal {P} \) can be decided in polynomial time (in the size of \(\mathcal {P} \)’s algebraic representation).

Proof

Using standard linear algebraic operations (e.g., Gaussian elimination), one can check \(\mathcal {P} \) for degeneracy or for the existence of a collision structure in polynomial time.

4 A Simple Application

We can illustrate the use of our main theorem with a simple example application. Suppose we have access to a random oracle which is compressing by a factor of 2-to-1. In the Linicrypt notation, this would be an oracle that takes 2 field elements (and the oracle nonce) as input and produces one field element as output—\(H:\{0,1\}^* \times \mathbb {F} ^2 \rightarrow \mathbb {F} \). If we require a collision resistant function that compresses by k-to-1 (for some fixed k), the following natural Merkle-Damgård-style iterative hash comes to mind:

figure g

The algebraic representation of this program is:

figure h

We have numbered the oracle constraints so that constraint \((i, \varvec{Q}_i, \varvec{a}_i)\) corresponds to the statement “\(y_i := H(i; y_{i-1}, x_i)\)” in \(\mathcal {P} \).

To determine whether this program is collision-resistant, we execute the FindColStruct algorithm (see Footnote 1). Initially all oracle constraints start in the set \(\textsf {LEFT}\), and \(\textsf {RIGHT}\) starts out empty. The first loop in FindColStruct moves oracle constraints from \(\textsf {LEFT}\) to \(\textsf {RIGHT}\) whenever their \(\varvec{a}_i\) value is linearly independent of all other vectors appearing in \(\textsf {LEFT}\) (the multiset of vectors is represented as the variable V in FindColStruct).

In this program, every \(\varvec{a}_i\) vector is zero everywhere except for a 1 in the position corresponding to the “\(y_i\)” column. Also note that \(\varvec{a}_k\) is identical to \(\varvec{M} \), and \(\varvec{a}_i\) (for \(i<k\)) appears as the first row of \(\varvec{Q}_{i+1}\) (see the example with \(\varvec{a}_2\) and \(\varvec{Q}_3\) above). In other words, every \(\varvec{a}_i\) is always in the span of other vectors appearing in \(\textsf {LEFT}\), so no oracle constraint will ever be added to \(\textsf {RIGHT}\).

Hence, \(\textsf {FindColStruct}\) will terminate with \(\textsf {RIGHT}=\emptyset \) and return \(\bot \). From our main characterization, this proves that the function is collision-resistant.
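
The span argument above can be checked mechanically. The following sketch is not the FindColStruct algorithm itself; it only reproduces the first-loop test described in the text (is \(\varvec{a}_i\) in the span of all other vectors in \(\textsf {LEFT}\)?), using Gaussian elimination over a prime field. Several details are assumptions made for illustration: the base variables are ordered \((x_1, \ldots , x_k, y_1, \ldots , y_k)\), the initial chaining value \(y_0\) is taken to be 0, the rows of \(\varvec{M} \) are included among the compared vectors, the small prime 101 stands in for \(|\mathbb {F} |\), and all function names are hypothetical.

```python
P = 101  # small prime standing in for the field size (assumption)


def rank_mod_p(rows, p=P):
    """Rank of a list of row vectors over GF(p), by Gaussian elimination."""
    rows = [[x % p for x in r] for r in rows]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def in_span(v, vectors, p=P):
    """True iff v lies in the span of `vectors` over GF(p)."""
    return rank_mod_p(vectors + [v], p) == rank_mod_p(vectors, p)


def md_constraints(k):
    """Constraints (i, Q_i, a_i) for y_i := H(i; y_{i-1}, x_i), output y_k."""
    dim = 2 * k

    def x(i):  # x_1..x_k occupy columns 0..k-1
        return [1 if j == i - 1 else 0 for j in range(dim)]

    def y(i):  # y_1..y_k occupy columns k..2k-1; y_0 = 0 is an assumption
        return [1 if i >= 1 and j == k + i - 1 else 0 for j in range(dim)]

    constraints = [(i, [y(i - 1), x(i)], y(i)) for i in range(1, k + 1)]
    return constraints, [y(k)]


def movable_to_right(k):
    """Indices i whose a_i is independent of all other vectors in LEFT."""
    constraints, M = md_constraints(k)
    movable = []
    for idx, (i, _, a) in enumerate(constraints):
        others = list(M)
        for jdx, (_, Qj, aj) in enumerate(constraints):
            others.extend(Qj)
            if jdx != idx:
                others.append(aj)
        if not in_span(a, others):
            movable.append(i)
    return movable


print(movable_to_right(4))  # [] : no constraint can be moved to RIGHT
```

Since no constraint is movable in the first pass, \(\textsf {LEFT}\) never changes and repeating the pass cannot produce a different result, which matches the conclusion that \(\textsf {RIGHT}\) stays empty.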

5 Extensions, Limitations, Future Work

In Appendix B we discuss several extensions and limitations of our techniques:

  • How the results generalize to oracle calls that take several field elements as input (results as stated in previous sections consider a random oracle of the form \(H: \{0,1\}^* \times \mathbb {F} \rightarrow \mathbb {F} \)).

  • Why the restriction to distinct nonces is significant, and how repeated nonces make the picture more complicated.

  • Extending the work to support the ideal cipher model instead of the random oracle model.