Characterizing Collision and Second-Preimage Resistance in Linicrypt

McQuoid, Ian; Swope, Trevor; Rosulek, Mike

doi:10.1007/978-3-030-36030-6_18

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11891)

Included in the following conference series:

Theory of Cryptography Conference

Abstract

Linicrypt (Carmer & Rosulek, Crypto 2016) refers to the class of algorithms that make calls to a random oracle and otherwise manipulate values via fixed linear operations. We give a characterization of collision-resistance and second-preimage resistance for a significant class of Linicrypt programs (specifically, those that achieve domain separation on their random oracle queries via nonces). Our characterization implies that collision-resistance and second-preimage resistance are equivalent, in an asymptotic sense, for this class. Furthermore, there is a polynomial-time procedure for determining whether such a Linicrypt program is collision/second-preimage resistant.

Authors partially supported by NSF award #1617197.

1 Introduction

Collision resistance and second-preimage resistance are fundamental properties of hash functions, and are the basis of security for hash-based signature schemes [4, 7, 10, 11], which are a promising approach for post-quantum security.

We give a new way to reason about and characterize the collision resistance and second-preimage resistance of a large, natural class of programs, in the random oracle model. Specifically, we characterize these properties for the class of Linicrypt programs, introduced by Carmer and Rosulek [5]. Roughly speaking, a Linicrypt program is one where all intermediate values are field elements, and the only operations possible are fixed linear combinations, sampling uniformly from the field, and calling a random oracle (whose outputs are field elements). Many of the most practical cryptographic constructions are captured by this model: hash-based signatures and block cipher modes, to name a few.

Carmer and Rosulek showed that such programs admit an algebraic representation that is amenable to reasoning about programs’ cryptographic properties. Specifically, they showed a polynomial-time algorithm for deciding whether two Linicrypt programs induce computationally indistinguishable distributions. They also demonstrated the feasibility of using a SAT solver to automatically synthesize Linicrypt programs that satisfy given correctness and security constraints, by successfully synthesizing secure Linicrypt constructions of garbled circuits.

Our work follows a similar path, showing that collision properties can also be characterized cleanly in terms of the algebraic representation for Linicrypt programs. Our characterization holds for programs in which distinct oracle queries have the form $H(t_1;\cdot ), H(t_2; \cdot ), \ldots $ for distinct nonces $t_i$.

We introduce an algebraic property of Linicrypt programs called a collision structure, which completely characterizes both second-preimage resistance and collision resistance. The presence of a collision structure in a program $\mathcal {P} $ can be detected in polynomial time (in the size of $\mathcal {P} $’s algebraic representation).

Theorem 1

(Main Theorem). Let $\mathcal {P}$ be a deterministic Linicrypt program with distinct nonces, making n oracle queries. Let $\mathbb {F} $ be the underlying field (and range of the random oracle). Then the following are equivalent:

1. There is an adversary $\mathcal {A} $ making q oracle queries that finds collisions with probability more than $(q/n)^{2n}/|\mathbb {F} |$.
2. There is an adversary $\mathcal {A} $ making q oracle queries that finds second preimages with probability more than $(q/n)^{n}/|\mathbb {F} |$.
3. There is an adversary $\mathcal {A} $ making at most 2n oracle queries that finds second preimages with probability 1.
4. $\mathcal {P} $ either has a collision structure or is degenerate. (See main text for definitions.)

We emphasize that the theorem statement refers to standard security properties (i.e., security against arbitrary, computationally unbounded algorithms that make only a polynomial number of queries to the random oracle) of Linicrypt constructions. We are not in a heuristic model that considers Linicrypt adversaries.

Our results show that second-preimage resistance and collision resistance are equivalent, in an asymptotic sense (i.e., considering only whether a quantity is negligible or not). However, as might be expected, it is quadratically easier to find collisions than second preimages, due to birthday attacks. Our concrete bounds reflect this. In practice, reducing security to second-preimage resistance rather than collision resistance can result in constructions with 50% smaller parameters; e.g., [2, 6, 8].
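As a back-of-the-envelope illustration of the quadratic gap, consider the simplest case $n=1$ of Theorem 1. Here $\kappa $ is a hypothetical target security level, introduced only for this sketch. Requiring each bound to remain nontrivial (at most 1) for every $q \le 2^\kappa $ gives:

$$ \frac{q^{2}}{|\mathbb {F} |} \le 1 \text { for all } q \le 2^\kappa \iff |\mathbb {F} | \ge 2^{2\kappa }, \qquad \frac{q}{|\mathbb {F} |} \le 1 \text { for all } q \le 2^\kappa \iff |\mathbb {F} | \ge 2^{\kappa } $$

So field elements of about $\kappa $ bits suffice when security rests on second-preimage resistance, versus about $2\kappa $ bits for collision resistance.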

1.1 Related Work and Comparison

Bellare and Micciancio [1] discuss the collision resistance of the function $H^*( x_1, \ldots , x_n ) = H(1; x_1) \oplus \cdots \oplus H(n; x_n)$, where H is collision-resistant. Indeed, this function is naturally modeled in Linicrypt over a field $GF(2^\lambda )$. They show that this function fails to be collision-resistant if n is allowed to vary with the input (in particular, when $n \ge \lambda +1$). Our characterization shows that an adversary making q oracle queries breaks collision resistance with probability bounded by $(q/n)^{2n}/2^{\lambda }$ since the function lacks a “collision structure.” These two results are not in conflict, since our bound is meaningless when $n \ge \lambda +1$. In short, the Linicrypt model is best suited for programs whose only dependence on the security parameter is the choice of field, but where (in particular) the number of inputs and calls to H are fixed constants.
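To see why the bound carries no information in that regime, a short calculation suffices. It assumes only that the adversary makes $q \ge 2n$ queries, i.e., enough to evaluate $H^*$ on two inputs:

$$ n \ge \lambda +1 \;\Longrightarrow \; \frac{(q/n)^{2n}}{2^{\lambda }} \ge \frac{2^{2n}}{2^{\lambda }} \ge \frac{2^{2\lambda +2}}{2^{\lambda }} = 2^{\lambda +2} > 1 $$

The bound then exceeds 1 and holds trivially for any adversary.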

Another related work is that of Wagner [13], who gives an algorithm for a generalized birthday problem. The problem (translated to our notation) is to find $x_1, \ldots , x_k$ such that $H(x_1) \oplus \cdots \oplus H(x_k) = 0$. The case of $k=2$ corresponds to the well-known birthday problem. One can see that by generating a list $L_i$ of roughly $2^{\lambda /k}$ candidates for each $x_i$ (i.e., so $|L_1 \times \cdots \times L_k| \ge 2^\lambda $), there is likely to exist some solution to the problem. Wagner’s focus is on the algorithmic aspect of actually identifying the appropriate candidates. In Linicrypt, all adversaries are considered to be computationally unbounded but bounded in the number of queries to the random oracle H. As such, our results do not provide any upper/lower bounds on attack complexity (other than in random oracle query complexity).

Black, Rogaway and Shrimpton [3] categorize 64 ways to construct a compression function (suitable for Merkle-Damgård hashing) from an ideal cipher, building on prior work by Preneel, Govaerts and Vandewalle [12]. These constructions can be thought of as $GF(2^\lambda )$-Linicrypt programs that use only XOR (i.e., linear combinations with coefficients of 0 or 1 only). However, the reasoning is tied to the ideal cipher model rather than the random oracle model, as in Linicrypt (see Sect. B.3 for more information). We leave it as interesting future work to extend results in Linicrypt to the ideal cipher model, and potentially re-derive the characterization of BRS from a linear-algebraic perspective.

2 Preliminaries

We write scalar field elements as lowercase non-bold letters (e.g., $v \in \mathbb {F} $). We write vectors as lowercase bold letters (e.g., $\varvec{q}\in \mathbb {F} ^n$). We write matrices as uppercase bold letters (e.g., $\varvec{M} \in \mathbb {F} ^{n \times m}$). We write vector inner product as $\varvec{q}\cdot \varvec{v}$, and matrix-vector multiplication as $\varvec{M} \times \varvec{v}$ or $\varvec{M} \varvec{v}$.

2.1 Linicrypt

The Linicrypt model was introduced in [5]. We present a brief summary of the model and its important properties.

A Linicrypt program (over field $\mathbb {F} $) is one in which every intermediate value is an element of $\mathbb {F} $, and the program is a fixed, straight-line sequence of the following kinds of operations:

Call a random oracle (whose inputs/outputs are field elements).
Sample a random field element.
Combine existing values using a fixed linear combination.

The sequence of operations (including choice of arguments to the oracle, coefficients of linear combinations, etc.) is entirely fixed. In particular, these cannot depend on intermediate values in the computation.

The only source of cryptographic power in Linicrypt is the random oracle, whose outputs are $\mathbb {F} $-elements. We therefore require the size of the field $|\mathbb {F} |$ to be exponential in the security parameter $\lambda $. Since the field depends on the security parameter, we sometimes write $\mathbb {F} = \mathbb {F} _\lambda $ to make the association explicit.

If the field depends on the security parameter, then the program does too (since it is parameterized by specific coefficients of linear combinations). One can either consider a Linicrypt program to be a non-uniform family of programs (one for each choice of field/security parameter), or one can fix all coefficients in the program from $\widetilde{\mathbb {F}}$ which is a subfield of every $\mathbb {F} _\lambda $ (for example, a program that uses only $\{0,1\}$ coefficients can be instantiated over any field $GF(2^\lambda )$). Our treatment of security is concrete (not asymptotic), so these distinctions are not important in this work.

We can reason about Linicrypt programs in the following algebraic way. Let $\mathcal {P} $ be such a program, and let $v_1, \ldots , v_n$ denote all of its intermediate variables. Say the first k of them are $\mathcal {P} $’s input and the last l of them are $\mathcal {P} $’s output. We say that $v_i$ is a base variable if $v_i$ is either an input variable, the result of a call to the oracle, or the result of sampling a field element. All variables can therefore be expressed as a fixed linear combination of base variables.

Let $\varvec{v}_{\textsf {base}}$ denote the vector of all base variables. For each variable $v_i$, let $\varvec{r}_i$ denote the vector such that $v_i = \varvec{r}_i \cdot \varvec{v}_{\textsf {base}}$. For example, for base variables, $\varvec{r}_i$ is a canonical basis vector (0s everywhere except 1 in one component).

Suppose the output of $\mathcal {P} $ consists of $v_{n-l+1}, \ldots , v_n$. Then the output matrix of $\mathcal {P} $ is defined as: $ \varvec{M} \overset{\text {def}}{=} \begin{bmatrix} \varvec{r}_{n-l+1} \\ \vdots \\ \varvec{r}_{n} \end{bmatrix} $. This matrix captures the fact that $\mathcal {P} $’s output can be expressed as $\varvec{M} \times \varvec{v}_{\textsf {base}}$.

Each oracle query in $\mathcal {P} $ is of the form “$v_i := H(t; v_{i_1}, \ldots , v_{i_m})$,” where t is a string (e.g., nonce) and $i_1, \ldots , i_m < i$ are indices, all fixed as part of $\mathcal {P} $. For each such query we define an associated oracle constraint $ c = \left( t, \begin{bmatrix} \varvec{r}_{i_1} \\ \vdots \\ \varvec{r}_{i_m} \end{bmatrix}, \varvec{r}_i \right) $. In other words, an oracle constraint $(t, \varvec{Q},\varvec{a})$ captures the fact that if the oracle is queried as $H(t; \varvec{Q}\times \varvec{v}_{\textsf {base}})$, then the response is $\varvec{a}\cdot \varvec{v}_{\textsf {base}}$. When t is the empty string, we often omit it from our notation and simply write $H(\cdot )$ instead of $H(\epsilon ; \cdot )$.

The algebraic representation of $\mathcal {P} $ is $\mathcal {P} = (\varvec{M},\mathcal {C})$, where $\varvec{M} $ is the output matrix of $\mathcal {P} $ and $\mathcal {C} $ is the set of all oracle constraints. Indeed, these two pieces of information completely characterize the behavior of $\mathcal {P} $ (as established in [5]).

Example. In this work we focus on deterministic Linicrypt programs. One such example is given below. Its base variables are $(v_1, \ldots , v_5, v_7)$.

Hence, the algebraic representation of $\mathcal {P} $ is:
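The example can be sketched as follows, consistent with the base variables named above and with the running example of Sect. 3.2. The nonce names $t_1, t_2, t_3$ and the exact variable numbering are assumptions made for illustration:

$$ \begin{array}{l} \underline{\mathcal {P} ^H(v_1, v_2, v_3)}: \\ \quad v_4 := H(t_1; v_1) \\ \quad v_5 := H(t_2; v_3) \\ \quad v_6 := v_4 + v_5 + v_2 \\ \quad v_7 := H(t_3; v_6) \\ \quad v_8 := v_7 + v_1 \\ \quad \text {return } (v_8, v_5) \end{array} $$

With $\varvec{v}_{\textsf {base}} = (v_1, \ldots , v_5, v_7)$, the output matrix and oracle constraints under these assumptions read:

$$ \varvec{M} = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \qquad \mathcal {C} = \left\{ \begin{array}{l} \big (t_1, [1\;0\;0\;0\;0\;0], [0\;0\;0\;1\;0\;0]\big ), \\ \big (t_2, [0\;0\;1\;0\;0\;0], [0\;0\;0\;0\;1\;0]\big ), \\ \big (t_3, [0\;1\;0\;1\;1\;0], [0\;0\;0\;0\;0\;1]\big ) \end{array} \right\} $$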

2.2 Security Definitions

The Linicrypt model is meant to capture a special class of constructions, but not adversaries. In this work we characterize standard security definitions, against arbitrary (i.e., not necessarily Linicrypt) adversaries. As in Impagliazzo’s “Minicrypt” [9], we consider computationally unbounded adversaries that are bounded-query: they make at most $p(\lambda )$ queries to the random oracle, for some polynomial p.

Definition 2

Let $\mathcal {P} $ be a Linicrypt program over a family of fields $\mathbb {F} = (\mathbb {F} _\lambda )_\lambda $. Then $\mathcal {P} $ is $(q,\epsilon )$-collision-resistant (in the random oracle model) if for all q-query adversaries $\mathcal {A} $, $\Pr [\textsf {ColGame}(\mathcal {P},\mathcal {A},\lambda )=1] \le \epsilon $, where:

Definition 3

Let $\mathcal {P} $ be as above (with k inputs). $\mathcal {P} $ is $(q,\epsilon )$-2nd-preimage-resistant (in the random oracle model) if for all q-query adversaries $\mathcal {A} $, $\Pr [\textsf {2PIGame}(\mathcal {P},\mathcal {A},\lambda )=1]\le \epsilon $, where:
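The two games can be sketched as follows. This is the standard formulation and matches how the games are used in the proof of Lemma 10 (in 2PIGame the adversary receives $\varvec{x}$ and outputs $\varvec{x}'$; in ColGame it outputs both). The uniform sampling of $\varvec{x}$ in 2PIGame is an assumption:

$$ \begin{array}{ll} \underline{\textsf {ColGame}(\mathcal {P},\mathcal {A},\lambda )}: & \underline{\textsf {2PIGame}(\mathcal {P},\mathcal {A},\lambda )}: \\ \quad (\varvec{x},\varvec{x}') \leftarrow \mathcal {A} ^H(\lambda ) & \quad \varvec{x}\leftarrow \mathbb {F} _\lambda ^k \\ \quad \text {return } \big [ \varvec{x}\ne \varvec{x}' \text { and } \mathcal {P} ^H(\varvec{x}) = \mathcal {P} ^H(\varvec{x}') \big ] & \quad \varvec{x}' \leftarrow \mathcal {A} ^H(\lambda , \varvec{x}) \\ & \quad \text {return } \big [ \varvec{x}\ne \varvec{x}' \text { and } \mathcal {P} ^H(\varvec{x}) = \mathcal {P} ^H(\varvec{x}') \big ] \end{array} $$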

3 Characterizing Collision-Resistance in Linicrypt

We now present our main technical result, which is a characterization of collision-resistance for Linicrypt programs.

In order to simplify the notation, we present the results for the special case of Linicrypt programs that make 1-ary calls to H. That is, every call to H is of the form H(t; v) for a single $v \in \mathbb {F} $ (note that Linicrypt supports more general calls of the form $H(t; v_1, \ldots , v_k)$). With this simplification, every oracle constraint has the form $(t,\varvec{q},\varvec{a})$ where $\varvec{q}$ is a simple vector (rather than a matrix as in the most general form).

This special case simplifies the notation required to express our theorems/proofs, but does not gloss over any meaningful complexity. Later in Sect. B.1 we discuss what minor changes are necessary to extend these results to the unrestricted general case.

3.1 Easy Case: Degeneracy

Some Linicrypt programs allow easy collisions. Consider the program $\mathcal {P} ^H(x,y) = H(x+y)$. An obvious collision in $\mathcal {P} $ is $\mathcal {P} ^H(x,y) = \mathcal {P} ^H(x+z,y-z)$ for any $z \ne 0$. What makes this program particularly easy to attack is that not only do the two computations give the same output, but they query H on exactly the same points. In other words, the input of $\mathcal {P} $ is not uniquely determined by its sequence of oracle queries along with its outputs.

Definition 4

Let $\mathcal {P} =(\varvec{M},\mathcal {C})$ be a Linicrypt program with k inputs. In the algebraic representation, $\mathcal {P} $’s inputs are associated with canonical basis vectors $\varvec{e}_1, \ldots , \varvec{e}_k$ ($\varvec{e}_i$ has 0s everywhere except a 1 in the ith component). We say that $\mathcal {P} $ is degenerate if

$$ \textsf {span}( \varvec{e}_1, \ldots , \varvec{e}_k ) \not \subseteq \textsf {span}\Big ( \{ \varvec{q}\mid (t,\varvec{q},\varvec{a}) \in \mathcal {C} \} \cup \textsf {rows}(\varvec{M}) \Big ) $$

Lemma 5

If $\mathcal {P} $ is degenerate, then second preimages can be found with probability 1.

Proof

Given an input $\varvec{x}$ for $\mathcal {P} $ in the second preimage game, compute the base variables $\varvec{v}$ in the computation of $\mathcal {P} ^H(\varvec{x})$. If $\mathcal {P} $ is degenerate, there must exist two (actually, at least $|\mathbb {F} _\lambda |$) solutions for the input $\varvec{x}'$ that are consistent with $\{ \varvec{q}\cdot \varvec{v}\mid (t,\varvec{q},\varvec{a}) \in \mathcal {C} \} \cup \{ \varvec{r}\cdot \varvec{v}\mid \varvec{r}\in \textsf {rows}(\varvec{M}) \}$. Such an $\varvec{x}'$ will clearly lead $\mathcal {P} ^H$ to make the same oracle queries and give the same output.
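The span condition of Definition 4 reduces to rank computations, which can be sketched in a few lines. The code below is an illustration only: it assumes a small prime field $GF(p)$, represents vectors as Python lists, and uses hypothetical helper names (`rank_mod_p`, `is_degenerate`) that do not come from the paper. It tests the definition exactly as stated above.

```python
# Sketch: test Definition 4 (degeneracy) via ranks over a prime field GF(p).
# Assumption: vectors are lists of ints; the k inputs are the first k base variables.

def rank_mod_p(rows, p):
    """Rank of a list of vectors over GF(p), by Gauss-Jordan elimination."""
    rows = [[x % p for x in r] for r in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def is_degenerate(M, constraints, k, p):
    """constraints: list of (t, q, a). True iff some input basis vector e_i
    lies outside span({q} ∪ rows(M)), as in Definition 4."""
    dim = len(M[0])
    span = [list(q) for (_t, q, _a) in constraints] + [list(r) for r in M]
    base_rank = rank_mod_p(span, p)
    for i in range(k):
        e_i = [1 if j == i else 0 for j in range(dim)]
        if rank_mod_p(span + [e_i], p) > base_rank:
            return True
    return False

P = 101  # illustrative small prime

# P(x, y) = H(x + y); base variables (x, y, H(x+y)).
M1 = [[0, 0, 1]]
C1 = [("", [1, 1, 0], [0, 0, 1])]

# P(x) = H(x); base variables (x, H(x)).
M2 = [[0, 1]]
C2 = [("", [1, 0], [0, 1])]

print(is_degenerate(M1, C1, k=2, p=P))  # True
print(is_degenerate(M2, C2, k=1, p=P))  # False
```

The first program is the $H(x+y)$ example from the start of this subsection: the query vector only pins down $x+y$, so neither $\varvec{e}_1$ nor $\varvec{e}_2$ lies in the span.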

3.2 Running Example: An Interesting Second-Preimage Attack

Consider the example program below. In fact, it is the example from Sect. 2.1 but with the nonces omitted and most intermediate variables unnamed:

Suppose we are given x, y, z and are asked to find a second preimage $x',y',z'$ with $\mathcal {P} ^H(x,y,z) = \mathcal {P} ^H(x',y',z')$. Here is how to do it:

1. The second component of $\mathcal {P} $’s output is H(z). Since we cannot hope to find a second preimage directly in H, we must set $z' = z$.
2. The key insight is to now set $w' \ne w$ arbitrarily (which is why we gave this value a name). We make a promise to choose $x',y'$ so that $w' = H(x') + H(z') + y'$.
3. To have a collision, we must have $H(w')+x' = H(w) + x$. Importantly, $x'$ is the only unknown value in this expression, and it is possible to simply solve for $x'$.
4. It is time to fulfill the promise that $w' = H(x')+H(z')+y'$. Since $w',x',z'$ are already fixed, we can solve for $y'$.

Note that we are guaranteed that $(x,y,z) \ne (x',y',z')$ since the two computations of $\mathcal {P} $ lead to different intermediate values $w \ne w'$ (and $\mathcal {P} $ is deterministic).
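The four steps can be exercised against a simulated random oracle. The sketch below reads the program off the steps above, namely $w = H(x)+H(z)+y$ with output $(H(w)+x,\ H(z))$. It uses a lazily sampled table as the oracle and the prime $2^{61}-1$ as a stand-in field. The class and function names are hypothetical and chosen only for this illustration.

```python
import random

P_MOD = 2**61 - 1  # a Mersenne prime standing in for the field F (illustrative choice)

class RandomOracle:
    """Lazily sampled random function F -> F; self.table records all queries."""
    def __init__(self, seed=None):
        self.table = {}
        self.rng = random.Random(seed)

    def __call__(self, v):
        v %= P_MOD
        if v not in self.table:
            self.table[v] = self.rng.randrange(P_MOD)
        return self.table[v]

def program(H, x, y, z):
    """Running example: w = H(x) + H(z) + y; output (H(w) + x, H(z))."""
    w = (H(x) + H(z) + y) % P_MOD
    return ((H(w) + x) % P_MOD, H(z))

def second_preimage(H, x, y, z):
    """Follow steps 1-4 of the attack; returns (x', y', z')."""
    w = (H(x) + H(z) + y) % P_MOD
    z2 = z                              # step 1: keep the H(z) query unchanged
    w2 = (w + 1) % P_MOD                # step 2: any w' != w works
    x2 = (H(w) + x - H(w2)) % P_MOD     # step 3: solve H(w') + x' = H(w) + x
    y2 = (w2 - H(x2) - H(z2)) % P_MOD   # step 4: fulfil w' = H(x') + H(z') + y'
    return x2, y2, z2

H = RandomOracle(seed=1)
x, y, z = 3, 5, 7
x2, y2, z2 = second_preimage(H, x, y, z)
print((x2, y2, z2) != (x, y, z))                          # True
print(program(H, x2, y2, z2) == program(H, x, y, z))      # True
```

The attack queries H on at most five points ($x$, $z$, $w$, $w'$, $x'$), within the $2n = 6$ queries allowed by item 3 of Theorem 1.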

Perspective. This example is representative of how second preimages can be computed in arbitrary Linicrypt programs. Given an input $\varvec{x}$ for $\mathcal {P} ^H$, we compute a second preimage $\varvec{x}'$ by focusing on the oracle queries that $\mathcal {P} ^H(\varvec{x})$ and $\mathcal {P} ^H(\varvec{x}')$ will make:

1. Designate some of the oracle queries to take the same values in both $\mathcal {P} ^H(\varvec{x})$ and $\mathcal {P} ^H(\varvec{x}')$. In our example, we decided that the oracle query H(z) would take the same values in both computations.
2. Identify the first query that we will assign different values in the two computations. Set the input to this query arbitrarily in $\mathcal {P} ^H(\varvec{x}')$. In our example, we identify the H(w) query to take on different values and set $w' \ne w$ arbitrarily.
3. Repeatedly make followup oracle queries as they become possible, while using linear algebra to solve for other intermediate values. In our example, we call $H(w')$, which allows us to solve for $x'$, which allows us to call $H(x')$, which allows us to solve for $y'$.

3.3 Collision Structures for Finding Second Preimages

We have given a rough outline of how (we claim) Linicrypt second preimages must be found. The next step is to formalize what is required of $\mathcal {P} $ in terms of its algebraic representation.

In step 2 above, we identify a query whose input will be chosen arbitrarily. Suppose that query corresponds to constraint $(t,\varvec{q},\varvec{a})$. Since this is the first value that is fixed differently in $\mathcal {P} ^H(\varvec{x})$ and $\mathcal {P} ^H(\varvec{x}')$, we must have $\varvec{q}$ linearly independent of the vectors that are already fixed by step 1. Otherwise it would not be possible to find two consistent values for this query.

In steps 2 and 3 above, we repeatedly query H, and we have written the attack outline to suggest we never get “stuck.” One way we could get stuck is to make some query $H(x')$ for the first time, when we have already fixed (either directly or indirectly) what $H(x')$ must be. If this is the case, then we cannot succeed with probability better than $1/|\mathbb {F} _\lambda |$. To avoid this case, every query we make in steps 2 & 3 of the outline must correspond to a constraint $(t, \varvec{q},\varvec{a})$ where $\varvec{a}$ is linearly independent of the values that have already been fixed.

The following definition formalizes these algebraic intuitions:

Definition 6

Let $\mathcal {P} = (\varvec{M},\mathcal {C})$ be a Linicrypt program. A collision structure for $\mathcal {P} $ is a tuple $(i^*; c_1, \ldots , c_n)$, where:

1. $c_1, \ldots , c_n$ is an ordering of $\mathcal {C} $, and we write $c_i = (t_i, \varvec{q}_i, \varvec{a}_i)$.
2. $\varvec{q}_{i^*} \not \in \textsf {span}\Big ( \{ \varvec{q}_1, \ldots , \varvec{q}_{i^*-1} \} \cup \{ \varvec{a}_1, \ldots , \varvec{a}_{i^*-1} \} \cup \textsf {rows}(\varvec{M}) \Big )$
3. For $j \ge i^*$: $\varvec{a}_j \not \in \textsf {span}\Big ( \{ \varvec{q}_1, \ldots , \varvec{q}_{j} \} \cup \{ \varvec{a}_1, \ldots , \varvec{a}_{j-1} \} \cup \textsf {rows}(\varvec{M}) \Big )$

Connecting to the previous intuition, a collision-finding attack will let oracle queries $c_1, \ldots , c_{i^*-1}$ be the same in both executions $\mathcal {P} ^H(\varvec{x})$ and $\mathcal {P} ^H(\varvec{x}')$. Then $c_{i^*}$ is the first oracle query that the attack fixes differently for the two executions. Property (2) of the definition ensures that it is possible to find two query values that are consistent with the previously fixed values. Property (3) captures the fact that from this point forward, no query should be forced to result in an output value that has already been fixed.

Running Example. We now revisit the running example from before, to illustrate a collision structure for it. The base variables of this program are x, y, z, H(x), H(z), H(w). Below is the algebraic representation of this program, with the oracle constraints arranged to show a collision structure (we do not write the empty nonces of the oracle constraints):

This ordering of queries is indeed a collision structure since:

$\varvec{q}_2$ is linearly independent of all vectors above it in this diagram.
$\varvec{a}_2$ is linearly independent of all vectors above it in this diagram.
$\varvec{a}_3$ is linearly independent of all vectors above it in this diagram.
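These three independence claims can be checked mechanically. The sketch below encodes the running example over the base variables $(x, y, z, H(x), H(z), H(w))$ and tests properties (2) and (3) of Definition 6 by rank computations over a small prime field. The ordering $c_1 = H(z)$, $c_2 = H(w)$, $c_3 = H(x)$ with $i^* = 2$ is inferred from the attack in Sect. 3.2 and is an assumption about the arrangement intended here. All function names are hypothetical.

```python
# Sketch: verify properties (2) and (3) of Definition 6 by rank checks over GF(p).

def rank_mod_p(rows, p):
    """Rank of a list of vectors over GF(p), by Gauss-Jordan elimination."""
    rows = [[x % p for x in r] for r in rows]
    ncols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def in_span(vec, vectors, p):
    """True iff vec is a linear combination of vectors over GF(p)."""
    return rank_mod_p(vectors + [vec], p) == rank_mod_p(vectors, p)

def is_collision_structure(M, ordered, i_star, p):
    """ordered: list of (q, a) pairs c_1..c_n; i_star is 1-indexed as in Definition 6."""
    qs = [list(q) for q, _ in ordered]
    avs = [list(a) for _, a in ordered]
    rows = [list(r) for r in M]
    s = i_star - 1
    # Property (2): q_{i*} is independent of q_1..q_{i*-1}, a_1..a_{i*-1}, rows(M).
    if in_span(qs[s], qs[:s] + avs[:s] + rows, p):
        return False
    # Property (3): for j >= i*, a_j is independent of q_1..q_j, a_1..a_{j-1}, rows(M).
    for j in range(s, len(ordered)):
        if in_span(avs[j], qs[:j + 1] + avs[:j] + rows, p):
            return False
    return True

# Base variables: (x, y, z, H(x), H(z), H(w)).
M = [[1, 0, 0, 0, 0, 1],   # first output:  H(w) + x
     [0, 0, 0, 0, 1, 0]]   # second output: H(z)
c_z = ([0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 0])  # query z, answer H(z)
c_w = ([0, 1, 0, 1, 1, 0], [0, 0, 0, 0, 0, 1])  # query w = H(x)+H(z)+y, answer H(w)
c_x = ([1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0])  # query x, answer H(x)

print(is_collision_structure(M, [c_z, c_w, c_x], 2, 101))  # True
print(is_collision_structure(M, [c_z, c_w, c_x], 1, 101))  # False
```

With $i^* = 1$ the check fails because $\varvec{a}_1$ (the answer $H(z)$) is itself a row of $\varvec{M}$, matching the observation that $z' = z$ is forced.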

Second-Preimage-Finding Algorithm. In Fig. 1 we give an algorithm that finds second preimages by following the intuitive strategy above, from a given collision structure.

Lemma 7

If a collision structure $(i^*; c_1, \ldots , c_n)$ exists for $\mathcal {P} $, and $\mathcal {P} $ is not degenerate, then the second-preimage resistance of $\mathcal {P} $ is comprehensively broken. Specifically, let $\mathcal {A} $ refer to $\textsf {FindSecondPreimage}(\mathcal {P}, (i^*;c_1, \ldots , c_n), \cdot )$. Then:

$$ \Pr \Big [ \textsf {2PIGame}(\mathcal {P}, \mathcal {A}, \lambda ) = 1 \Big ] = 1 $$

Proof

Given $\varvec{x}$, the goal is to compute a second preimage $\varvec{x}'$. The computation of $\mathcal {P} ^H(\varvec{x}')$ has a certain set of base variables $\varvec{v}'$, and it suffices to compute those instead since $\varvec{x}' = (\varvec{e}_1\cdot \varvec{v}', \ldots , \varvec{e}_k \cdot \varvec{v}')$. The attack $\textsf {FindSecondPreimage}$ fixes one linear constraint of $\varvec{v}'$ at a time, until $\varvec{v}'$ is completely determined.

It suffices to show the following about the behavior of $\textsf {FindSecondPreimage}$:

1. It computes a different set of base variables $\varvec{v}'$ than those of $\mathcal {P} ^H(\varvec{x})$.
2. It never adds incompatible (unsatisfiable) linear constraints on $\varvec{v}'$.
3. Values $\varvec{v}'$ are consistent with H. Namely, if $(t,\varvec{q},\varvec{a}) \in \mathcal {C} $, then $H(t; \varvec{q}\cdot \varvec{v}') = \varvec{a}\cdot \varvec{v}'$.
4. By the end of the computation, enough constraints have been added to completely determine $\varvec{v}'$.

Property 1 holds since $\varvec{q}_{i^*}\cdot \varvec{v}\ne \varvec{q}_{i^*} \cdot \varvec{v}'$ by design. Regarding property 2:

The constraints on $\varvec{v}'$ that are added for $\varvec{M} $ and in the first for-loop are self-consistent—by construction they already have a valid solution in $\varvec{v}$.
The constraint involving $\varvec{q}_{i^*}$ is compatible with the previous constraints since $\varvec{q}_{i^*}$ is linearly independent of the previous constraint vectors $\{ \varvec{q}_1, \ldots , \varvec{q}_{i^*-1} \} \cup \{ \varvec{a}_1, \ldots , \varvec{a}_{i^*-1} \} \cup \textsf {rows}(\varvec{M})$, by the collision structure property.
Similarly, a constraint involving $\varvec{q}_i$ for $i \ge i^*$ (if-statement within last for-loop) is only added in the case that $\varvec{q}_i$ is linearly independent of the previous constraint vectors.
The constraint involving $\varvec{a}_i$ in the second for-loop is consistent since $\varvec{a}_i$ is linearly independent of existing constraint vectors, again by the collision structure property.

Regarding property 3: for oracle constraints $c_i$ with $i < i^*$, consistency with H is ensured by agreeing with the existing values $\varvec{v}$. For constraints $c_i$ with $i \ge i^*$, consistency is guaranteed since the second for-loop actually calls H to determine the consistent way to constrain $\varvec{a}_i \cdot \varvec{v}'$.

Property 4 follows from the fact that $\mathcal {P} $ is not degenerate. We can see that $\varvec{M} \times \varvec{v}'$ and $\varvec{q}\cdot \varvec{v}'$ are fixed/determined by the end of the computation, for all $(t,\varvec{q},\varvec{a}) \in \mathcal {C} $. Non-degeneracy implies that the input of $\mathcal {P} $ (and hence all base variables) is uniquely determined.

3.4 Efficiently Finding Collision Structures

In this section we show that it is possible to efficiently determine whether a Linicrypt program has a collision structure, by analyzing its algebraic representation. The algorithm for finding a collision structure is given in Fig. 2.

Lemma 8

$\textsf {FindColStruct}(\mathcal {P})$ (Fig. 2) outputs a collision structure for $\mathcal {P} $ if and only if one exists. Furthermore, the running time of $\textsf {FindColStruct}$ is polynomial (in the size of $\mathcal {P} $’s algebraic representation).

In the interest of space, the proof is deferred to Appendix A.

3.5 Breaking Collision Resistance Implies Collision Structure

So far our discussion has centered around the relationship between collision structures and second-preimage resistance. We now show that if $\mathcal {P} $ fails to be even collision resistant (in the random oracle model), then it has a collision structure. The main approach is to observe the oracle queries made by an arbitrary attacker (who computes a collision), and “extract” a collision structure from these queries.

The results in this subsection hold only for the following subclass of Linicrypt programs. In Sect. B.2 we discuss specifically why the results are restricted to this subclass.

Definition 9

Let $\mathcal {P} = (\varvec{M},\mathcal {C})$ be a Linicrypt program, with $\mathcal {C} = \{ (t_1, \varvec{q}_1, \varvec{a}_1), \ldots , (t_n, \varvec{q}_n, \varvec{a}_n)\}$. If all of $\{t_1, \ldots , t_n\}$ are distinct then we say that $\mathcal {P} $ has distinct nonces.

Lemma 10

Let $\mathcal {P} $ be a deterministic Linicrypt program with distinct nonces that makes n oracle queries. Let $\mathcal {A} $ be an oracle program that makes at most N oracle queries. If

$$\begin{aligned} \Pr [ \textsf {ColGame}(\mathcal {P},\mathcal {A},\lambda )=1]&> \left( \frac{N}{n}\right) ^{2n}/ |\mathbb {F} _\lambda | \\ \text{ or } \text{ if } \Pr [ \textsf {2PIGame}(\mathcal {P},\mathcal {A},\lambda )=1]&> \left( \frac{N}{n}\right) ^{n}/ |\mathbb {F} _\lambda | \end{aligned}$$

then $\mathcal {P} $ either has a collision structure or is degenerate.

Proof

Without loss of generality, we can assume the following about $\mathcal {A} $:

Let $(\varvec{x},\varvec{x}')$ be the two preimages from the games (in 2PIGame $\mathcal {A} $ gets $\varvec{x}$ as input and gives $\varvec{x}'$ as output; in ColGame $\mathcal {A} $ outputs both $\varvec{x}$ and $\varvec{x}'$). We assume that $\mathcal {A} ^H$ has made the oracle queries that $\mathcal {P} ^H(\varvec{x})$ and $\mathcal {P} ^H(\varvec{x}')$ will make. In ColGame this can be achieved by modifying $\mathcal {A} $ to run these two computations as its last action. In 2PIGame this can be achieved by having $\mathcal {A} $ run $\mathcal {P} ^H(\varvec{x})$ as its first action and $\mathcal {P} ^H(\varvec{x}')$ as its last action.
$\mathcal {A} $ never repeats a query to H. This can be achieved by simple memoization. Note that when $\mathcal {A} $ runs, say, $\mathcal {P} ^H(\varvec{x}')$ as its last action, some of those oracle queries may have been made previously.
$\mathcal {A} ^H$ can actually output $(\varvec{v},\varvec{v}')$, where $\varvec{v}$ is the set of base variables in the computation of $\mathcal {P} ^H(\varvec{x})$, and $\varvec{v}'$ the base variables in $\mathcal {P} ^H(\varvec{x}')$. This is because the base variables are computed during the process of running $\mathcal {P} ^H(\varvec{x})$ and $\mathcal {P} ^H(\varvec{x}')$.

Note that the base variables have the following property. Let $c = (t,\varvec{q},\varvec{a})$ be one of the oracle constraints of $\mathcal {P} $. Then the computation $\mathcal {P} ^H(\varvec{x})$ (and hence $\mathcal {A} ^H$ as well) at some point makes an oracle query $H(t;\varvec{q}\cdot \varvec{v})$ and gets a response $\varvec{a}\cdot \varvec{v}$.

From these assumptions, whenever $\mathcal {A} $ outputs a successful collision there exist well-defined mappings $T,T' : \mathcal {C} \rightarrow \mathbb {N}$ such that:

For every constraint $c = (t,\varvec{q},\varvec{a}) \in \mathcal {C} $, the T(c)th query made by $\mathcal {A} ^H$ is the one corresponding to oracle constraint c in the computation of $\mathcal {P} ^H(\varvec{x})$. In other words, it is the query in which $\mathcal {A} ^H$ “decided” what $\varvec{q}\cdot \varvec{v}$ should be (and learned what $\varvec{a}\cdot \varvec{v}$ was as a result of the query).
Similarly, the $T'(c)$th query made by $\mathcal {A} ^H$ is the one corresponding to oracle constraint c in the computation of $\mathcal {P} ^H(\varvec{x}')$. This is the query in which $\varvec{q}\cdot \varvec{v}'$ was determined.

How many possible mappings $(T,T')$ are there if $\mathcal {A} $ makes N oracle queries? Let $N_i$ be the number of oracle queries that $\mathcal {A} $ makes which have nonce $t_i$. Since the nonces are distinct, we have $\sum _i N_i \le N$. There are only $N_i$ choices for each of the values $T(c_i)$ and $T'(c_i)$. Hence there are at most $ \prod _{i=1}^n N_i^2$ possible $(T,T')$ mappings. However, in the 2PIGame, the mapping T is completely fixed since we assume $\mathcal {A} $ performs the computation $\mathcal {P} ^H(\varvec{x})$ as its first action. In that case, there are only $\prod _{i=1}^n N_i$ choices of the mapping $T'$. These products are maximized when each $N_i = N/n$, so we get an upper bound of $(N/n)^{2n}$ possible $(T,T')$ mappings in the ColGame and $(N/n)^n$ mappings in the 2PIGame.
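The maximization step is an instance of the AM-GM inequality. Spelled out, treating the $N_i$ as nonnegative reals with $\sum _i N_i \le N$:

$$ \prod _{i=1}^n N_i \;\le \; \left( \frac{1}{n} \sum _{i=1}^n N_i \right) ^{n} \;\le \; \left( \frac{N}{n}\right) ^{n}, \qquad \prod _{i=1}^n N_i^2 = \left( \prod _{i=1}^n N_i \right) ^{2} \;\le \; \left( \frac{N}{n}\right) ^{2n} $$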

Applying the pigeonhole principle and uniting both cases from the statement of the lemma (collision game and second preimage game), there is a specific $(T,T')$ such that:

$$ \Pr [\mathcal {A} ^H \text{ outputs } \text{ a } \text{ valid } \text{ collision } \text{ while } \text{ using } \text{ mappings } (T,T')] > 1/|\mathbb {F} _\lambda | $$
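In more detail, for the collision game the pigeonhole step reads as follows, where $\textsf {Win}_{T,T'}$ is shorthand (introduced here only for illustration) for the event that $\mathcal {A} ^H$ outputs a valid collision while using mappings $(T,T')$. The second-preimage case is identical with $(N/n)^{n}$ in place of $(N/n)^{2n}$ and $T$ fixed:

$$ \frac{(N/n)^{2n}}{|\mathbb {F} _\lambda |} \;<\; \Pr [ \textsf {ColGame}(\mathcal {P},\mathcal {A},\lambda )=1] \;=\; \sum _{(T,T')} \Pr [\textsf {Win}_{T,T'}] \;\le \; \left( \frac{N}{n}\right) ^{2n} \cdot \max _{(T,T')} \Pr [\textsf {Win}_{T,T'}] $$

Dividing both sides by $(N/n)^{2n}$ gives a mapping whose term exceeds $1/|\mathbb {F} _\lambda |$.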

For the rest of the proof, we condition on the event that $\mathcal {A} $ computes a collision while using this specific mapping $(T,T')$. This is without loss of generality by making $\mathcal {A} $, as its final action, output $\bot $ if it observes that some different mapping is used. Hence we can view the association between oracle calls of $\mathcal {P} $ and $\mathcal {A} $ as fixed a priori. That is, we can know in advance that a particular oracle call of $\mathcal {A} $ will determine the value of $\varvec{q}\cdot \varvec{v}$ (or $\varvec{q}\cdot \varvec{v}'$) for a specific $\varvec{q}$.

For some $c \in \mathcal {C} $, if $T(c) = T'(c)$, then we call c convergent. In this case, $\mathcal {P} ^H(\varvec{x})$ and $\mathcal {P} ^H(\varvec{x}')$ make the same c-query and receive the same output. In other words, under such a mapping $T,T'$, adversary $\mathcal {A} ^H$ will choose that $\varvec{q}\cdot \varvec{v}= \varvec{q}\cdot \varvec{v}'$. If $T(c) \ne T'(c)$, we call c divergent—$\mathcal {P} ^H(\varvec{x})$ and $\mathcal {P} ^H(\varvec{x}')$ make different c-queries, i.e., $\varvec{q}\cdot \varvec{v}\ne \varvec{q}\cdot \varvec{v}'$.

If all $c \in \mathcal {C} $ are convergent, then two distinct inputs $\varvec{x}$ and $\varvec{x}'$ cause $\mathcal {P} $ to make identical oracle queries and give identical output. Hence $\mathcal {P} $ is degenerate, and we are done. We continue assuming that some query is divergent, and will conclude that $\mathcal {P} $ has a collision structure.

Define $\textsf {finish}(c) = \max \{ T(c), T'(c) \}$. Note that since $\mathcal {P} $ has distinct nonces, an oracle query made by $\mathcal {A} $ cannot be associated with more than one $c \in \mathcal {C} $. Hence $\textsf {finish}$ is an injective function.

We obtain a collision structure for $\mathcal {P} $ as follows. Order the oracle constraints in $\mathcal {C} $ as $(c_1, \ldots , c_n)$, where all of the convergent queries come first, followed by the divergent queries ordered by increasing finish time. Let $i^*$ be the index of the divergent query with earliest finish time. Then:

$i^* \le i$ $\Leftrightarrow $ $c_i$ is divergent
$i^* \le i < j$ $\Rightarrow $ $\textsf {finish}(c_i) < \textsf {finish}(c_j)$
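
The ordering just described can be sketched in a few lines of Python. This is only an illustration: the function name `order_constraints`, the dictionary encoding of $T$ and $T'$, and the constraint labels are assumptions made for the example and do not appear in the paper.

```python
def order_constraints(T, T_prime):
    """Order oracle constraints as in the proof: convergent ones first, then
    divergent ones by increasing finish time. Returns (i_star, ordering)."""
    convergent = [c for c in T if T[c] == T_prime[c]]
    divergent = [c for c in T if T[c] != T_prime[c]]
    # finish(c) is the later of the two adversary queries associated with c.
    finish = {c: max(T[c], T_prime[c]) for c in T}
    divergent.sort(key=finish.get)
    ordering = convergent + divergent
    # 1-indexed position of the first divergent constraint (None if all converge).
    i_star = len(convergent) + 1 if divergent else None
    return i_star, ordering

# Hypothetical mappings: values are indices of the adversary's oracle queries.
T = {"c_a": 1, "c_b": 2, "c_c": 3}
T_prime = {"c_a": 1, "c_b": 5, "c_c": 4}
i_star, ordering = order_constraints(T, T_prime)
print(i_star, ordering)  # 2 ['c_a', 'c_c', 'c_b']
```

Here `c_a` is convergent, while `c_c` finishes (at query 4) before `c_b` (at query 5), so `c_c` takes position $i^*$.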

Claim

$( i^* ; c_1, \ldots , c_n)$ is a collision structure for $\mathcal {P} $.

In the following, we write each oracle constraint $c_i$ as $c_i = (t_i, \varvec{q}_i, \varvec{a}_i)$.

For $j < i^*$, the query $c_j$ is convergent so we have $\varvec{q}_{j} \cdot \varvec{v}= \varvec{q}_{j} \cdot \varvec{v}'$ and $\varvec{a}_{j} \cdot \varvec{v}= \varvec{a}_{j} \cdot \varvec{v}'$. Since the outputs of the two executions of $\mathcal {P} $ are also identical, we also have $\varvec{M} \varvec{v}= \varvec{M} \varvec{v}'$. Since $c_{i^*}$ is divergent, we have $\varvec{q}_{i^*} \cdot \varvec{v}\ne \varvec{q}_{i^*} \cdot \varvec{v}'$. Every vector in the span of these $\varvec{q}_j$, these $\varvec{a}_j$, and the rows of $\varvec{M}$ has the same inner product with $\varvec{v}$ and $\varvec{v}'$, whereas $\varvec{q}_{i^*}$ does not. From this we conclude that:

$$\varvec{q}_{i^*} \not \in \textsf {span}\Big ( \{ \varvec{q}_{1}, \ldots , \varvec{q}_{i^*-1} \} \cup \{ \varvec{a}_{1}, \ldots , \varvec{a}_{i^*-1} \} \cup \textsf {rows}(\varvec{M}) \Big ).$$

This is the first property required of a collision structure.

It remains to show that for all $i \ge i^*$,

$$ \varvec{a}_{i} \not \in \textsf {span}\Big ( \{ \varvec{q}_{1}, \ldots , \varvec{q}_{i} \} \cup \{ \varvec{a}_{1}, \ldots , \varvec{a}_{i-1} \} \cup \textsf {rows}(\varvec{M}) \Big ). $$

Suppose for contradiction that the above is false, and that we actually have:

$$ \varvec{a}_i = \sum _{j\le i} \alpha _j \varvec{q}_j + \sum _{j < i} \beta _j \varvec{a}_j + \varvec{\gamma } \varvec{M} $$

Focus on the moment when $\mathcal {A} $ has asked its $\textsf {finish}(c_i)$th query and is awaiting the response from H. By symmetry, suppose $\textsf {finish}(c_i) = T'(c_i)$, so that this query is on $\varvec{q}_i \cdot \varvec{v}'$; the result of the query will be assigned to $\varvec{a}_i \cdot \varvec{v}'$. At this moment:

All queries $c_j$ for $i^* \le j < i$ are finished. This means that the oracle queries of $\mathcal {A} ^H$ have already determined $\varvec{q}_j \cdot \varvec{v}$, $\varvec{a}_j \cdot \varvec{v}$, $\varvec{q}_j \cdot \varvec{v}'$, and $\varvec{a}_j \cdot \varvec{v}'$. Further, the queries (but not responses) of oracle constraint $c_i$ have been fixed as well—these values are $\varvec{q}_i \cdot \varvec{v}$ and $\varvec{q}_i \cdot \varvec{v}'$.
$\varvec{a}_i \cdot \varvec{v}$ has already been fixed, since this happened at time $T(c_i) < T'(c_i)$. But $\varvec{a}_i \cdot \varvec{v}'$ is about to be chosen as a uniform field element.

Now consider the expression $\varvec{a}_i \cdot (\varvec{v}' - \varvec{v})$:

$$ \varvec{a}_i \cdot (\varvec{v}'-\varvec{v}) = \sum _{j\le i} \alpha _j \varvec{q}_j \cdot (\varvec{v}'-\varvec{v}) + \sum _{j < i} \beta _j \varvec{a}_j\cdot (\varvec{v}'-\varvec{v}) + \varvec{\gamma } \varvec{M} (\varvec{v}'-\varvec{v}) $$

For $j < i^*$ we know that query $c_j$ is convergent. This implies that $\varvec{q}_j \cdot (\varvec{v}' - \varvec{v})=0$ and $\varvec{a}_j\cdot (\varvec{v}'-\varvec{v}) =0$. We also know that $\varvec{M} (\varvec{v}'-\varvec{v}) = 0$, in the case that $\mathcal {A} ^H$ is successful in generating a collision. Cancelling these terms gives:

$$ \varvec{a}_i \cdot (\varvec{v}'-\varvec{v}) = \sum _{j= i^*}^{ i} \alpha _j \varvec{q}_j \cdot (\varvec{v}'-\varvec{v}) + \sum _{j =i^*}^{i-1} \beta _j \varvec{a}_j \cdot (\varvec{v}'-\varvec{v}) $$

Isolating $\varvec{a}_i \cdot \varvec{v}'$ gives:

$$ \varvec{a}_i \cdot \varvec{v}' = \varvec{a}_i \cdot \varvec{v}+ \sum _{j= i^*}^{ i} \alpha _j \varvec{q}_j \cdot (\varvec{v}'-\varvec{v}) + \sum _{j =i^*}^{i-1} \beta _j \varvec{a}_j \cdot (\varvec{v}'-\varvec{v}) $$

But all terms on the right-hand side have already been fixed, while the term on the left is chosen uniformly in $\mathbb {F} _\lambda $. So equality holds with probability $1/|\mathbb {F} _\lambda |$. This contradicts the assumption that $\mathcal {A} $ succeeds with strictly greater probability.

3.6 Putting Everything Together

Our main characterization shows that second-preimage resistance and collision resistance coincide for this class of Linicrypt programs, in a very strong sense:

Theorem 11

Let $\mathcal {P}$ be a deterministic Linicrypt program with distinct nonces, making n oracle queries. Then the following are equivalent:

1. There is an adversary $\mathcal {A} $ making N oracle queries such that
$$\Pr [ \textsf {ColGame}(\mathcal {P},\mathcal {A},\lambda )=1] > \left( \frac{N}{n}\right) ^{2n}/ |\mathbb {F} _\lambda |.$$
2. There is an adversary $\mathcal {A} $ making N oracle queries such that
$$\Pr [ \textsf {2PIGame}(\mathcal {P},\mathcal {A},\lambda )=1] > \left( \frac{N}{n}\right) ^{n}/ |\mathbb {F} _\lambda |.$$
3. There is an adversary $\mathcal {A} $ making at most 2n oracle queries such that
$$\Pr [ \textsf {2PIGame}(\mathcal {P},\mathcal {A},\lambda )=1] = 1.$$
4. $\mathcal {P} $ either has a collision structure or is degenerate.
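
To get a feel for the thresholds in items 1 and 2, the following sketch evaluates them on a logarithmic scale. It is an illustration only: the function name and the assumption that $|\mathbb {F} _\lambda | = 2^{\texttt{field\_bits}}$ are ours, not the paper's.

```python
from math import log2

def log2_thresholds(num_queries, n, field_bits):
    """Return log2 of the thresholds in items 1 and 2, i.e. of
    (N/n)^(2n)/|F| and (N/n)^n/|F|, assuming |F| = 2**field_bits."""
    per_nonce = log2(num_queries) - log2(n)      # log2(N/n)
    col = 2 * n * per_nonce - field_bits         # collision game
    second_preimage = n * per_nonce - field_bits  # second-preimage game
    return col, second_preimage

print(log2_thresholds(2**40, 2, 256))  # (-100.0, -178.0)
```

With the same N and field size but n = 4, the first value becomes positive (48.0): the threshold in item 1 then exceeds 1, so that item carries no information for those parameters.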

Corollary 12

The collision resistance (equivalently, second-preimage resistance) of deterministic, distinct-nonce Linicrypt programs $\mathcal {P} $ can be decided in polynomial time (in the size of $\mathcal {P} $’s algebraic representation).

Proof

Using standard linear algebraic operations (e.g., Gaussian elimination), one can check $\mathcal {P} $ for degeneracy or for the existence of a collision structure in polynomial time.

4 A Simple Application

We can illustrate the use of our main theorem with a simple example application. Suppose we have access to a random oracle which is compressing by a factor of 2-to-1. In the Linicrypt notation, this would be an oracle that takes 2 field elements (and the oracle nonce) as input and produces one field element as output—$H:\{0,1\}^* \times \mathbb {F} ^2 \rightarrow \mathbb {F} $. If we require a collision resistant function that compresses by k-to-1 (for some fixed k), the following natural Merkle-Damgård-style iterative hash comes to mind:

The algebraic representation of this program is:

We have numbered the oracle constraints so that constraint $(i, \varvec{Q}_i, \varvec{a}_i)$ corresponds to the statement “$y_i := H(i; y_{i-1}, x_i)$” in $\mathcal {P} $.

To determine whether this program is collision-resistant, we execute the FindColStruct algorithm (see Note 1). Initially all oracle constraints start in the set $\textsf {LEFT}$, and $\textsf {RIGHT}$ starts out empty. The first loop in FindColStruct moves oracle constraints from $\textsf {LEFT}$ to $\textsf {RIGHT}$ whenever their $\varvec{a}_i$ value is linearly independent of all other vectors appearing in $\textsf {LEFT}$ (the multiset of vectors is represented as the variable V in FindColStruct).

In this program, every $\varvec{a}_i$ vector is zeroes everywhere except for a 1 corresponding to the “$y_i$” column. Also note that $\varvec{a}_k$ is identical to $\varvec{M} $, and $\varvec{a}_i$ (for $i<k$) appears as the first row of $\varvec{Q}_{i+1}$ (see the example with $\varvec{a}_2$ and $\varvec{Q}_3$ above). In other words, every $\varvec{a}_i$ is always in the span of other vectors appearing in $\textsf {LEFT}$, so no oracle constraint will ever be added to $\textsf {RIGHT}$.

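Hence, $\textsf {FindColStruct}$ will terminate with $\textsf {RIGHT}=\emptyset $ and return $\bot $, so the program has no collision structure. It is also not degenerate, since every input appears as a row of some $\varvec{Q}_i$. From our main characterization, this proves that the function is collision-resistant.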
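
The walk-through above can be reproduced mechanically. The sketch below is a reconstruction of the two-phase procedure from its prose description (the figure defining FindColStruct is not reproduced in this section), extended to matrix-valued $\varvec{Q}$ as in Appendix B.1. Several things are assumptions made for illustration: the small prime 101 standing in for the field, the tuple encoding `(nonce, Q_rows, a)`, all function names, the $k=3$ instance in which the initial chaining value is treated as an extra input $x_0$, and the contrasting toy program.

```python
P = 101  # small prime standing in for the field size (assumption)

def rank(rows, p=P):
    """Rank of a list of row vectors over GF(p) via Gaussian elimination."""
    m = [[x % p for x in r] for r in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        inv = pow(m[r][col], -1, p)
        m[r] = [(x * inv) % p for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_span(v, rows, p=P):
    return rank(rows + [v], p) == rank(rows, p)

def vectors(M, constraints, skip_a_of=None):
    """Rows of M plus all Q-rows and a-vectors of the given constraints,
    omitting the a-vector of skip_a_of (the role of V minus {a})."""
    out = list(M)
    for c in constraints:
        out += c[1]
        if c is not skip_a_of:
            out.append(c[2])
    return out

def find_col_struct(M, C, p=P):
    """C is a list of (nonce, Q_rows, a). Returns (i_star, ordering) or None."""
    left, right = list(C), []
    # Phase 1: move c to the front of RIGHT while its a-vector is independent
    # of every other vector still associated with LEFT and M.
    while True:
        c = next((c for c in left
                  if not in_span(c[2], vectors(M, left, c), p)), None)
        if c is None:
            break
        left.remove(c)
        right.insert(0, c)
    # Phase 2: move c back to LEFT while all rows of its Q are spanned by V.
    while True:
        V = vectors(M, left)
        c = next((c for c in right
                  if all(in_span(q, V, p) for q in c[1])), None)
        if c is None:
            break
        right.remove(c)
        left.append(c)
    return (len(left) + 1, left + right) if right else None

def e(i, dim):
    return [1 if j == i else 0 for j in range(dim)]

# Hypothetical k = 3 instance; base variables (x0, x1, x2, x3, y1, y2, y3).
D = 7
md_C = [(1, [e(0, D), e(1, D)], e(4, D)),   # y1 := H(1; x0, x1)
        (2, [e(4, D), e(2, D)], e(5, D)),   # y2 := H(2; y1, x2)
        (3, [e(5, D), e(3, D)], e(6, D))]   # y3 := H(3; y2, x3)
md_M = [e(6, D)]                            # output y3
print(find_col_struct(md_M, md_C))          # None

# Hypothetical contrast: P(x, y) = H(1; x) + y, base variables (x, y, h).
toy_C = [(1, [[1, 0, 0]], [0, 0, 1])]
toy_M = [[0, 1, 1]]
print(find_col_struct(toy_M, toy_C))
```

For the iterated hash, no constraint ever leaves LEFT, matching the argument above. For the toy program the sketch reports a structure with $i^* = 1$: an adversary can change x, query the oracle, and then adjust y to compensate.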

5 Extensions, Limitations, Future Work

In Appendix B we discuss several extensions and limitations of our techniques:

How the results generalize to oracle calls that take several field elements as input (results as stated in previous sections consider a random oracle of the form $H: \{0,1\}^* \times \mathbb {F} \rightarrow \mathbb {F} $).
Why the restriction to distinct nonces is significant, and how repeated nonces make the picture more complicated.
Extending the work to support the ideal cipher model instead of the random oracle model.

Notes

1. Look ahead to Appendix B.1 to see how the characterization and FindColStruct are modified to support a random oracle with arity 2, as we have in this case.

References

1. Bellare, M., Micciancio, D.: A new paradigm for collision-free hashing: incrementality at reduced cost. In: Fumy, W. (ed.) EUROCRYPT 1997. LNCS, vol. 1233, pp. 163–192. Springer, Heidelberg (1997). https://doi.org/10.1007/3-540-69053-0_13
2. Bernstein, D.J., et al.: SPHINCS: practical stateless hash-based signatures. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015. LNCS, vol. 9056, pp. 368–397. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_15
3. Black, J., Rogaway, P., Shrimpton, T.: Black-box analysis of the block-cipher-based hash-function constructions from PGV. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 320–335. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45708-9_21
4. Bos, J.N.E., Chaum, D.: Provably unforgeable signatures. In: Brickell, E.F. (ed.) CRYPTO 1992. LNCS, vol. 740, pp. 1–14. Springer, Heidelberg (1993). https://doi.org/10.1007/3-540-48071-4_1
5. Carmer, B., Rosulek, M.: Linicrypt: a model for practical cryptography. In: Robshaw, M., Katz, J. (eds.) CRYPTO 2016. LNCS, vol. 9816, pp. 416–445. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53015-3_15
6. Dahmen, E., Okeya, K., Takagi, T., Vuillaume, C.: Digital signatures out of second-preimage resistant hash functions. In: Buchmann, J., Ding, J. (eds.) PQCrypto 2008. LNCS, vol. 5299, pp. 109–123. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-88403-3_8
7. Even, S., Goldreich, O., Micali, S.: On-line/off-line digital schemes. In: Brassard, G. (ed.) CRYPTO’89. LNCS, vol. 435, pp. 263–275. Springer, Heidelberg (1990)
8. Hülsing, A.: W-OTS+ - shorter signatures for hash-based signature schemes. In: Youssef, A., Nitaj, A., Hassanien, A.E. (eds.) AFRICACRYPT 13. LNCS, vol. 7918, pp. 173–188. Springer, Heidelberg (2013)
9. Impagliazzo, R.: A personal view of average-case complexity. In: Proceedings of the Tenth Annual Structure in Complexity Theory Conference, Minneapolis, Minnesota, USA, 19–22 June, 1995, pp. 134–147. IEEE Computer Society (1995)
10. Lamport, L.: Constructing digital signatures from a one-way function. Technical report SRI-CSL-98, SRI International Computer Science Laboratory, October 1979
11. Merkle, R.C.: A digital signature based on a conventional encryption function. In: Pomerance, C. (ed.) CRYPTO 1987. LNCS, vol. 293, pp. 369–378. Springer, Heidelberg (1988). https://doi.org/10.1007/3-540-48184-2_32
12. Preneel, B., Govaerts, R., Vandewalle, J.: Hash functions based on block ciphers: a synthetic approach. In: Stinson, D.R. (ed.) CRYPTO 1993. LNCS, vol. 773, pp. 368–378. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-48329-2_31
13. Wagner, D.: A generalized birthday problem. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 288–304. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45708-9_19

Author information

Authors and Affiliations

Oregon State University, Corvallis, USA
Ian McQuoid, Trevor Swope & Mike Rosulek

Corresponding author

Correspondence to Ian McQuoid.

Editor information

Editors and Affiliations

Karlsruhe Institute of Technology, Karlsruhe, Germany
Dennis Hofheinz
IDC Herzliya, Herzliya, Israel
Alon Rosen

Appendices

A Proofs

Proof

(Proof of Lemma 8). Some useful invariants in $\textsf {FindColStruct}$ are that at any time, $\textsf {LEFT}\cup \textsf {RIGHT}= \mathcal {C} $ and V is a multiset of the vectors appearing in $\textsf {rows}(\varvec{M})$ and $\textsf {LEFT}$. Note that $\textsf {FindColStruct}$ works in two phases: it starts with all oracle queries in $\textsf {LEFT}$ and in the first phase moves some to $\textsf {RIGHT}$. In the second phase, it moves some of the oracle queries back into $\textsf {LEFT}$.

($\Rightarrow $) First, we argue that if $\textsf {FindColStruct}(\mathcal {P}) = (i^*; c_1, \ldots , c_n) \ne \bot $, then this output is indeed a collision structure. Write each oracle constraint $c_i$ as $c_i = (t_i, \varvec{q}_i, \varvec{a}_i)$.

At the time the second while-loop terminates, we must have $\varvec{q}_{i^*} \not \in \textsf {span}(V)$ since otherwise $c_{i^*}$ would have been moved to $\textsf {LEFT}$. But $V = \{ \varvec{q}_1, \ldots , \varvec{q}_{i^*-1} \} \cup \{ \varvec{a}_1, \ldots , \varvec{a}_{i^*-1} \} \cup \textsf {rows}(\varvec{M})$, so this establishes one of the required properties of a collision structure.
For $j \ge i^*$, consider the time at which $c_j$ is about to be added to $\textsf {RIGHT}$ in the first while-loop (i.e., the point that the while loop body is entered). At that point, $\textsf {LEFT}= \{ c_1, \ldots , c_j\}$, so V contains $\{\varvec{q}_1, \ldots , \varvec{q}_j \} \cup \{ \varvec{a}_1, \ldots , \varvec{a}_j \} \cup \textsf {rows}(\varvec{M})$. Since the while-loop condition is fulfilled, we have
$$\begin{aligned} \varvec{a}_j&\not \in \textsf {span}(V \setminus \{\varvec{a}_j\}) = \textsf {span}\Big ( \{\varvec{q}_1, \ldots , \varvec{q}_j \} \cup \{ \varvec{a}_1, \ldots , \varvec{a}_{j-1} \} \cup \textsf {rows}(\varvec{M}) \Big ) \end{aligned}$$
which is the other condition required for a collision structure.

($\Leftarrow $) For the other direction, suppose $(i^*; c_1, \ldots , c_n)$ is some collision structure for $\mathcal {P} $. We will show that the algorithm adds $c_{i^*}, \ldots , c_n$ to $\textsf {RIGHT}$ in the first phase, but does not move $c_{i^*}$ back to $\textsf {LEFT}$ in the second phase. This implies that the algorithm terminates with $\textsf {RIGHT} \ne \emptyset $, so by the previous reasoning it outputs some valid collision structure (perhaps different from the collision structure we are assuming exists).

The fact that $c_{i^*}, \ldots , c_n$ are added to $\textsf {RIGHT}$ in the first phase is essentially the converse of what was shown above. For example, the collision structure property is that $\varvec{a}_n \not \in \textsf {span}\Big ( \{\varvec{q}_1, \ldots , \varvec{q}_n \} \cup \{ \varvec{a}_1, \ldots , \varvec{a}_{n-1} \} \cup \textsf {rows}(\varvec{M}) \Big )$, implying that $c_n$ can trigger the while-loop and be added to $\textsf {RIGHT}$ immediately. Note that even if other constraints are added to $\textsf {RIGHT}$ in this phase, it only makes V smaller, so only causes the condition to check a smaller span than in the collision-property definition. A simple inductive argument establishes that $c_{i^*}, \ldots , c_n$ are eventually added to $\textsf {RIGHT}$.

Since $\{ c_{i^*}, \ldots , c_n \} \subseteq \textsf {RIGHT}$ after the first phase, we must have $\textsf {LEFT}\subseteq \{ c_1, \ldots , c_{i^*-1} \}$ after the first phase. We want to show that $c_{i^*}$ is never placed back into $\textsf {LEFT}$. For the sake of contradiction, suppose not. Define S to be a set of indices such that $\textsf {LEFT}= \{ c_i \mid i \in S \}$ at the time $c_{i^*}$ is about to be moved into $\textsf {LEFT}$. Then $\varvec{q}_{i^*} \in \textsf {span}( \textsf {rows}(\varvec{M}) \cup \{ \varvec{q}_i, \varvec{a}_i \mid i \in S \} )$. We can then write:

$$ \varvec{q}_{i^*} = \sum _{j \in S} \alpha _j \varvec{q}_j + \sum _{j \in S} \beta _j \varvec{a}_j + \varvec{\gamma }\varvec{M} $$

For $j \in S$ with $j > i^*$, the constraint $c_j$ was previously in $\textsf {RIGHT}$ and was moved back into $\textsf {LEFT}$. The only way to be moved back into $\textsf {LEFT}$ is for $\varvec{q}_j$ to be in the span of other vectors already in $\textsf {LEFT}$ (and hence already on the right-hand side of this expression). Hence, without loss of generality we can remove the terms involving $\varvec{q}_j$ for $j > i^*$, to obtain:

$$ \varvec{q}_{i^*} = \sum _{j \in S \setminus \{ i^*, \ldots , n\}} \alpha '_j \varvec{q}_j + \sum _{j \in S} \beta '_j \varvec{a}_j + \varvec{\gamma }' \varvec{M} $$

Let $j^*$ be the highest $j \in S$ for which $\beta '_j \ne 0$. There are two cases.

Case $j^* < i^*$: Then all of the nonzero terms $\varvec{q}_j, \varvec{a}_j$ on the right-hand side have subscript less than $i^*$. This contradicts the fact (from the original collision structure) that $\varvec{q}_{i^*} \not \in \textsf {span}( \textsf {rows}(\varvec{M}) \cup \{ \varvec{q}_j, \varvec{a}_j \mid j < i^*\} )$.

Case $j^* > i^*$: We can solve for $\varvec{a}_{j^*}$ in the above expression, yielding:

$$ \varvec{a}_{j^*} = -\frac{1}{\beta '_{j^*}}\left( \sum _{j \in S \setminus \{ i^*, \ldots , n\}} \alpha '_j \varvec{q}_j - \varvec{q}_{i^*} + \sum _{j \in S \setminus \{ j^* \}} \beta '_j \varvec{a}_j + \varvec{\gamma }' \varvec{M} \right) $$

But now all nonzero $\varvec{q}_j$ and $\varvec{a}_j$ terms on the right-hand side have subscript less than $j^*$. This contradicts the fact (from the original collision structure) that $\varvec{a}_{j^*} \not \in \textsf {span}( \{ \varvec{q}_j \mid j< j^* \} \cup \{ \varvec{a}_j \mid j < j^*\} \cup \textsf {rows}(\varvec{M}) )$.

In either case we have a contradiction to the claim that $c_{i^*}$ is moved back into $\textsf {LEFT}$. Since the algorithm terminates with at least $c_{i^*} \in \textsf {RIGHT}$, it outputs some valid collision structure.

B Extensions, Limitations, Future Work

B.1 Generalizing to Higher Arity

For simplicity our results were proven for Linicrypt programs in which all oracle calls have arity 1. That is, $H : \{0,1\}^* \times \mathbb {F} \rightarrow \mathbb {F} $, and all oracle constraints have the form $(t,\varvec{q},\varvec{a})$ where $\varvec{q}$ is a single row. This reflects a program that always queries the oracle as H(t; v) where v is a single field element.

More generally, Linicrypt allows calls to H with multiple field elements as arguments. This leads to oracle constraints of the form $(t,\varvec{Q},\varvec{a})$ where $\varvec{Q}$ is now a matrix. We briefly discuss the changes necessary to support such programs. Basically, whenever the definitions (of degeneracy & collision structure) or algorithms (to find a second preimage or to find a collision structure) refer to $\varvec{q}$, the analogous condition should hold with respect to all rows of $\varvec{Q}$.

The generalized definition of degeneracy (Definition 4) is that:

$$ \textsf {span}( \varvec{e}_1, \ldots , \varvec{e}_k ) \not \subseteq \textsf {span}\Big ( \bigcup _{(t,\varvec{Q},\varvec{a}) \in \mathcal {C}} \textsf {rows}(\varvec{Q})\cup \textsf {rows}(\varvec{M}) \Big ) $$

The generalized definition of collision structure (Definition 6) requires the following change:

2. $\textsf {rows}(\varvec{Q}_{i^*}) \not \subseteq \textsf {span}\Big ( \textsf {rows}(\varvec{Q}_1) \cup \cdots \cup \textsf {rows}(\varvec{Q}_{i^*-1}) \cup \{ \varvec{a}_1, \ldots , \varvec{a}_{i^*-1} \} \cup \textsf {rows}(\varvec{M}) \Big )$
3. For $j \ge i^*$: $\varvec{a}_j \not \in \textsf {span}\Big ( \textsf {rows}(\varvec{Q}_1) \cup \cdots \cup \textsf {rows}(\varvec{Q}_{j}) \cup \{ \varvec{a}_1, \ldots , \varvec{a}_{j-1} \} \cup \textsf {rows}(\varvec{M}) \Big )$

Specifically, for item (2) it is enough if any row of $\varvec{Q}_{i^*}$ is not in the given span.
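
The generalized definitions can be checked directly with linear algebra over a prime field. The following is a minimal sketch, not the paper's code: the prime 101, the function names, the tuple encoding `(nonce, Q_rows, a)`, the convention that the first `num_inputs` coordinates of the base-variable vector are the program inputs, and both example programs are assumptions made for illustration.

```python
P = 101  # small prime standing in for the field size (assumption)

def rank(rows, p=P):
    """Rank of a list of row vectors over GF(p) via Gaussian elimination."""
    m = [[x % p for x in r] for r in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        inv = pow(m[r][col], -1, p)
        m[r] = [(x * inv) % p for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[r])]
        r += 1
    return r

def in_span(v, rows, p=P):
    return rank(rows + [v], p) == rank(rows, p)

def is_degenerate(num_inputs, dim, M, C, p=P):
    """True if some input basis vector e_i lies outside the span of all
    Q-rows together with the rows of M."""
    V = list(M) + [q for (_, Q, _) in C for q in Q]
    basis = [[1 if j == i else 0 for j in range(dim)]
             for i in range(num_inputs)]
    return not all(in_span(b, V, p) for b in basis)

def is_collision_structure(i_star, ordering, M, p=P):
    """Check items 2 and 3 for a 1-indexed i_star and an ordered list of
    constraints (c_1, ..., c_n)."""
    def prefix(j_q, j_a):
        # Rows of Q_1..Q_{j_q}, vectors a_1..a_{j_a}, and the rows of M.
        V = list(M)
        for (_, Q, _) in ordering[:j_q]:
            V += Q
        V += [a for (_, _, a) in ordering[:j_a]]
        return V
    Q_star = ordering[i_star - 1][1]
    V = prefix(i_star - 1, i_star - 1)
    if all(in_span(q, V, p) for q in Q_star):    # item 2 fails
        return False
    for j in range(i_star, len(ordering) + 1):   # item 3, for every j >= i_star
        if in_span(ordering[j - 1][2], prefix(j, j - 1), p):
            return False
    return True

# Hypothetical P(x, y) = H(1; x) + y, base variables (x, y, h).
toy_M, toy_C = [[0, 1, 1]], [(1, [[1, 0, 0]], [0, 0, 1])]
print(is_degenerate(2, 3, toy_M, toy_C))        # True
print(is_collision_structure(1, toy_C, toy_M))  # True

# Hypothetical arity-2 program P(x, y) = H(1; x, y), same base variables.
h2_M, h2_C = [[0, 0, 1]], [(1, [[1, 0, 0], [0, 1, 0]], [0, 0, 1])]
print(is_degenerate(2, 3, h2_M, h2_C))          # False
print(is_collision_structure(1, h2_C, h2_M))    # False
```

In the second example item 2 holds (neither row of $\varvec{Q}_1$ is in the span of $\varvec{M}$), but item 3 fails because $\varvec{a}_1$ coincides with the single row of $\varvec{M}$.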

In the $\textsf {FindSecondPreimage}$ algorithm (Fig. 1), there are times when the algorithm chooses $\varvec{q}_j \cdot \varvec{v}'$ arbitrarily. This happens when such a constraint would be linearly independent of the existing constraints on $\varvec{v}'$. In the analogous generalized case, we might have only some of the rows of $\varvec{Q}_j$ linearly independent of the existing constraints. In that case, some of the components of $\varvec{Q}_j \times \varvec{v}'$ are already fixed. We obviously cannot choose these arbitrarily—only the unconstrained positions in $\varvec{Q}_j \times \varvec{v}'$ are fixed arbitrarily. One can verify that the algorithm only attempts to arbitrarily fix some values if there is some row of $\varvec{Q}_j$ linearly independent with existing constraints on $\varvec{v}'$.

In the $\textsf {FindColStruct}$ algorithm (Fig. 2) we let V now contain $\varvec{Q}$-matrices as well as simple $\varvec{a}$-vectors. Then we overload notation so that $\textsf {span}(V)$ considers the span of all of the rows of all matrices/vectors in V. The second “while” condition is modified as follows:

$$ \text{while } \exists (t,\varvec{Q},\varvec{a}) \in \textsf {RIGHT} \text{ such that } \textsf {rows}(\varvec{Q}) \subseteq \textsf {span}(V) $$

In other words, $(t,\varvec{Q},\varvec{a})$ is moved from $\textsf {RIGHT}$ back to $\textsf {LEFT}$ if all rows of $\varvec{Q}$ are spanned by V.

With these modifications, all proofs in Sect. 3 go through with straightforward changes.

B.2 Why the Restriction to Distinct Nonces?

The main characterization holds for Linicrypt programs with distinct nonces. It is instructive to understand why the results are limited in this way. Specifically, where do we use the property of distinct nonces?

Suppose $\mathcal {A} $ breaks the collision-resistance of $\mathcal {P} $. We observe the oracle queries made by $\mathcal {A} $ and obtain a mapping between these queries and the ones made in $\mathcal {P} ^H(\varvec{x})$ and $\mathcal {P} ^H(\varvec{x}')$. When the nonces are distinct, a query made by $\mathcal {A} $ can only be associated with a unique oracle constraint $c \in \mathcal {C} $. When the nonces are not distinct, a single query of $\mathcal {A} $ can serve double-duty and correspond to two oracle constraints of $\mathcal {P} $. This indeed causes the argument to break down.

We illustrate with the two example Linicrypt programs:

The first has distinct nonces and is indeed collision resistant (it has no collision structure). The second program is not collision-resistant, because $\mathcal {P} _2^H(x, H(x)) = 0$ for all x. In other words, (x, H(x)) and $(x',H(x'))$ constitute a collision.

When given inputs of this form, $\mathcal {P} _2$ makes duplicate queries—both H(H(x)) (the outermost H-call) and H(y) receive the same argument. In our previous proofs, we would observe the adversary making such a query, which would have to be associated with two distinct oracle constraints.
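
The collapse of the two queries is easy to observe concretely. The sketch below assumes, based on the discussion above, that the second program has the form $\mathcal {P} _2^H(x,y) = H(H(x)) - H(y)$ with no nonces; the program listing itself is not reproduced here. SHA-256 reduced modulo a prime is used purely as a stand-in for the random oracle, and the modulus $2^{61}-1$ is an arbitrary choice.

```python
import hashlib

P = 2**61 - 1  # prime modulus standing in for the field (assumption)

def H(v):
    """Nonce-free stand-in for the random oracle: SHA-256 reduced mod P."""
    digest = hashlib.sha256(v.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % P

def P2(x, y):
    # Assumed form of the second program, inferred from the text above.
    return (H(H(x)) - H(y)) % P

# Inputs of the form (x, H(x)) make the outer query H(H(x)) coincide with H(y).
inputs = [(x, H(x)) for x in (1, 2, 3)]
print([P2(x, y) for x, y in inputs])  # [0, 0, 0]
```

Any two of these inputs are distinct and map to the same output, so they form a collision.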

Another way of seeing what happens is that in the algebraic representation of $\mathcal {P} _2$, the base variables H(x) and y correspond to independent vectors. In this case, the adversary’s choice of inputs causes these vectors to coincide, and this has the effect of “collapsing” two oracle queries.

Interestingly, it is possible to give an ad-hoc argument that $\mathcal {P} _2$ is second-preimage resistant. When x and y are chosen uniformly, this has the effect of keeping the vectors (in the algebraic representation) corresponding to H(x) and y independent. We can then argue that the adversary doesn’t make any oracle query that is associated with two distinct queries of $\mathcal {P} _2$, so the reasoning of our main theorem also applies in this case. Hence, $\mathcal {P} _2$ is second-preimage resistant but not collision resistant, which demonstrates that the equivalence in our main characterization does not extend to Linicrypt programs with non-distinct nonces.

B.3 Random Oracle vs Ideal Cipher

A natural application of collision resistance would be the construction of collision-resistant hash functions from an ideal cipher [3, 12]. It should be possible to use Linicrypt to reason about constructions in the ideal cipher model, although it would require non-trivial modifications. We could interpret E(k, m) as $H(\texttt {E},k,m)$ and D(k, c) as $H(\texttt {D},k,c)$. The constraint that $D(k,E(k,m)) = m$ adds some extra structure that must be reflected in the algebraic representation. For example, if a program $\mathcal {P} $ makes a query $c=E(k,m)$, we must consider the adversary’s ability to make this forward query but also its ability to make the corresponding backwards query D(k, c). Both forward/backwards queries must be considered before deeming the pair of queries E(k, m) and D(k, c) unreachable.
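
The coupling between forward and backward queries can be illustrated with a lazily sampled ideal cipher, a standard way to simulate such an oracle. This is only a sketch under our own assumptions: the class name, the tiny integer domain, and the seeded generator are illustrative and are not taken from the paper.

```python
import random

class LazyIdealCipher:
    """Lazily sampled ideal cipher over range(domain_size): each key gets an
    independent random permutation, filled in only as queries arrive."""

    def __init__(self, domain_size, seed=0):
        self.n = domain_size
        self.rng = random.Random(seed)
        self.fwd = {}  # (k, m) -> c
        self.bwd = {}  # (k, c) -> m

    def _fresh(self, k, table):
        # Pick a value not yet used under key k in the given table.
        used = {y for (key, y) in table if key == k}
        return self.rng.choice([y for y in range(self.n) if y not in used])

    def E(self, k, m):
        if (k, m) not in self.fwd:
            c = self._fresh(k, self.bwd)  # c must be an unused ciphertext
            self.fwd[(k, m)], self.bwd[(k, c)] = c, m
        return self.fwd[(k, m)]

    def D(self, k, c):
        if (k, c) not in self.bwd:
            m = self._fresh(k, self.fwd)  # m must be an unused plaintext
            self.fwd[(k, m)], self.bwd[(k, c)] = c, m
        return self.bwd[(k, c)]

ic = LazyIdealCipher(16)
c = ic.E(7, 3)
print(ic.D(7, c))  # 3
```

A single table entry answers both E(k, m) and D(k, c), which is the extra structure that an algebraic representation for the ideal cipher model would have to capture.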

We do not foresee the transition to the ideal cipher model to be particularly problematic. However, the specific analysis of [3] shows several constructions of hash functions from ideal ciphers where the round functions are not collision-resistant, and yet their use in a Merkle-Damgård construction gives a collision-resistant result. So far, the theory of Linicrypt is not developed enough to reason about programs with looping constructs, as in an iterated hash function (despite the fact that such reasoning happens to be tractable for the specific example in Sect. 4).

Cite this paper

McQuoid, I., Swope, T., Rosulek, M. (2019). Characterizing Collision and Second-Preimage Resistance in Linicrypt. In: Hofheinz, D., Rosen, A. (eds) Theory of Cryptography. TCC 2019. Lecture Notes in Computer Science, vol 11891. Springer, Cham. https://doi.org/10.1007/978-3-030-36030-6_18

DOI: https://doi.org/10.1007/978-3-030-36030-6_18
Published: 22 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36029-0
Online ISBN: 978-3-030-36030-6
eBook Packages: Computer Science, Computer Science (R0)
