1 Introduction

Over the last few years, there has been an increasing interest in quantum cryptography. On one hand, many cryptographic schemes based on quantum information have been proposed, among which the most well-known result is quantum key distribution (QKD) [1]. These schemes take full advantage of the novel properties of quantum information and aim to realize functionalities that do not exist using classical information alone. On the other hand, the development of quantum computing threatens many classical cryptosystems. The most representative example is Shor’s algorithm [22]. By using Shor’s algorithm, an adversary who owns a quantum computer can break the security of any scheme based on factorization or discrete logarithm, such as RSA. This has greatly motivated the development of post-quantum cryptography, i.e., classical cryptosystems that remain secure even when the adversary owns a quantum computer.

While currently used public-key cryptography suffers from a severe threat due to Shor’s algorithm, the impact of quantum computers on symmetric-key cryptography is still less understood. Since Grover’s algorithm provides a quadratic speed-up for general search problems, the key lengths of symmetric-key cryptosystems need to be doubled to maintain the same level of security. In addition, Simon’s algorithm [23] has also been applied to cryptanalysis. Kuwakado and Morii use it to construct a quantum distinguisher for the 3-round Feistel scheme [14] and to recover part of the key of the Even–Mansour construction [15]. Santoli and Schaffner extend their result and present a quantum forgery attack on the CBC-MAC scheme [21]. In [12], Kaplan et al. use Simon’s algorithm to attack various symmetric cryptosystems, such as CBC-MAC, PMAC and CLOC. They also study how differential and linear cryptanalysis behave in the post-quantum world [11]. In addition to Simon’s algorithm, the Bernstein–Vazirani (BV) algorithm [2] has also been used for cryptanalysis. Li and Yang proposed two methods to execute quantum differential cryptanalysis based on the BV algorithm in [16], but their attack implicitly assumes that the attacker can query the function which maps the plaintext to the input of the last round of the encryption algorithm.

In this paper, we study applications of the BV algorithm and use it to attack block ciphers. It has been found that running the BV algorithm on a Boolean function f without performing the final measurement gives a superposition of all states \(|\omega \rangle \) (\(\omega \in \{0,1\}^n\)), and the amplitude corresponding to each \(|\omega \rangle \) is its Walsh spectrum \(S_f(\omega )\) [8, 10]. In addition, there is a link between the linear structure of a Boolean function and its Walsh spectrum [6]. Based on these two facts, Li and Yang present a quantum algorithm to find the linear structures of a Boolean function in [17]. We modify their algorithm so that it can find the linear structures of a vector function. Our attack strategies are all built on this modified algorithm.

Attack model In this paper, we only consider the quantum chosen message attack that has been studied in [3, 5, 9]. In this attack model, the adversary is granted access to a quantum oracle which computes the encryption function in superposition. Specifically, if the encryption algorithm is described by a classical function \(E_k:\{0,1\}^n\rightarrow \{0,1\}^n\), then the adversary can make quantum queries \(\sum _{x,y}|x\rangle |y\rangle \rightarrow \sum _{x,y}|x\rangle |y\oplus E_k(x)\rangle \).
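The action of such an oracle on computational basis states can be sketched classically. The snippet below is only an illustration: `TOY_TABLE`, `toy_encrypt` and the helper names are hypothetical and stand in for \(E_k\). It checks that the map \((x,y)\mapsto (x,y\oplus E_k(x))\) permutes the basis states and is its own inverse, which is why it extends to a unitary operator.

```python
# Hypothetical 3-bit "cipher" used only to illustrate the oracle; not from the paper.
TOY_TABLE = [3, 6, 0, 5, 7, 1, 4, 2]


def toy_encrypt(x):
    """Stand-in for the classical encryption function E_k."""
    return TOY_TABLE[x]


def make_oracle(encrypt):
    """Return the basis-state action (x, y) -> (x, y XOR E_k(x))."""
    def oracle(x, y):
        return x, y ^ encrypt(x)
    return oracle


def is_bijection_on_basis(oracle, n):
    """Check that the oracle maps the 2^(2n) basis states onto themselves."""
    size = 2 ** n
    images = {oracle(x, y) for x in range(size) for y in range(size)}
    return len(images) == size * size


oracle = make_oracle(toy_encrypt)
print(is_bijection_on_basis(oracle, 3))
```

Because the second register is only XORed with \(E_k(x)\), the map is reversible even when the stand-in function is not a permutation.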

Our contributions In this article, we present several methods to attack block ciphers. We first propose a quantum algorithm for finding the linear structures of a vector function, which takes BV algorithm as a subroutine and is developed from the algorithm in [17]. Then we modify this original algorithm to get different versions and apply them in different attack strategies. In more detail, our main contributions are as follows:

  • We construct new quantum distinguishers for the 3-round Feistel scheme and propose a new quantum algorithm to recover part of the key of the Even–Mansour construction. Our methods are similar to the ones proposed by Kuwakado and Morii [14, 15], but we use the BV algorithm instead of Simon’s algorithm. Although this modification causes a slight increase in complexity, it gives our methods more general applications. For example, by constructing functions that have different linear structures, we can obtain various distinguishers for the 3-round Feistel scheme. The essential reason is that the BV algorithm can find not only the periods of a function but also its other linear structures.

  • Observing that linear structures of an encryption function are actually high probability differentials of it, we propose three ways to execute differential cryptanalysis, which we call quantum differential cryptanalysis, quantum small probability differential cryptanalysis and quantum impossible differential cryptanalysis respectively. Afterwards, we analyze the efficiency and success probability of all attacks. The quantum algorithms used for these three kinds of differential cryptanalysis all have polynomial running time. One of the main shortcomings of traditional differential cryptanalysis is the difficulty of extending the differential paths, which limits the number of rounds that can be attacked. Our approach avoids this problem since it treats the encryption function as a whole.

2 Preliminaries

In this section, we briefly recall a few notations and results about the linear structure. Let n be a positive integer and \(\mathbb {F}_2=\{0,1\}\) be a finite field of characteristic 2. \(\mathbb {F}_2^n=\{0,1\}^n\) is a vector space over \(\mathbb {F}_2\). The set of all functions from \(\mathbb {F}_2^n\) to \(\mathbb {F}_2\) is denoted as \(\mathcal {B}_n\).

Definition 1

A vector \(a\in \mathbb {F}_2^n\) is called a linear structure of a Boolean function \(f\in \mathcal {B}_n\), if

$$\begin{aligned} f(x\oplus a)+f(x)=f(a)+f(0),\,\,\,\forall x\in \mathbb {F}_2^n, \end{aligned}$$

where \(\oplus \) denotes the bitwise exclusive-or.

For any \(f\in \mathcal {B}_n\), let \(U_f\) be the set of all linear structures of f, and

$$\begin{aligned} U_f^i=\left\{ a\in \mathbb {F}_2^n|f(x\oplus a)+f(x)=i,\forall x\in \mathbb {F}_2^n\right\} ,\,\,\, i=0,1. \end{aligned}$$

Obviously, \(U_f=U_f^0\cup U_f^1\). For any \(a\in \mathbb {F}_2^n\) and \(i=0,1\), let

$$\begin{aligned} V_{f,a}^i=\{x\in \mathbb {F}_2^n|f(x\oplus a)+f(x)=i\}. \end{aligned}$$

Clearly, \(0\le |V_{f,a}^i|/2^n\le 1\), and \(a\in U^i_f\) if and only if \(|V_{f,a}^i|/2^n=1\). For any \(a\in \mathbb {F}_2^n\), \(1-|V_{f,a}^i|/2^n\) quantifies how close a is to being a linear structure of f. Naturally, we give the following definition:

Definition 2

A vector \(a\in \mathbb {F}_2^n\) is called a \(\sigma \)-close linear structure of a function \(f\in \mathcal {B}_n\), if there exists \(i\in \{0,1\}\) such that

$$\begin{aligned} 1-\frac{\left| \left\{ x\in \mathbb {F}_2^n|f(x\oplus a)+f(x)=i\right\} \right| }{2^n}<\sigma . \end{aligned}$$

If a is a \(\sigma (n)\)-close linear structure of f for some negligible function \(\sigma (n)\), we call it a quasi linear structure of f.
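For small n, Definitions 1 and 2 can be checked by exhaustive search. The following sketch is illustrative only: it encodes vectors of \(\mathbb {F}_2^n\) as integers, and the function names and the toy function \(f(x)=x_0x_1\oplus x_2\) are assumptions made for the example, not part of the paper.

```python
def diff_fraction(f, n, a, i):
    """Return |V_{f,a}^i| / 2^n, the fraction of x with f(x XOR a) + f(x) = i."""
    hits = sum(1 for x in range(2 ** n) if f(x ^ a) ^ f(x) == i)
    return hits / 2 ** n


def linear_structures(f, n):
    """Return (U_f^0, U_f^1) by testing every a in F_2^n."""
    u0 = [a for a in range(2 ** n) if diff_fraction(f, n, a, 0) == 1.0]
    u1 = [a for a in range(2 ** n) if diff_fraction(f, n, a, 1) == 1.0]
    return u0, u1


def is_sigma_close(f, n, a, sigma):
    """Definition 2: a is sigma-close if 1 - |V_{f,a}^i|/2^n < sigma for some i."""
    return any(1 - diff_fraction(f, n, a, i) < sigma for i in (0, 1))


def toy_f(x):
    """Hypothetical example f(x) = x0*x1 + x2 over F_2 (bit 0 is x0)."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return (x0 & x1) ^ x2


print(linear_structures(toy_f, 3))
```

For this toy function the derivative in direction \(a=(a_0,a_1,a_2)\) is \(a_0x_1\oplus a_1x_0\oplus a_0a_1\oplus a_2\), so it is constant exactly when \(a_0=a_1=0\).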

The relative differential uniformity of a Boolean function, defined in [20], quantifies how close the function is to having a nontrivial linear structure:

Definition 3

The relative differential uniformity of \(f\in \mathcal {B}_n\) is defined as

$$\begin{aligned} \delta _f=\frac{1}{2^n}\max _{0\ne a\in \mathbb {F}_2^n}\max _{i\in \mathbb {F}_2}\left| \left\{ x\in \mathbb {F}_2^n|f(x\oplus a)+f(x)=i\right\} \right| . \end{aligned}$$

For any \(f\in \mathcal {B}_n\), it is obvious that \(\frac{1}{2}\le \delta _f\le 1\), and \(U_f\ne \{0\}\) if and only if \(\delta _f=1\). The linear structures of a Boolean function are closely related to its Walsh spectrum, which is defined as follows:

Definition 4

The Walsh spectrum of a Boolean function \(f\in \mathcal {B}_n\) is defined as

$$\begin{aligned} S_f(\omega )=\frac{1}{2^n}\sum _{x\in \mathbb {F}_2^n}(-1)^{f(x)+\omega \cdot x}. \end{aligned}$$

The relation between the linear structure and the Walsh spectrum is captured by the following two lemmas, which have been proved in [19].

Lemma 1

Let \(f\in \mathcal {B}_n\), then \(\forall a\in \mathbb {F}_2^n\), \(\forall i\in \mathbb {F}_2\),

$$\begin{aligned} \sum _{\omega \cdot a=i}S^2_f(\omega )=\frac{|V^i_{f,a}|}{2^n}=\frac{\left| \left\{ x\in \mathbb {F}_2^n|f(x\oplus a)+f(x)=i\right\} \right| }{2^n}. \end{aligned}$$
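Lemma 1 can be verified numerically for small n. The sketch below computes the Walsh spectrum of Definition 4 directly and compares both sides of the lemma; the helper names and the toy function are hypothetical choices for illustration.

```python
def dot(u, v):
    """Inner product over F_2 of two vectors encoded as integers."""
    return bin(u & v).count("1") & 1


def walsh_spectrum(f, n):
    """Return [S_f(w) for w in F_2^n] with the 1/2^n normalisation of Definition 4."""
    size = 2 ** n
    return [sum((-1) ** (f(x) ^ dot(w, x)) for x in range(size)) / size
            for w in range(size)]


def lemma1_sides(f, n, a, i):
    """Return (sum of S_f(w)^2 over w.a = i, |V_{f,a}^i| / 2^n)."""
    size = 2 ** n
    spectrum = walsh_spectrum(f, n)
    lhs = sum(spectrum[w] ** 2 for w in range(size) if dot(w, a) == i)
    rhs = sum(1 for x in range(size) if f(x ^ a) ^ f(x) == i) / size
    return lhs, rhs


def toy_f(x):
    """Hypothetical example f(x) = x0*x1 + x2 over F_2 (bit 0 is x0)."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return (x0 & x1) ^ x2


for a in range(8):
    for i in (0, 1):
        print(a, i, lemma1_sides(toy_f, 3, a, i))
```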

Lemma 2

For \(f\in \mathcal {B}_n\), let \(N_f=\{\omega \in \mathbb {F}_2^n|S_f(\omega )\ne 0\}\). Then \(\forall i\in \{0,1\}\), it holds that

$$\begin{aligned} U_f^i=\{a\in \mathbb {F}_2^n|\omega \cdot a=i,\forall \,\, \omega \in N_f\}. \end{aligned}$$

Lemma 2 provides a method to find the linear structures. If we have a sufficiently large subset H of \(N_f\), we can get \(U_f^i\) by solving the system of linear equations \(\{a\cdot \omega =i|\omega \in H\}\). (Here, solving the system of linear equations \(\{a\cdot \omega =i|\omega \in H\}\) means finding vector a that satisfies \(a\cdot \omega =i\) for all \(\omega \in H\).)
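A small classical sketch of this procedure follows. It takes \(H=N_f\) and finds the solutions by exhaustive search, which is only feasible for tiny n; a practical implementation would use Gaussian elimination over \(\mathbb {F}_2\) instead. All names and the toy function are hypothetical.

```python
def dot(u, v):
    """Inner product over F_2 of two vectors encoded as integers."""
    return bin(u & v).count("1") & 1


def walsh_support(f, n):
    """Return N_f, the list of w whose (unnormalised) Walsh coefficient is nonzero."""
    size = 2 ** n
    return [w for w in range(size)
            if sum((-1) ** (f(x) ^ dot(w, x)) for x in range(size)) != 0]


def solve_system(h, n, i):
    """Return every a in F_2^n with a.w = i for all w in h (exhaustive search)."""
    return [a for a in range(2 ** n) if all(dot(a, w) == i for w in h)]


def toy_f(x):
    """Hypothetical example f(x) = x0*x1 + x2 over F_2 (bit 0 is x0)."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return (x0 & x1) ^ x2


support = walsh_support(toy_f, 3)
print(support, solve_system(support, 3, 0), solve_system(support, 3, 1))
```

For the toy function, the two solution sets coincide with \(U_f^0\) and \(U_f^1\) obtained from the definition, as Lemma 2 predicts.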

Next we consider vector functions. Suppose m, n are positive integers. \(\mathcal {C}_{m,n}\) denotes the set of all functions from \(\mathbb {F}_2^m\) to \(\mathbb {F}_2^n\). The linear structure of a vector function in \(\mathcal {C}_{m,n}\) can be defined similarly:

Definition 5

A vector \(a\in \mathbb {F}_2^m\) is called a linear structure of a vector function \(F\in \mathcal {C}_{m,n}\), if there exists a vector \(\alpha \in \{0,1\}^n\) such that

$$\begin{aligned} F(x\oplus a)\oplus F(x)=\alpha , \,\,\,\forall x\in \{0,1\}^m. \end{aligned}$$

Suppose \(F=(F_1,F_2,\ldots ,F_n)\). A straightforward way to find the linear structures of F is to first search for the linear structures of each component function \(F_j\) separately and then take the intersection. Let \(U_F\) be the set of all linear structures of F, and \(U_F^{\alpha }=\{a\in \mathbb {F}_2^m|F(x\oplus a)\oplus F(x)=\alpha , \,\,\forall x\}\). It is obvious that \(U_F=\cup _{\alpha }U_F^{\alpha }\). The relative differential uniformity of F is defined as

$$\begin{aligned} \delta _F=\frac{1}{2^m}\max _{0\ne a\in \mathbb {F}_2^m}\max _{\alpha \in \mathbb {F}_2^n}|\{x\in \mathbb {F}_2^m|F(x\oplus a)\oplus F(x)=\alpha \}|, \end{aligned}$$

which quantifies how close F is to having a nontrivial linear structure.
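The component-wise approach can be compared with Definition 5 on a toy example. In this sketch the output of F is packed into an integer with component \(F_{j+1}\) in bit j; the function names and `toy_F` are hypothetical and serve only as an illustration.

```python
def linear_structures_direct(F, m):
    """Map each linear structure a of F to its alpha, following Definition 5."""
    found = {}
    for a in range(2 ** m):
        diffs = {F(x ^ a) ^ F(x) for x in range(2 ** m)}
        if len(diffs) == 1:  # F(x XOR a) XOR F(x) is the same alpha for every x
            found[a] = diffs.pop()
    return found


def linear_structures_by_components(F, m, n):
    """Intersect the sets of linear structures of the n component functions."""
    common = set(range(2 ** m))
    for j in range(n):
        def comp(x, j=j):
            return (F(x) >> j) & 1
        u_j = {a for a in range(2 ** m)
               if len({comp(x ^ a) ^ comp(x) for x in range(2 ** m)}) == 1}
        common &= u_j
    return common


def toy_F(x):
    """Hypothetical F: F_2^3 -> F_2^2 with F_1 = x0*x1 + x2 and F_2 = x2."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return ((x0 & x1) ^ x2) | (x2 << 1)


print(linear_structures_direct(toy_F, 3))
print(sorted(linear_structures_by_components(toy_F, 3, 2)))
```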

3 Finding linear structures via Bernstein–Vazirani algorithm

In this section we briefly recall the BV algorithm [2] and introduce how to use it to find the linear structures of a Boolean function. The goal of BV algorithm is to determine a secret string \(a\in \{0,1\}^n\). Specifically, suppose

$$\begin{aligned} f(x)=a\cdot x=\sum _{i=1}^na_ix_i. \end{aligned}$$

The algorithm aims to determine a, given access to a quantum oracle which computes the function f. It works as follows:

  1.

    Prepare the initial state \(|\psi _0\rangle =|0\rangle ^{\otimes n}|1\rangle \), then perform the Hadamard transform \(H^{(n+1)}\) on it to obtain the quantum superposition

    $$\begin{aligned} |\psi _1\rangle =\sum _{x\in \mathbb {F}_2^n}\frac{|x\rangle }{\sqrt{2^n}}\cdot \frac{|0\rangle -|1\rangle }{\sqrt{2}}. \end{aligned}$$
  2.

    A quantum query to the oracle which computes f maps it to the state

    $$\begin{aligned} |\psi _2\rangle =\sum _{x\in \mathbb {F}_2^n}\frac{(-1)^{f(x)}|x\rangle }{\sqrt{2^n}}\cdot \frac{|0\rangle -|1\rangle }{\sqrt{2}}. \end{aligned}$$
  3.

    Apply the Hadamard gates \(H^{(n)}\) to the first n qubits again, yielding

    $$\begin{aligned} |\psi _3\rangle =\sum _{y\in \mathbb {F}_2^n}\frac{1}{2^n}\sum _{x\in \mathbb {F}_2^n}(-1)^{f(x)+y\cdot x}|y\rangle , \end{aligned}$$

    where we omit the last qubit for simplicity. If \(f(x)=a\cdot x\), we have

    $$\begin{aligned} |\psi _3\rangle&=\sum _{y\in \mathbb {F}_2^n}\left( \frac{1}{2^n}\sum _{x\in \mathbb {F}_2^n}(-1)^{(a\oplus y)\cdot x}\right) |y\rangle \\&=\sum _{y\in \mathbb {F}_2^n}\delta _{a}(y)|y\rangle \\&=|a\rangle , \end{aligned}$$

    where \(\delta _a(y)=1\) if \(y=a\), otherwise \(\delta _a(y)=0\). Then by measuring \(|\psi _3\rangle \) in the computational basis, we will get a with probability 1.

If we run the BV algorithm on a general function \(f\in \mathcal {B}_n\), the output before the measurement can be expressed as

$$\begin{aligned} |\psi _3\rangle =\sum _{y\in \mathbb {F}_2^n}S_f(y)|y\rangle , \end{aligned}$$

where \(S_f(\cdot )\) is the Walsh spectrum of f. When we measure the above state in the computational basis, we will obtain y with probability \(S_f(y)^2\). In other words, we will always get \(y\in N_f\) when we run the BV algorithm on f. This fact combined with Lemma 2 implies a way to find the linear structures.
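The measurement statistics of the BV algorithm can be simulated classically for small n by computing the distribution \(S_f(y)^2\) and sampling from it. The sketch below is such a simulation, not a quantum implementation; the function names, the secret string `0b101` and the toy function are assumptions made for the example.

```python
import random


def dot(u, v):
    """Inner product over F_2 of two vectors encoded as integers."""
    return bin(u & v).count("1") & 1


def bv_distribution(f, n):
    """Return [S_f(y)^2 for y in F_2^n], the outcome distribution of BV run on f."""
    size = 2 ** n
    probs = []
    for y in range(size):
        amplitude = sum((-1) ** (f(x) ^ dot(y, x)) for x in range(size)) / size
        probs.append(amplitude ** 2)
    return probs


def bv_sample(f, n, rng):
    """Draw one measurement outcome y with probability S_f(y)^2."""
    return rng.choices(range(2 ** n), weights=bv_distribution(f, n), k=1)[0]


def linear_f(x):
    """f(x) = a.x with the hypothetical secret a = 101."""
    return dot(0b101, x)


def toy_f(x):
    """Hypothetical non-linear example f(x) = x0*x1 + x2 (bit 0 is x0)."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return (x0 & x1) ^ x2


rng = random.Random(2024)
print(bv_sample(linear_f, 3, rng))
print(sorted({bv_sample(toy_f, 3, rng) for _ in range(50)}))
```

For the linear function all probability mass sits on the secret string, while for the toy function every sample lies in \(N_f\).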

Now we state the quantum algorithm proposed in [17] for finding the linear structures of a Boolean function. Roughly speaking, the BV algorithm is treated as a subroutine. By repeating the subroutine until one gets a subset H of \(N_f\), and then solving the system of linear equations \(\{x\cdot \omega =i|\omega \in H\}\) for both \(i=0\) and 1, one will get candidate linear structures of f.

Algorithm 1 Finding the linear structures of a Boolean function
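The procedure described above can be simulated classically for small n. The following sketch rests on several assumptions: it follows the prose description only (sample with BV, collect H, solve for \(i=0\) and \(i=1\)), it assumes that “No” is returned when no nonzero candidate survives, and it replaces both the quantum sampling and the linear algebra with brute force. All names are hypothetical.

```python
import random


def dot(u, v):
    """Inner product over F_2 of two vectors encoded as integers."""
    return bin(u & v).count("1") & 1


def bv_distribution(f, n):
    """Outcome distribution S_f(y)^2 of the BV subroutine (classical simulation)."""
    size = 2 ** n
    return [(sum((-1) ** (f(x) ^ dot(y, x)) for x in range(size)) / size) ** 2
            for y in range(size)]


def find_linear_structures(f, n, queries, rng):
    """Simulate: sample H from N_f, then solve a.w = i on H for i = 0 and i = 1."""
    probs = bv_distribution(f, n)
    h = set(rng.choices(range(2 ** n), weights=probs, k=queries))
    a0 = [a for a in range(2 ** n) if all(dot(a, w) == 0 for w in h)]
    a1 = [a for a in range(2 ** n) if all(dot(a, w) == 1 for w in h)]
    if set(a0) | set(a1) <= {0}:
        return "No"  # assumed convention: only the trivial vector survived
    return a0, a1


def toy_f(x):
    """Hypothetical example f(x) = x0*x1 + x2, with U_f^0 = {0} and U_f^1 = {100}."""
    x0, x1, x2 = x & 1, (x >> 1) & 1, (x >> 2) & 1
    return (x0 & x1) ^ x2


print(find_linear_structures(toy_f, 3, 30, random.Random(0)))
```

Since every sample lies in \(N_f\), a true linear structure is never discarded; extra candidates can only remain if too few distinct samples were collected.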

To justify the validity of the above algorithm, we present the following two theorems. Theorem 1 is proved in [17], while Theorem 2 has not been proved before.

Theorem 1

If running Algorithm 1 on a function \(f\in \mathcal {B}_n\) gives sets \(A^0\) and \(A^1\), then for all \(a\in A^i\) (\(i=0,1\)), all \(\epsilon \) satisfying \(0<\epsilon <1\), we have

$$\begin{aligned} Pr\left[ 1-\frac{|\{x\in \mathbb {F}_2^n|f(x\oplus a)+f(x)=i\}|}{2^n}<\epsilon \right] >1-\exp (-2p(n)\epsilon ^2). \end{aligned}$$
(1)

This theorem is proved in [17], and we present the proof in Appendix A for the paper to be self-contained.

For an arbitrary function \(f\in \mathcal {B}_n\), we let

$$\begin{aligned} \delta '_f=\frac{1}{2^n} \max _{\begin{array}{c} a\in \mathbb {F}_2^n\\ a\notin U_f \end{array}}\max _{i\in \mathbb {F}_2}|\{x\in \mathbb {F}_2^n|f(x\oplus a)+f(x)=i\}|. \end{aligned}$$

It is obvious that \(\delta '_f<1\), and if \(\delta _f<1\), it holds that \(\delta '_f=\delta _f\). By the definition of \(\delta '_f\), the smaller \(\delta '_f\) is, the easier it is to rule out the vectors which are not linear structures of f while executing Algorithm 1.

Theorem 2

Suppose \(\delta '_f\le p_0<1\) and Algorithm 1 makes quantum queries for \(p(n)=cn\) times. Then it holds that

  1.

    If \(\delta _f<1\), that is, f has no nonzero linear structure, then Algorithm 1 returns “No” with probability greater than \(1-p_0^{cn}\);

  2.

    If Algorithm 1 returns \(A^0\) and \(A^1\), then for any \(a\notin U_f^i\) (\(i=0,1\)),

    $$\begin{aligned} Pr[a\in A^i]\le p_0^{cn}. \end{aligned}$$

Proof

We first prove the second conclusion. Without loss of generality, we suppose \(i=0\). The case \(i=1\) can be proved in a similar way. If \(a\notin U_f^0\), then according to Lemma 2 there exists a vector \(\omega \in N_f\) such that \(\omega \cdot a=1\). Let \(K=\{\omega \in N_f|\omega \cdot a=1\}\). If any of the cn runs of the BV algorithm gives a vector \(\omega \in K\), then \(a\notin A^0\). Let W denote the random variable obtained by running the BV algorithm, then

$$\begin{aligned} Pr[W\in K]&=\sum _{a\cdot \omega =1}S_f(\omega )^2\\&=1-\frac{|V_{f,a}^0|}{2^n}\\&\ge 1-p_0. \end{aligned}$$

The second equality holds due to Lemma 1. Therefore,

$$\begin{aligned} Pr[a\in A^0]&=[1-Pr[W\in K]]^{cn}\\&\le p_0^{cn}, \end{aligned}$$

which completes the proof of the second conclusion. By observing that \(\delta _f<1\) means there is no nonzero vector in \(U_f\), the first conclusion can be naturally derived from the second one. \(\square \)

Suppose l(n) is an arbitrary polynomial of n. Theorem 1 implies that after \(p(n)=O(l(n)^2n)\) queries, all vectors in \(A^0\) and \(A^1\) will be \(\frac{1}{l(n)}\)-close linear structures of f except with negligible probability. In other words, Algorithm 1 is very likely to output the high probability differentials of f. Theorem 2 shows that if f has no linear structure, Algorithm 1 with O(n) queries will output “No” except with negligible probability. In addition, if Algorithm 1 returns sets \(A^0\) and \(A^1\), then each vector in \(A^i\) will be a linear structure of f with overwhelming probability. (The probability of an event is said to be overwhelming if the event happens except with negligible probability.)
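As a worked instance of these parameter choices (the specific constants are illustrative and are not taken from [17]), take \(\epsilon =\frac{1}{l(n)}\) and \(p(n)=n\,l(n)^2\) in Theorem 1. The failure probability in Eq. (1) is then bounded by

$$\begin{aligned} \exp (-2p(n)\epsilon ^2)=\exp \left( -2\,n\,l(n)^2\cdot \frac{1}{l(n)^2}\right) =\exp (-2n), \end{aligned}$$

which is negligible in n. Similarly, with \(p_0=\frac{2}{3}\) and \(c=1\) in Theorem 2, a vector \(a\notin U_f^i\) survives in \(A^i\) with probability at most \(p_0^{cn}=(\frac{2}{3})^{n}\).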

4 Linear structure attack

In this section, we first extend Algorithm 1 so that it can find the linear structures of a vector function. Afterwards, we use the new algorithm to construct quantum distinguishers for the 3-round Feistel scheme and to recover part of the key of the Even–Mansour construction. Since our attack strategy is based on the linear structures of some constructed functions, we call it the linear structure attack.

4.1 Attack algorithm

Suppose \(F=(F_1,F_2,\ldots ,F_n)\in \mathcal {C}_{m,n}\). A straightforward way to find the linear structures of F is to apply Algorithm 1 to each component function \(F_j\) separately and then choose a common linear structure. Specifically, we have the following algorithm:

Algorithm 2 Finding the linear structures of a vector function

For any function \(F=(F_1,\ldots ,F_n)\), let \(\delta '_F=\max _{1\le j\le n}\delta '_{F_j}\). The following theorem justifies the validity of Algorithm 2.

Theorem 3

Suppose \(F\in \mathcal {C}_{m,n}\). Running Algorithm 2 with \(cn^2\) queries (\(p(n)=cn\)) on F gives “No” or some vector. It holds that

  1.

    If \(\delta '_F\le p_0<1\) and F has no linear structure, then Algorithm 2 returns “No” with probability greater than \(1-p_0^{cn}\).

  2.

    If \(\delta '_{F}\le p_0<1\), then for any \(a\notin U_F^{(i_1,\ldots ,i_n)}\), we have

    $$\begin{aligned} Pr[\text {Algorithm 2 returns } (a,i_1,\ldots ,i_n)]\le p_0^{cn}. \end{aligned}$$
  3.

    If \((a,i_1,\ldots ,i_n)\) is obtained by running Algorithm 2, then for any \(0<\epsilon <1\),

    $$\begin{aligned} Pr\left[ \frac{\left| \left\{ x\in \mathbb {F}_2^m|F(x\oplus a)\oplus F(x)=i_1\ldots i_n\right\} \right| }{2^m}>1-n\epsilon \right] >\big (1-\exp (-2p(n){\epsilon }^2)\big )^n. \end{aligned}$$
    (2)

Proof

By observing that \(a\in U_F^{(i_1,\ldots ,i_n)}\) if and only if \(a\in U_{F_j}^{i_j}\) for all \(j=1,\ldots ,n\), the first two conclusions can be derived from Theorem 2. According to Theorem 1, we have

$$\begin{aligned} \frac{\left| \left\{ x\in \mathbb {F}_2^m|F_j(x\oplus a)\oplus F_j(x)=i_j\right\} \right| }{2^m}>1-\epsilon ,\,\,\, \forall j=1,\ldots ,n \end{aligned}$$
(3)

holds with a probability greater than \((1-\exp (-2p(n){\epsilon }^2))^n\). If Eq. (3) holds, then the number of x satisfying

$$\begin{aligned} F_j(x\oplus a)+F_j(x)=i_j \end{aligned}$$
(4)

for both \(j=1\) and \(j=2\) is at least \(2^m[2(1-\epsilon )-1]=2^m(1-2\epsilon )\). Similarly, the number of x satisfying Eq. (4) for all \(j=1,2,3\) is at least \(2^m[(1-2\epsilon )+(1-\epsilon )-1]=2^m(1-3\epsilon )\). By induction, the number of x satisfying (4) for all \(j=1,\ldots ,n\) is at least \(2^m(1-n\epsilon )\). Thus the probability that

$$\begin{aligned} \frac{\left| \left\{ x\in \mathbb {F}_2^m|F(x\oplus a)\oplus F(x)=i_1\ldots i_n\right\} \right| }{2^m}>1-n\epsilon \end{aligned}$$

holds is greater than \((1-\exp (-2p(n){\epsilon }^2))^n\). Thus the third conclusion holds. \(\square \)

Note that Algorithm 2 actually requires that the adversary has oracle access to each component function of F. Regarding efficiency, since Algorithm 2 needs to find the intersection of the sets \(A_j\), its complexity depends on the size of these sets, which in turn depends on the properties of the specific function F. However, we can prove that only polynomial-time computation is needed when Algorithm 2 is applied to the 3-round Feistel scheme or the Even–Mansour construction. In [12, 14, 15, 21], the authors use Simon’s algorithm to find the period of some constructed functions and then break the security of the 3-round Feistel scheme or the Even–Mansour construction. Compared with Simon’s algorithm, the complexity of Algorithm 2 is somewhat larger because it needs to search for the linear structures of each component function separately. However, Algorithm 2 has more general applications. It can find not only the periods of a function but also its other linear structures, which allows us to construct multiple distinguishers for the 3-round Feistel scheme. In Sect. 5 we will see that finding linear structures by considering each component function separately may bring some unexpected advantages for differential cryptanalysis.

4.2 Application to a three-round Feistel scheme

A Feistel scheme is a classical construction to build block ciphers. A 3-round Feistel scheme with input \((x_L,x_R)\) and output \((y_L,y_R)\) is built from three random functions \(P_1,P_2,P_3\) as shown in Fig. 1, where \(x_L,x_R,y_L,y_R\in \{0,1\}^n\). It has been proved that a 3-round Feistel scheme is a secure pseudorandom permutation as long as the internal functions are pseudorandom as well [18]. Our goal is to construct a quantum distinguisher which distinguishes a 3-round Feistel scheme from a random permutation on \(\{0,1\}^{2n}\).

Fig. 1 Three-round Feistel scheme

Suppose \(s_0,s_1\in \mathbb {F}_2^n\) are two arbitrary constants such that \(s_0\ne s_1\). We define the following function:

$$\begin{aligned} F: \mathbb {F}_2\times \mathbb {F}_2^n&\rightarrow \mathbb {F}_2^n\nonumber \\ (\,\,\,b\,\,,\,x\,\,)&\rightarrow P_2(x\oplus P_1(s_b)). \end{aligned}$$
(5)

Given oracle access to the 3-round Feistel function E, it is easy to construct the oracle \(O_F\) which computes F on superpositions. Observing that the right part of the output \(E(s_b,x)\) is \(F(b,x)\oplus s_b\), we can construct the oracle \(O_F\) by first querying the oracle which computes the right part of E, then applying the unitary operator \(U^{s_0,s_1}:|b\rangle |c\rangle \rightarrow |b\rangle |c\oplus s_b\rangle \). For every \((b\Vert x)\in \mathbb {F}_2^{n+1}\), it is easy to check that

$$\begin{aligned} F(b,x)=F(b\oplus 1,x\oplus P_1(s_0)\oplus P_1(s_1)). \end{aligned}$$

Thus \((1\Vert s)\triangleq (1\Vert P_1(s_0)\oplus P_1(s_1))\) is a nonzero linear structure of F, or more accurately, \((1\Vert s)\in U_F^{(0,\ldots ,0)}\). Therefore, by running Algorithm 2 on F one can get \((1\Vert s)\). On the other hand, the probability of a random function having a linear structure is negligible. Given access to a quantum oracle which computes either the 3-round Feistel function E or a random permutation over \(\{0,1\}^{2n}\), we can construct the distinguishing algorithm as below:

Algorithm 3 Quantum distinguisher for the 3-round Feistel scheme
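The linear structure \((1\Vert s)\) exploited by the distinguisher can be checked classically on a small instance. The sketch below builds a toy 3-round Feistel network from random lookup tables and derives F from the right half of its output, as in Eq. (5). The round wiring in `encrypt` is an assumption, chosen so that the right output half equals \(x_L\oplus P_2(x_R\oplus P_1(x_L))\), consistent with the relation \(E(s_b,x)_R=F(b,x)\oplus s_b\) stated above; all names are hypothetical.

```python
import random


def make_feistel(n, rng):
    """Build a toy 3-round Feistel network from three random n-bit tables."""
    size = 2 ** n
    p1, p2, p3 = [[rng.randrange(size) for _ in range(size)] for _ in range(3)]

    def encrypt(x_left, x_right):
        u = x_right ^ p1[x_left]   # round 1
        v = x_left ^ p2[u]         # round 2, becomes the right output half
        w = u ^ p3[v]              # round 3, becomes the left output half
        return w, v

    return encrypt, p1


def build_f(encrypt, s0, s1):
    """F(b, x) = (right half of E(s_b, x)) XOR s_b, which equals P2(x XOR P1(s_b))."""
    constants = (s0, s1)

    def f(b, x):
        _, y_right = encrypt(constants[b], x)
        return y_right ^ constants[b]

    return f


n = 4
encrypt, p1 = make_feistel(n, random.Random(11))
s0, s1 = 0b0011, 0b1010            # two distinct constants, chosen arbitrarily
f = build_f(encrypt, s0, s1)
s = p1[s0] ^ p1[s1]                # known here only because the tables are simulated

# (1 || s) shifts b by 1 and x by s without changing the value of F.
print(all(f(b, x) == f(b ^ 1, x ^ s) for b in (0, 1) for x in range(2 ** n)))
```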

Before analyzing the validity and efficiency of Algorithm 3, we first give the following lemma:

Lemma 3

Suppose \(F=(F_1,\ldots ,F_{n})\) is defined as Eq. (5). Then for all \(j=1,2,\ldots ,n\),

$$\begin{aligned} \delta _{F_j}(1\Vert s)\,\,\,\triangleq \frac{1}{2^{n+1}}\max _{\begin{array}{c} (\tau ,t)\in \mathbb {F}_2^{n+1}\\ (\tau ,t)\notin \{(0,\ldots ,0),(1\Vert s)\} \end{array}}\,\,\,\left| \left\{ (b,x)\in \mathbb {F}_2^{n+1}|F_j(b,x)=F_j(b\oplus \tau ,x\oplus t)\right\} \right| \le \frac{2}{3} \end{aligned}$$

holds except with negligible probability. Here, \(\delta _{F_j}(1\Vert s)\) is still a random variable since \(F_j\) is determined by random functions \(P_1, P_2\).

Proof

If \(\delta _{F_j}(1\Vert s)>\frac{2}{3}\), then there exists \((\tau ,t)\notin \{0,(1\Vert s)\}\) such that \(|\{(b,x)\in \mathbb {F}_2^{n+1}|F_j(b,x)=F_j(b\oplus \tau ,x\oplus t)\}|>\frac{2}{3}\cdot 2^{n+1}\). If \(\tau =0\), this implies

$$\begin{aligned} \left| \left\{ x\in \mathbb {F}_2^n|F_j(0,x)=F_j(0,x\oplus t)\right\} \right|>\frac{2}{3}\cdot 2^n\text { or }\left| \left\{ x\in \mathbb {F}_2^n|F_j(1,x)=F_j(1,x\oplus t)\right\} \right| >\frac{2}{3}\cdot 2^n. \end{aligned}$$

Thus there exists some b such that

$$\begin{aligned} \left| \left\{ x\in \mathbb {F}_2^n|P_{2j}(x\oplus P_1(s_b))=P_{2j}(x\oplus t\oplus P_1(s_b))\right\} \right| >\frac{2}{3}\cdot 2^n, \end{aligned}$$

where \(P_{2j}\) is the \(j^{th}\) component function of \(P_2\). That is,

$$\begin{aligned} \left| \left\{ x\in \mathbb {F}_2^n|P_{2j}(x)=P_{2j}(x\oplus t)\right\} \right| >\frac{2}{3}\cdot 2^n. \end{aligned}$$

Similarly, if \(\tau =1\), we have

$$\begin{aligned} \left| \left\{ x\in \mathbb {F}_2^n|P_{2j}(x)=P_{2j}(x\oplus t\oplus P_1(s_0)\oplus P_1(s_1))\right\} \right| >\frac{2}{3}\cdot 2^n. \end{aligned}$$

In either case, there exists a \(u\ne (0,\ldots ,0)\) such that

$$\begin{aligned} \left| \left\{ x\in \mathbb {F}_2^n|P_{2j}(x)=P_{2j}(x\oplus u)\right\} \right| >\frac{2}{3}\cdot 2^n. \end{aligned}$$
(6)

Since \(P_{2}\) is a random function, \(P_{2j}\) is actually a random Boolean function. Thus, \(|\{x\in \mathbb {F}_2^n|P_{2j}(x)=P_{2j}(x\oplus u)\}|\) is still a random variable. We next prove that equation (6) holds only with a negligible probability. This will imply that the probability of \(\delta _{F_j}(1\Vert s)>\frac{2}{3}\) is negligible, and thus completes the proof.

It remains to prove that the probability

$$\begin{aligned} Pr\left[ \frac{\left| \left\{ x\in \mathbb {F}_2^n|P_{2j}(x)=P_{2j}(x\oplus u)\right\} \right| }{2^n}>\frac{2}{3}\right] \end{aligned}$$

is negligible, where \(P_{2j}\) is uniformly chosen from the set of all Boolean functions from \(\mathbb {F}_2^n\) to \(\mathbb {F}_2\). We call an unordered set of two vectors in \(\mathbb {F}_2^n\) a pair. For example, vectors x, y form a pair \(\{x,y\}\), and \(\{x,y\}=\{y,x\}\). The difference of a pair \(\{x,y\}\) is defined as \(x\oplus y\). For the nonzero difference u, there are \(2^{n-1}\) pairs with this difference. Thus,

$$\begin{aligned} \big |\big \{x\in \mathbb {F}_2^n|P_{2j}(x)&=P_{2j}(x\oplus u)\big \}\big |\nonumber \\&=\big |\big \{(x,y)\in \mathbb {F}_2^n\times \mathbb {F}_2^n|P_{2j}(x)=P_{2j}(y),x\oplus y=u\big \}\big |\nonumber \\&=2\big |\big \{\{x,y\}|P_{2j}(x)=P_{2j}(y),x\oplus y=u\big \}\big |. \end{aligned}$$
(7)

For convenience, we denote the \(2^{n-1}\) pairs with difference u as \(\{x_1,x_1\oplus u\},\{x_2,x_2\oplus u\},\ldots ,\{x_{2^{n-1}},x_{2^{n-1}}\oplus u\}\). Let \(Z_l=P_{2j}(x_l)\oplus P_{2j}(x_l\oplus u)\oplus 1\) for \(l=1,2,\ldots , 2^{n-1}\). Since \(P_{2j}\) is a random Boolean function and for \(l\ne k\), \(\{x_l,x_l\oplus u\}\cap \{x_k,x_k\oplus u\}=\emptyset \), \(Z_1,\ldots , Z_{2^{n-1}}\) are independent and identically distributed random variables. They all follow the uniform distribution over \(\{0,1\}\). That is, \(Pr[Z_l=0]=Pr[Z_l=1]=\frac{1}{2}\), \(l=1,2,\ldots ,2^{n-1}\). According to Hoeffding’s inequality,

$$\begin{aligned} Pr\left[ \frac{1}{2^{n-1}}\sum _{l=1}^{2^{n-1}}Z_l-\frac{1}{2}\ge \frac{1}{6}\right] \le 2\exp \left( -\frac{1}{18}2^{n-1}\right) . \end{aligned}$$

That is,

$$\begin{aligned} Pr\left[ \sum _{l=1}^{2^{n-1}}Z_l\ge \frac{2}{3}\cdot 2^{n-1}\right] \le 2\exp \left( -\frac{1}{18}2^{n-1}\right) . \end{aligned}$$

Since

$$\begin{aligned} \sum _{l=1}^{2^{n-1}}Z_l=&\big |\big \{Z_l|Z_l=1\big \}\big |\\ =&\big |\big \{\,\{x_l,x_l\oplus u\}\,|P_{2j}(x_l)\oplus P_{2j}(x_l\oplus u)\oplus 1=1\big \}\big |\\ =&\big |\big \{\,\{x,y\}\,|P_{2j}(x)=P_{2j}(y),x\oplus y=u\big \}\big |, \end{aligned}$$

we have

$$\begin{aligned} Pr\Big [\,\big |\big \{\,\{x,y\}\,|P_{2j}(x)=P_{2j}(y),x\oplus y=u\big \}\big |\ge \frac{2}{3}\cdot 2^{n-1}\Big ]\le 2\exp \Bigg (-\frac{1}{18}2^{n-1}\Bigg ). \end{aligned}$$

According to Eq. (7), we have

$$\begin{aligned} Pr\left[ \frac{|\{x\in \mathbb {F}_2^n|P_{2j}(x)=P_{2j}(x\oplus u)\}|}{2^n}>\frac{2}{3}\right] \le 2\exp \left( -\frac{1}{18}2^{n-1}\right) . \end{aligned}$$

Therefore, Eq. (6) holds with a negligible probability. \(\square \)
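For reference, the exponent in this bound can be traced as follows. The one-sided Hoeffding inequality for N independent random variables taking values in [0, 1] with mean \(\mu \) states that \(Pr[\frac{1}{N}\sum _{l}Z_l-\mu \ge t]\le \exp (-2Nt^2)\). With \(N=2^{n-1}\), \(\mu =\frac{1}{2}\) and \(t=\frac{1}{6}\),

$$\begin{aligned} \exp (-2Nt^2)=\exp \left( -2\cdot 2^{n-1}\cdot \frac{1}{36}\right) =\exp \left( -\frac{1}{18}2^{n-1}\right) . \end{aligned}$$

The additional factor 2 used in the proof corresponds to the two-sided form of the inequality; it gives a slightly weaker but still valid upper bound.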

Regarding the validity of Algorithm 3, we have the following theorem:

Theorem 4

Algorithm 3 successfully distinguishes the 3-round Feistel function from a random permutation except with negligible probability.

Proof

If the given oracle computes a random permutation, the string a obtained while executing Algorithm 3, if it exists, is random. Hence the probability that F(u) equals \(F(u')\) is approximately \(\frac{1}{2^n}\). On the other hand, if the given oracle computes the 3-round Feistel function, then

$$\begin{aligned} \delta _{F_j}(1\Vert s)\,\,\,\triangleq \frac{1}{2^{n+1}}\max _{\begin{array}{c} (\tau ,t)\in \mathbb {F}_2^{n+1}\\ (\tau ,t)\notin \{(0,\ldots ,0),(1\Vert s)\} \end{array}}\,\,\,|\{(b,x)\in \mathbb {F}_2^{n+1}|F_j(b,x)=F_j(b\oplus \tau ,x\oplus t)\}|\le \frac{2}{3} \end{aligned}$$

holds with an overwhelming probability according to Lemma 3. By Theorem 2, the above inequality indicates

$$\begin{aligned} Pr[a\ne (1\Vert s)]\le \left( \frac{2}{3}\right) ^{n+1}. \end{aligned}$$

Thus the probability that \(F(u)\ne F(u')\) is no more than \((\frac{2}{3})^{n+1}\), which completes the proof. \(\square \)

Note that Algorithm 3 actually requires that the attacker can query each component function of the right part of E. We now consider the efficiency of Algorithm 3. If the given oracle computes the 3-round Feistel function, then according to Lemma 3 and Theorem 2, for any \(a\notin \{(0,\ldots ,0),(1\Vert s)\}\), we have \(Pr[a\in A_j^0]\le (\frac{2}{3})^{n+1}\). Thus with an overwhelming probability, \(A_j^0\) contains only 0 and \((1\Vert s)\). Therefore, finding the intersection of the \(A_j^0\)’s requires almost no computation. This is also true when the given oracle computes a random permutation. In addition, Algorithm 3 queries the quantum oracle \(n(n+1)\) times and the classical oracle 2 times. Thus the complexity of Algorithm 3 is \(O(n^2)\). The distinguishing algorithm used in [12, 14, 21] is based on Simon’s algorithm and its complexity is only O(n), which is better than ours. However, our algorithm provides a new approach to attacking block ciphers, and by using our attack strategy, one can find more than one distinguisher. For example, we can also define \(F(b,x)=P_2(x\oplus P_1(s_b))\oplus (b,\ldots ,b)\). Then F has the linear structure \((1\Vert P_1(s_0)\oplus P_1(s_1))\in U_F^{(1,\ldots ,1)}\), and we can use it to construct another quantum distinguisher in a similar way.

4.3 Application to the Even–Mansour construction

The Even–Mansour construction is a simple scheme which builds a block cipher from a public permutation [7]. Suppose \(P:\{0,1\}^n\rightarrow \{0,1\}^n\) is a permutation. The encryption function is defined as

$$\begin{aligned} E_{k_1,k_2}(x)=P(x\oplus k_1)\oplus k_2, \end{aligned}$$

where \(k_1,k_2\) are the keys. Even and Mansour have proved that this construction is secure in the random permutation model up to \(2^{n/2}\) queries. However, Kuwakado and Morii proposed a quantum attack based on Simon’s algorithm which recovers the key \(k_1\) [15]. Our attack strategy is similar to theirs, but we use the BV algorithm instead of Simon’s algorithm.

In order to recover the key \(k_1\), we first define the following function:

$$\begin{aligned} F:\{0,1\}^n&\rightarrow \{0,1\}^n \\ x&\rightarrow E_{k_1,k_2}(x)\oplus P(x). \end{aligned}$$

Given oracle access to \(E_{k_1,k_2}(\cdot )\), it is easy to construct the oracle \(O_F\) which computes F on superpositions. Since \(F(x)\oplus F(x\oplus k_1)=0\) for all \(x\in \mathbb {F}_2^n\), \(k_1\) is a linear structure of F, or more accurately, \(k_1\in U_F^{(00\cdots 0)}\). Therefore, by running Algorithm 2 on F with minor modification we can obtain \(k_1\). Specifically, the following algorithm can recover \(k_1\) with an overwhelming probability.

Algorithm 4 Recovering the key \(k_1\) of the Even–Mansour construction
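The periodicity that the attack relies on can be checked classically on a toy instance. The sketch below simulates an Even–Mansour cipher with a random public permutation, forms \(F(x)=E_{k_1,k_2}(x)\oplus P(x)\), and lists all nonzero periods of F by brute force. It illustrates the structure only and does not simulate the quantum algorithm; all names are hypothetical.

```python
import random


def make_even_mansour(n, rng):
    """Return (encrypt, perm, k1) for a toy Even-Mansour cipher on n bits."""
    perm = list(range(2 ** n))
    rng.shuffle(perm)                  # public random permutation P
    k1 = rng.randrange(1, 2 ** n)      # nonzero so that the period is nontrivial
    k2 = rng.randrange(2 ** n)

    def encrypt(x):
        return perm[x ^ k1] ^ k2

    return encrypt, perm, k1


def build_f(encrypt, perm):
    """F(x) = E(x) XOR P(x); satisfies F(x) = F(x XOR k1) for every x."""
    return lambda x: encrypt(x) ^ perm[x]


def nonzero_periods(f, n):
    """All a != 0 with f(x XOR a) = f(x) for every x (exhaustive, small n only)."""
    return [a for a in range(1, 2 ** n)
            if all(f(x ^ a) == f(x) for x in range(2 ** n))]


n = 5
encrypt, perm, k1 = make_even_mansour(n, random.Random(42))
f = build_f(encrypt, perm)
print(k1 in nonzero_periods(f, n))
```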

Theorem 5

Running Algorithm 4 with \(n^2\) (\(p(n)=n\)) queries on F gives the key \(k_1\) except with negligible probability.

Proof

By a proof similar to that of Lemma 3, we obtain that for \(j=1,\ldots ,n\)

$$\begin{aligned} \delta _{F_j}(k_1)\triangleq \frac{1}{2^n}\max _{\begin{array}{c} \alpha \in \mathbb {F}_2^{n}\\ \alpha \notin \{(0,\ldots ,0),k_1\} \end{array}}|\{x\in \mathbb {F}_2^n|F_j(x)=F_j(x\oplus \alpha )\}|\le \frac{2}{3} \end{aligned}$$
(8)

holds except with negligible probability. Here, \(\delta _{F_j}(k_1)\) is still a random variable since \(F_j\) is determined by the random permutation P. Equation (8) indicates that

$$\begin{aligned} \delta '_F=\max _j\delta _{F_j}'\le \frac{2}{3} \end{aligned}$$

holds except with negligible probability. Then according to Theorem 3, the probability that Algorithm 4 outputs \(k_1\) is greater than \(1-(\frac{2}{3})^n\), which completes the proof. \(\square \)

As for the complexity of Algorithm 4, according to Eq. (8) and Theorem 2, the probability that \(A_j^0\) contains vectors other than \(k_1\) and 0 is negligible. Thus finding the intersection of the \(A_j^0\)’s requires almost no computation. In addition, Algorithm 4 needs to query the quantum oracle \(n^2\) times. Thus its complexity is \(O(n^2)\).
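The overall flow analysed above (sample n vectors per component function, keep the vectors orthogonal to all samples, intersect over j) can be mimicked classically on a toy instance. This is a sketch under stated assumptions, not the quantum algorithm itself: the measurement is replaced by sampling \(\omega \) with probability proportional to the squared Walsh coefficient, the toy permutation and keys are random stand-ins, and the names `bv_sample` and `recover_candidates` are hypothetical.

```python
import random

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def toy_even_mansour(n: int, rng: random.Random):
    """Return (F, k1) with F[x] = P(x ^ k1) ^ k2 ^ P(x) for a toy instance."""
    size = 1 << n
    P = list(range(size))
    rng.shuffle(P)                       # hypothetical public permutation
    k1, k2 = rng.randrange(1, size), rng.randrange(size)
    return [P[x ^ k1] ^ k2 ^ P[x] for x in range(size)], k1

def bv_sample(f_bits, rng: random.Random, shots: int):
    """Classical stand-in for BV measurements: draw w with probability
    proportional to the squared Walsh coefficient of f at w."""
    size = len(f_bits)
    weights = [
        sum((-1) ** (f_bits[x] ^ parity(w & x)) for x in range(size)) ** 2
        for w in range(size)
    ]
    return rng.choices(range(size), weights=weights, k=shots)

def recover_candidates(F, n: int, rng: random.Random):
    """Intersect, over all components j, the sets A_j^0 of vectors
    orthogonal to every sampled w (p(n) = n samples per component)."""
    size = 1 << n
    candidates = set(range(size))
    for j in range(n):
        f_bits = [(y >> j) & 1 for y in F]
        samples = bv_sample(f_bits, rng, shots=n)
        a_j = {a for a in range(size)
               if all(parity(a & w) == 0 for w in samples)}
        candidates &= a_j
    return candidates
```

In this simulation the zero vector and \(k_1\) always survive, because no sampled \(\omega \) can have \(\omega \cdot k_1=1\); whether any other vector survives depends on the random samples, which is the event the bound in Eq. (8) controls.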

5 Differential cryptanalysis

In this section, we look at linear structures from another point of view: as differentials of an encryption function. Based on this, we give three ways to execute differential cryptanalysis, which we call quantum differential cryptanalysis, quantum small probability differential cryptanalysis and quantum impossible differential cryptanalysis, respectively. Unlike in classical differential cryptanalysis, the success probability of the first two methods depends on the key used by the encryption algorithm. Specifically, suppose q(n) is an arbitrary polynomial. For the first two methods, the corresponding attack algorithms can be run so that they work for at least a \((1-\frac{1}{q(n)})\) fraction of the keys in the key space, while the third method works for all keys in the key space.

5.1 Quantum differential cryptanalysis

Differential cryptanalysis is a chosen-plaintext attack. Suppose \(E:\{0,1\}^n\rightarrow \{0,1\}^n\) is the encryption function of an r-round block cipher. Let \(F_k\) be the function which maps the plaintext x to the input y of the last round, where k denotes the key of the first \(r-1\) rounds. Let \(F_k(x)=y\), \(F_k(x')=y'\); then \(\varDelta x=x\oplus x'\) and \(\varDelta y=y\oplus y'\) are called the input difference and output difference, respectively. The pair \((\varDelta x,\varDelta y)\) is called a differential. Differential cryptanalysis consists of two phases. In the first phase, the attacker tries to find a high probability differential of \(F_k\). In the second phase, using the high probability differential that has been found, the attacker tests all candidate subkeys and then recovers the key of the last round. Our algorithm is applied in the first phase, whereas in [24] a quantum algorithm is applied in the second phase.
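To make the notion of a differential and its probability concrete, the following minimal sketch computes the fraction of inputs x for which \(F(x)\oplus F(x\oplus \varDelta x)=\varDelta y\) by exhaustive search. The 3-bit table `TOY_SBOX` and the function name are illustrative assumptions and are not taken from any cipher discussed here.

```python
from fractions import Fraction

# Hypothetical 3-bit permutation, used only as an example function.
TOY_SBOX = [3, 0, 6, 5, 1, 7, 2, 4]

def differential_probability(F, n: int, dx: int, dy: int) -> Fraction:
    """Fraction of n-bit inputs x with F(x) xor F(x xor dx) == dy."""
    size = 1 << n
    hits = sum(1 for x in range(size) if F(x) ^ F(x ^ dx) == dy)
    return Fraction(hits, size)

def toy_sbox(x: int) -> int:
    return TOY_SBOX[x]
```

For an affine map such as `lambda x: x ^ 0b1010`, the differential \((a,a)\) has probability 1 for every a; for a fixed input difference, the probabilities over all output differences always sum to 1.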

Intuitively, we can use Algorithm 2 to find high probability differentials of \(F_k\). However, oracle access to \(F_k\) is not available: the attacker can only query the whole encryption function E. In classical differential cryptanalysis, the attacker analyzes the properties of the encryption algorithm and searches for high probability differentials that are independent of the key, i.e. differentials that have high probability no matter what the key is. We try to apply the same idea to our attack. Unfortunately, we have not yet found a way to obtain key-independent high probability differentials of \(F_k\) using the BV algorithm. However, we can modify Algorithm 2 to find differentials that have high probability for most keys. To do this, we treat the key as a part of the input of the encryption function and run Algorithm 2 on this new function. Specifically, suppose m is the length of the key in the first \(r-1\) rounds and \(\mathcal {K}=\{0,1\}^m\) is the corresponding key space. Define the following function

$$\begin{aligned} G:\{0,1\}^n\times \{0,1\}^m&\rightarrow \{0,1\}^n\\ (x,k)\quad&\rightarrow F_k(x). \end{aligned}$$

G is deterministic and known to the attacker. Thus oracle access to G is available. (Actually, oracle access to each \(G_j\) is available.) By executing Algorithm 2 on G, one expects to obtain a high probability differential of G with overwhelming probability. But in order for it to also be a differential of \(F_k\), the last m bits of the input difference, which correspond to the difference of the key, need to be zero. To ensure this, we modify Algorithm 2 slightly as below:

figure e

By running Algorithm 5, one can find a differential of \(F_k\) that has high probability for most keys. Specifically, we have the following theorem:

Theorem 6

Suppose q(n) is an arbitrary polynomial of n. If running Algorithm 5 with np(n) quantum queries on G gives a vector \((a,i_1,\ldots ,i_n)\), then there exists a subset \(\mathcal {K}'\subseteq \mathcal {K}\) such that \(|\mathcal {K}'|/|\mathcal {K}|\ge 1-\frac{1}{q(n)}\) and for all \(k\in \mathcal {K}'\), it holds that

$$\begin{aligned} Pr\left[ \frac{|\{x\in \mathbb {F}_2^n|F_k(x\oplus a)\oplus F_k(x)=i_1,\ldots ,i_n\}|}{2^n}>1-\epsilon \right] >\Bigg (1-\exp \Bigg (-\frac{2p(n)\epsilon ^2}{q(n)^2n^2}\Bigg )\Bigg )^n. \end{aligned}$$

Proof

Since \(a\cdot (\omega _1,\ldots ,\omega _n)=0\) indicates \((a\Vert 0,\ldots ,0)\cdot (\omega _1,\ldots ,\omega _{n+m})=0\), the vector \((a\Vert 0,\ldots ,0)\) can be seen as an output when we execute Algorithm 2 on G. According to Theorem 3,

$$\begin{aligned} \frac{\left| \left\{ z\in \mathbb {F}_2^{n+m}|G(z\oplus (a\Vert 0,\ldots ,0))\oplus G(z)=i_1,\ldots ,i_n\right\} \right| }{2^{n+m}}>1-n\epsilon _0 \end{aligned}$$
(9)

holds with a probability greater than \((1-\exp (-2p(n)\epsilon _0^2))^n\). Let

$$\begin{aligned} V(k)=\frac{\left| \left\{ x\in \mathbb {F}_2^{n}|F_k(x\oplus a)\oplus F_k(x)=i_1,\ldots ,i_n\right\} \right| }{2^{n}}. \end{aligned}$$

Equation (9) indicates \(\mathbb {E}_k(V(k))>1-n\epsilon _0\), where \(\mathbb {E}_k(\cdot )\) means the expectation when the key k is chosen uniformly at random from \(\mathcal {K}\). Therefore, if Eq. (9) holds, for any polynomial q(n), we have

$$\begin{aligned} Pr_k[V(k)>1-q(n)n\epsilon _0]>1-\frac{1}{q(n)}. \end{aligned}$$

That is, for at least \((1-\frac{1}{q(n)})\) of keys in \(\mathcal {K}\), it holds that \(V(k)>1-q(n)n\epsilon _0\). Let \(\mathcal {K}'\) be the set of these keys, then \(|\mathcal {K}'|/|\mathcal {K}|\ge 1-\frac{1}{q(n)}\), and for all \(k\in \mathcal {K}'\), it holds that

$$\begin{aligned} Pr\Big [V(k)>1-q(n)n\epsilon _0\Big ]>\big (1-\exp (-2p(n)\epsilon _0^2)\big )^n. \end{aligned}$$

The conclusion is obtained by letting \(\epsilon =q(n)n\epsilon _0\). \(\square \)
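Two steps of the proof can be checked numerically on a toy keyed function: the fraction in Eq. (9) for G with input difference \((a\Vert 0,\ldots ,0)\) equals the average of V(k) over the keys, and a Markov-type argument then bounds the fraction of keys for which V(k) is far below 1. The keyed function `F` below is a hypothetical 4-bit example (an arbitrary table combined with the key), and the function names are assumptions; the sketch uses a non-strict inequality so that it also covers the case where the differential holds for every key.

```python
from fractions import Fraction

# Arbitrary 4-bit table used only to build a toy keyed function.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def F(k: int, x: int) -> int:
    """Hypothetical toy F_k(x) with a 4-bit key and a 4-bit input."""
    return SBOX[(x + k) % 16] ^ k

def V(k: int, a: int, d: int) -> Fraction:
    """Probability over x of the differential (a, d) for the fixed key k."""
    hits = sum(1 for x in range(16) if F(k, x ^ a) ^ F(k, x) == d)
    return Fraction(hits, 16)

def G_fraction(a: int, d: int) -> Fraction:
    """Fraction of z = (x, k) with G(z xor (a || 0)) xor G(z) == d."""
    hits = sum(1 for k in range(16) for x in range(16)
               if F(k, x ^ a) ^ F(k, x) == d)
    return Fraction(hits, 256)

def mean_V(a: int, d: int) -> Fraction:
    """Average of V(k) over a uniformly random key."""
    return sum(V(k, a, d) for k in range(16)) / 16

def good_key_fraction(a: int, d: int, q: int) -> Fraction:
    """Fraction of keys with 1 - V(k) <= q * (1 - E_k V(k))."""
    gap = 1 - G_fraction(a, d)
    good = sum(1 for k in range(16) if 1 - V(k, a, d) <= q * gap)
    return Fraction(good, 16)
```

For every differential, `mean_V` equals `G_fraction` because the key difference is zero, and `good_key_fraction(a, d, q)` is at least \(1-\frac{1}{q}\), which mirrors the step from Eq. (9) to the bound on \(Pr_k\).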

According to Theorem 6, if \(p(n)=O(n^3q(n)^2)\), then for any \(k\in \mathcal {K}'\) and any constant c, \(V(k)>1-\frac{1}{c}\) holds except with negligible probability. For any constants \(c_1,c_2\), if \(p(n)=\frac{1}{2}c_1^2n^2q(n)^2\ln {(c_2n)}\), then for any \(k\in \mathcal {K}'\), we have

$$\begin{aligned} Pr\left[ \frac{|\{x\in \mathbb {F}_2^{n}|F_k(x\oplus a)\oplus F_k(x)=i_1,\ldots ,i_n\}|}{2^{n}}>1-\frac{1}{c_1}\right] >1-\frac{1}{c_2}. \end{aligned}$$
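The parameter choice can be sanity-checked numerically. With \(\epsilon =\frac{1}{c_1}\) and \(p(n)=\frac{1}{2}c_1^2n^2q(n)^2\ln (c_2n)\), the exponent in Theorem 6 becomes \(\ln (c_2n)\), so the bound is \((1-\frac{1}{c_2n})^n\), which exceeds \(1-\frac{1}{c_2}\) for \(n\ge 2\) by Bernoulli's inequality. The sketch below evaluates this in floating point; the function names are assumptions introduced here.

```python
import math

def shots_per_component(n: int, q: float, c1: float, c2: float) -> float:
    """p(n) = (1/2) * c1^2 * n^2 * q^2 * ln(c2 * n)."""
    return 0.5 * c1 ** 2 * n ** 2 * q ** 2 * math.log(c2 * n)

def theorem6_bound(n: int, q: float, p: float, eps: float) -> float:
    """Right-hand side of Theorem 6: (1 - exp(-2 p eps^2 / (q^2 n^2)))^n."""
    return (1 - math.exp(-2 * p * eps ** 2 / (q ** 2 * n ** 2))) ** n
```

For example, `theorem6_bound(n, q, shots_per_component(n, q, c1, c2), 1 / c1)` can be compared against `1 - 1 / c2` for a range of n.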

When the attacker performs differential cryptanalysis, he or she first chooses constants \(c_1,c_2\), then executes Algorithm 5 with \(p(n)=\frac{1}{2}c_1^2n^2q(n)^2\ln {(c_2n)}\) to obtain a differential of \(F_k\). The obtained differential has high probability for at least a \((1-\frac{1}{q(n)})\) fraction of the keys in \(\mathcal {K}\). Afterwards, the attacker determines the subkey of the last round according to this high probability differential, which can be done as in classical differential cryptanalysis.

To analyze the complexity of Algorithm 5, we divide it into two parts: running the BV algorithm to obtain the sets \(A_j\), and finding the intersection of the \(A_j\)’s. In the first part, Algorithm 5 needs to run the BV algorithm \(np(n)=O(n^3q(n)^2\ln {n})\) times. Thus \(O(n^3q(n)^2\ln {n})\) quantum queries are needed. As for the second part, the corresponding complexity depends on the size of the sets \(A_j\). Suppose \(t=\max _j|A_j|\); then the complexity of finding the intersection by sorting is \(O(nt\log {t})\). The value of t depends on the properties of the encryption algorithm. Generally speaking, t will not be large, since a well constructed encryption algorithm usually does not have many linear structures. In addition, one can also choose a greater p(n) to decrease the value of t. Therefore, the complexity of Algorithm 5 is \(O(n^3q(n)^2\ln {n})\).

One of the advantages of our algorithm is that it can find the high probability differential directly. In the classical case, by contrast, the attacker needs to analyze the partial structures of the encryption algorithm separately and then search for high probability differential characteristics, which may become much more complicated as the number of rounds increases.

5.2 Quantum small probability differential cryptanalysis

In this subsection we present a new way to execute differential cryptanalysis, which we call quantum small probability differential cryptanalysis. As shown in the previous sections, we find differentials of a vector function by first searching for the differentials of each component function separately, and then choosing a common input difference and outputting the corresponding differential. Although this method slightly increases the complexity of the attack algorithm, it may bring advantages in some applications. Quantum small probability differential cryptanalysis is such an example. Differential cryptanalysis using small-probability differentials was considered in [4, 13]. In differential analysis, the attacker needs differentials with notable statistical properties in order to distinguish the block cipher from a random permutation, as in differential cryptanalysis and impossible differential cryptanalysis. As for small probability differentials, since any differential of a random permutation has only a very small probability, the “small-probability” property of the entire differential cannot be directly used to distinguish the block cipher from a random permutation. However, if we consider each component function of the encryption function separately, it becomes possible to execute cryptanalysis based on small-probability differentials. Specifically, let \(F_k:\{0,1\}^n\rightarrow \{0,1\}^n\) be the function which maps the plaintext x to the input y of the last round of the encryption algorithm, and let \((\varDelta x,\varDelta y)\) be a differential of \(F_k\) with small probability. For a random permutation \(P=(P_1,\ldots ,P_n)\), the differential \((\varDelta x,\varDelta y)\) appears with probability about \(\frac{1}{2^n}\). But for the component function \(P_j\), the differential \((\varDelta x,\varDelta y_j)\) appears with probability about \(\frac{1}{2}\), which is not small at all. Our attack strategy is based on this fact. The detailed procedure is as follows:

I. Finding small probability differential. Let \(G(x,k)=F_k(x)\) as defined previously and let \(\mathcal {K}\) denote the key space of the first \(r-1\) rounds. Oracle access to G is available. The attacker first chooses two polynomials q(n), l(n) of n, then runs Algorithm 5 with \(np(n)=n^4l(n)^2q(n)^2\) (\(p(n)=n^3l(n)^2q(n)^2\)) queries on G to get an output \((a,i_1,\ldots ,i_n)\). Let \(b=(\bar{i_1},\ldots ,\bar{i_n})\), where \(\bar{i_j}=i_j\oplus 1\). Then \((a,b)\) is a small probability differential of \(F_k\) for at least a \((1-\frac{1}{q(n)})\) fraction of the keys in \(\mathcal {K}\).

II. Key recovering. Suppose \(\mathcal {S}\) is the set of all possible subkeys of the last round. For each \(s\in \mathcal {S}\), we set the corresponding counter \(C_s\) to zero and do the following: fix the input difference a, and make \(2l(n)^2\) classical queries on the whole encryption function to get \(2l(n)^2\) ciphertexts. Then decrypt the last round with s to obtain \(l(n)^2\) output differences \(\varDelta y^{(1)},\ldots ,\varDelta y^{(l(n)^2)}\) of \(F_k\). Let \(\varDelta y^{(i)}=(\varDelta y_{1}^{(i)},\varDelta y_{2}^{(i)},\ldots ,\varDelta y_{n}^{(i)})\). For \(i=1,\ldots ,l(n)^2\) and \(j=1,\ldots ,n\), if \(\varDelta y_{j}^{(i)}=b_j\), let the counter \(C_s=C_s+1\). Afterwards, calculate the ratio \(\lambda _s=C_s/nl(n)^2\). The attacker chooses the key \(s\in \mathcal {S}\) which has the smallest ratio \(\lambda _s\) as the subkey of the last round.
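The counting rule in step II can be sketched as follows. The sketch assumes that the output differences for each candidate subkey have already been obtained by partially decrypting the ciphertext pairs; the bit-packed integer representation and the names `ratio` and `pick_subkey` are assumptions made for illustration.

```python
def ratio(output_diffs, b: int, n: int) -> float:
    """lambda_s = C_s / (n * number of output differences), where C_s counts
    the positions j at which a difference agrees with b in bit j."""
    count = 0
    for dy in output_diffs:
        for j in range(n):
            if ((dy >> j) & 1) == ((b >> j) & 1):
                count += 1
    return count / (n * len(output_diffs))

def pick_subkey(diffs_by_subkey: dict, b: int, n: int):
    """Return the candidate subkey with the smallest ratio lambda_s."""
    return min(diffs_by_subkey,
               key=lambda s: ratio(diffs_by_subkey[s], b, n))
```

With n = 4 and b = 0b1111, a candidate whose differences are [0b0000, 0b0001, 0b0000, 0b0000] gets ratio 1/16, while one whose differences are [0b1010, 0b0110, 0b1111, 0b0001] gets 9/16, so the first candidate is selected.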

To justify that the above attack procedure works for at least a \((1-\frac{1}{q(n)})\) fraction of the keys in \(\mathcal {K}\), we give the following theorem:

Theorem 7

There exists a subset \(\mathcal {K}_1\subseteq \mathcal {K}\) such that:

(1) \(|\mathcal {K}_1|/|\mathcal {K}|\ge 1-\frac{1}{q(n)}\);

(2) if the key used for the first \(r-1\) rounds of the encryption algorithm is in \(\mathcal {K}_1\) and s is the right subkey of the last round, then the ratio \(\lambda _s\) obtained by the above procedure satisfies

$$\begin{aligned} Pr\left[ \,\, \lambda _s\ge \frac{1}{l(n)}\,\, \right] \le 3\exp (-n/2). \end{aligned}$$

Proof

According to Theorem 1 and the definition of G, for any \(j=1,\ldots ,n\),

$$\begin{aligned} \frac{\left| \left\{ z\in \mathbb {F}_2^{m+n}|G_j(z)+G_j(z\oplus (a\Vert 0))=b_j\right\} \right| }{2^{m+n}}\le \epsilon \end{aligned}$$
(10)

holds with a probability greater than \(1-\exp (-2p(n)\epsilon ^2)\). Similar to the proof of Theorem 6, we let

$$\begin{aligned} V_j(k)=\frac{\left| \left\{ x\in \mathbb {F}_2^n|F_{kj}(x\oplus a)+F_{kj}(x)=b_j\right\} \right| }{2^n}. \end{aligned}$$

Equation (10) indicates \(\mathbb {E}_k(V_j(k))\le \epsilon \), where \(\mathbb {E}_k(\cdot )\) means the expectation when the key k is chosen uniformly at random from \(\mathcal {K}\). Therefore, if Eq. (10) holds, Markov’s inequality gives \(Pr_k[V_j(k)\le nq(n)\epsilon ]\ge 1-\frac{1}{nq(n)}\). In other words, for each \(j\in \{1,\ldots ,n\}\), \(V_j(k)\le nq(n)\epsilon \) holds for at least a \((1-\frac{1}{nq(n)})\) fraction of the keys in \(\mathcal {K}\). Then, taking a union bound over j (by an analysis similar to that in the proof of Theorem 3), for at least a \((1-\frac{1}{q(n)})\) fraction of the keys in \(\mathcal {K}\), it holds that

$$\begin{aligned} V_j(k)\le nq(n)\epsilon , \quad \quad \forall j\in \{1,\ldots ,n\}. \end{aligned}$$

Let \(\mathcal {K}_1\) be the set of these keys; then \(|\mathcal {K}_1|/|\mathcal {K}|\ge 1-\frac{1}{q(n)}\), and for all \(k\in \mathcal {K}_1\), it holds that

$$\begin{aligned} Pr\left[ \,\,\frac{|\{x\in \mathbb {F}_2^n|F_{kj}(x\oplus a)+F_{kj}(x)=b_j\}|}{2^n}\le nq(n)\epsilon \,\,\right] >1-\exp (-2p(n)\epsilon ^2) \end{aligned}$$

Let \(\epsilon =\frac{1}{2nl(n)q(n)}\). Noticing \(p(n)=n^3l(n)^2q(n)^2\), we have

$$\begin{aligned} Pr\left[ \,\,\frac{|\{x\in \mathbb {F}_2^n|F_{kj}(x\oplus a)+F_{kj}(x)=b_j\}|}{2^n}\le \frac{1}{2l(n)}\,\,\right] >1-\exp (-n/2). \end{aligned}$$
(11)

That is, for all \(j=1,\ldots ,n\),

$$\begin{aligned} Pr_x\Big [\,\,F_{kj}(x)+F_{kj}(x\oplus a)=b_j\,\,\Big ]\le \frac{1}{2l(n)} \end{aligned}$$
(12)

holds except with negligible probability. For \(i=1,2,\ldots ,l(n)^2\), \(j=1,\ldots ,n\), we define the random variable

$$\begin{aligned} Y(i,j)=\left\{ \begin{array}{cc} 1 &{} \quad \,\varDelta y_{j}^{(i)}=b_j;\\ &{}\\ 0 &{} \quad \,\varDelta y_{j}^{(i)}\ne b_j.\\ \end{array} \right. \end{aligned}$$

For every i, j, Eq. (12) indicates \(\mathbb {E}_x(Y(i,j))\le \frac{1}{2l(n)}\) except with negligible probability. (Here \(\mathbb {E}_x\) means the expectation when the output difference is obtained by choosing the plaintext x uniformly at random. \(\mathbb {E}_x(Y(i,j))\) is still a random variable since it is a function of the vector a, which is a random variable output by Algorithm 5.) According to Hoeffding’s inequality and the fact that Eq. (12) holds except with probability \(\exp (-n/2)\), it holds that

$$\begin{aligned} Pr\left[ \frac{\sum _{i,j}Y(i,j)}{nl(n)^2}\ge \frac{1}{2l(n)}+\delta \right] \le 2\exp (-2nl(n)^2\delta ^2)+\exp (-n/2). \end{aligned}$$

Let \(\delta =\frac{1}{2l(n)}\). Noticing \(\sum _{i,j}Y(i,j)/nl(n)^2=\lambda _s\), we have

$$\begin{aligned} Pr[\lambda _s\ge \frac{1}{l(n)}]\le 3\exp (-n/2), \end{aligned}$$

which completes the proof. \(\square \)
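The parameter substitutions used in the proof can be verified with exact rational arithmetic. The sketch below recomputes the exponents and thresholds for given values of n, l(n) and q(n); the function name and dictionary keys are assumptions introduced for illustration.

```python
from fractions import Fraction

def theorem7_parameters(n: int, l: int, q: int) -> dict:
    """Recompute the quantities obtained from eps = 1/(2 n l q),
    p = n^3 l^2 q^2 and delta = 1/(2 l)."""
    eps = Fraction(1, 2 * n * l * q)
    p = n ** 3 * l ** 2 * q ** 2
    delta = Fraction(1, 2 * l)
    return {
        # exponent in 1 - exp(-2 p eps^2), expected to be n/2 (Eq. (11))
        "bv_exponent": 2 * p * eps ** 2,
        # threshold n q eps, expected to be 1/(2 l) (Eq. (11))
        "threshold": n * q * eps,
        # exponent in 2 exp(-2 n l^2 delta^2), expected to be n/2
        "hoeffding_exponent": 2 * n * l ** 2 * delta ** 2,
        # 1/(2 l) + delta, expected to be 1/l
        "lambda_bound": Fraction(1, 2 * l) + delta,
    }
```

Both exponents equal \(n/2\), which is why the two error terms combine into \(3\exp (-n/2)\).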

In the key recovering phase, the attacker computes \(l(n)^2\) output differences to get the ratio \(\lambda _s\) for every \(s\in \mathcal {S}\). If s is not the right key of the last round, the \(l(n)^2\) differentials \((a,\varDelta y^{(i)})\) can be seen as differentials of a random permutation. Then the probability of \(Y(i,j)=1\) is approximately \(\frac{1}{2}\) for every i, j. Therefore, the expectation of \(\lambda _s\) is approximately \(\frac{1}{2}\). On the other hand, if s is the right key, the probability of \(\lambda _s\ge \frac{1}{l(n)}\) is negligible according to Theorem 7. This notable difference makes our attack strategy feasible for at least a \((1-\frac{1}{q(n)})\) fraction of the keys in \(\mathcal {K}\). As for the complexity of the attack procedure, \(n^4l(n)^2q(n)^2\) quantum queries and \(2l(n)^2\) classical queries are needed in total.

The basic idea of quantum small probability differential cryptanalysis is similar to that of quantum differential cryptanalysis, that is, using some notable statistical difference to distinguish an encryption function from a random permutation. The main difference between these two methods is in the key recovering phase. In quantum differential cryptanalysis, the attacker treats the differential as a whole and records the number of times it appears, while in quantum small probability differential cryptanalysis, the attacker considers every bit of the output differences separately and records the number of times they appear.

5.3 Quantum impossible differential cryptanalysis

Impossible differential cryptanalysis is also a chosen-plaintext attack. Suppose \(F_k:\{0,1\}^n\rightarrow \{0,1\}^n\) and the key space \(\mathcal {K}\) are defined as before. A differential \((\varDelta x,\varDelta y)\) is called an impossible differential of \(F_k\) if it satisfies

$$\begin{aligned} F_k(x\oplus \varDelta x)+F_k(x)\ne \varDelta y, \quad \forall x\in \mathbb {F}_2^{n}. \end{aligned}$$

Impossible differential cryptanalysis consists of two phases. In the first phase, the attacker tries to find an impossible differential \((\varDelta x,\varDelta y)\) of \(F_k\). In the second phase, the attacker uses this impossible differential to sieve the subkeys of the last round. Specifically, the attacker fixes the input difference \(\varDelta x\) and makes classical queries on the whole encryption function to get a certain number of ciphertexts. Then, for each possible key s of the last round, the attacker uses it to decrypt these ciphertexts and obtains the corresponding output differences of \(F_k\). If \(\varDelta y\) appears among these output differences, then the attacker rules out s. Our algorithm is applied in the first phase.
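The sieving phase described above can be sketched classically. The example below assumes a hypothetical last round \(c=S(y)\oplus s\) built from an arbitrary 4-bit table, so that partial decryption with a candidate subkey depends on that subkey; the table and all function names are illustrative assumptions rather than part of any cipher considered here.

```python
# Arbitrary 4-bit permutation used for the hypothetical last round.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
INV = [0] * 16
for _x, _y in enumerate(SBOX):
    INV[_y] = _x

def last_round(y: int, s: int) -> int:
    """Hypothetical last round: c = S(y) xor s."""
    return SBOX[y] ^ s

def undo_last_round(c: int, s: int) -> int:
    """Partial decryption of the last round with candidate subkey s."""
    return INV[c ^ s]

def sieve(candidates, ciphertext_pairs, dy: int):
    """Keep the subkeys for which the impossible output difference dy
    never appears after partially decrypting every ciphertext pair."""
    return [
        s for s in candidates
        if all(undo_last_round(c1, s) ^ undo_last_round(c2, s) != dy
               for c1, c2 in ciphertext_pairs)
    ]
```

The right subkey always survives the sieve, since under it the decrypted differences are true output differences of \(F_k\) and the impossible one cannot occur; wrong subkeys are eliminated only if the impossible difference happens to show up.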

Let \(G(x,k)=F_k(x)\) as defined previously. Oracle access to G is available. An algorithm to find impossible differentials of \(F_k\) is as follows:

figure f

Suppose the attacker gets \((j,a,i_j)\) by running Algorithm 6 with \(p(n)=O(n)\). Let \(\delta '_G=\max _{1\le j\le n}\delta '_{G_j}\). If \(\delta '_G\le p_0<1\) for some constant \(p_0\), then according to Theorem 2, \((a,\times ,\ldots ,\times ,i_j,\times ,\ldots ,\times )\) will be an impossible differential of \(F_k\) except with negligible probability. Here “\(\times \)” means the corresponding bit can be either 0 or 1. Specifically, we have the following theorem:

Theorem 8

If \(\delta '_G\le p_0<1\) and running Algorithm 6 with \(np(n)=n^2\) (\(p(n)=n\)) queries on G gives a vector \((j,a,i_j)\), then for any key \(k\in \mathcal {K}\) and any \(i_1,\ldots ,i_{j-1}\), \(i_{j+1},\ldots ,i_n\in \{0,1\}\), it holds that

$$\begin{aligned} F_k(x)\oplus F_k(x\oplus a)\ne (i_1,\ldots ,i_{j-1},i_j,i_{j+1},\ldots ,i_n),\quad \forall x\in \mathbb {F}_2^n \end{aligned}$$
(13)

except with negligible probability. That is, \((a,i_1,\ldots ,i_n)\) is an impossible differential of \(F_k\) except with negligible probability.

Proof

According to Theorem 2, we have \(Pr[a\in U_{G_j}^{\bar{i_j}}]>1-p_0^n\). Thus \(Pr[G_j(z)\oplus G_j(z\oplus (a\Vert 0))\ne i_j,\forall z\in \mathbb {F}_2^{m+n}]>1-p_0^n\). This indicates for all \(k\in \mathcal {K}\),

$$\begin{aligned} Pr[F_{kj}(x)\oplus F_{kj}(x\oplus a)\ne i_j,\forall x\in \mathbb {F}_2^n]>1-p_0^n. \end{aligned}$$

Since the probability that Eq. (13) holds is no less than the above probability, the conclusion holds. \(\square \)
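The mechanism behind Theorem 8 can be illustrated on a toy keyed function in which one component has a linear structure for every key. In the hypothetical example below, bit 0 of \(F_k(x)\) is an affine function of x, so for a fixed input difference a the bit-0 output difference is constant, and every output difference whose bit 0 takes the opposite value is impossible for all keys. The function `F`, the table and the helper names are assumptions for illustration.

```python
# Arbitrary 4-bit table supplying the nonlinear upper bits of the toy function.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def parity(v: int) -> int:
    return bin(v).count("1") & 1

def F(k: int, x: int) -> int:
    """Hypothetical toy F_k: upper three bits from a table lookup,
    bit 0 equal to x_0 xor x_1 xor k_0 (affine in x)."""
    upper = SBOX[x ^ k] & 0b1110
    low = parity(x & 0b0011) ^ (k & 1)
    return upper | low

def impossible_for_all_keys(a: int, j: int, i_j: int) -> bool:
    """True if bit j of F_k(x) xor F_k(x xor a) never equals i_j, for every
    key k and input x; then every output difference with bit j equal to
    i_j is an impossible differential for input difference a."""
    for k in range(16):
        for x in range(16):
            d = F(k, x ^ a) ^ F(k, x)
            if ((d >> j) & 1) == i_j:
                return False
    return True
```

Here `impossible_for_all_keys(0b0001, 0, 0)` holds because the bit-0 difference is always 1 for a = 0b0001, which corresponds to the pattern \((a,\times ,\ldots ,\times ,i_j)\) with \(i_j=0\).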

From Theorem 8, we can see that by running Algorithm 6 with \(O(n^2)\) queries the attacker may find impossible differentials of \(F_k\). Unlike the other two kinds of differential cryptanalysis proposed in the previous two subsections, the “impossibility” of the found differential holds for all keys in \(\mathcal {K}\). However, Algorithm 6 can only find impossible differentials whose “impossibility” concentrates on a certain bit. In other words, Algorithm 6 can find impossible differentials of \(F_k\) only when there exists some j such that \(F_{kj}\) has impossible differentials. Although our algorithm can only find such special impossible differentials, it still provides a new and inspirational approach to impossible differential cryptanalysis. In addition, one of the main shortcomings of traditional impossible differential cryptanalysis is the difficulty of extending the differential path, which limits the number of rounds that can be attacked. Our approach does not have this problem since it treats the first \(r-1\) rounds as a whole.

6 Discussion and conclusion

In this paper, we construct new quantum distinguishers for the 3-round Feistel scheme and propose a new quantum algorithm to recover part of the key of the Even–Mansour construction. Afterwards, by observing that the linear structures of an encryption function are actually high probability differentials of it, we propose three ways to execute differential cryptanalysis. The quantum algorithms used for these three kinds of differential cryptanalysis all have polynomial running time. We believe our work provides some helpful and inspirational methods for quantum cryptanalysis.

There are many directions for future work. First, is it possible to modify the algorithms used for quantum differential cryptanalysis and quantum small probability differential cryptanalysis so that they work for all keys in the key space? Second, how to reduce the complexity of our attacks without affecting the success probability is worthy of further study. In addition, all algorithms proposed in this article find differentials of a vector function by first searching for the differentials of its component functions separately. There may exist other ways to find differentials of a vector function directly.