1 Introduction

Stream ciphers are a class of symmetric-key cryptosystems. They commonly generate a key stream of arbitrary length from a secret key and initialization vector (iv), and a plaintext is encrypted by XORing with the key stream. Many stream ciphers consist of an initialization and key-stream generator. The secret key and iv are well mixed in the initialization, where a key stream is never output, and the mixed internal state is denoted as the initial state in this paper. After the initialization, the key-stream generator outputs the key stream while updating the internal state. The initialization of stream ciphers generally requires much processing time, but the key-stream generator is very efficient.

Fig. 1.
figure 1

Model of LFSR-based stream ciphers

LFSRs are often used in the design of stream ciphers, where the update function consists of one or more LFSRs and non-linear functions. Without loss of generality, the key-stream generator of LFSR-based stream ciphers can be represented as Fig. 1, where the binary noise \(e_t\) is generated by the non-linear function. LFSR-based stream ciphers share the feasibility to guarantee a long period in the key stream.

A (fast) correlation attack is an important attack against LFSR-based stream ciphers. The initial idea was introduced by Siegenthaler [1], and it exploits the bias of \(e_t\). We guess the initial state \(s^{(0)}=(s_0, s_1, \ldots , s_{n-1})\), compute \(s_t\) for \(t=n,n+1,\ldots ,N-1\), and XOR \(s_t\) with corresponding \(z_t\). If we guess the correct initial state, highly biased \(e_t\) is acquired. Otherwise, we assume that the XOR behaves at random. When we collect an N-bit key stream and the size of the LFSR is n, the simple algorithm requires a time complexity of \(N 2^n\).

Following up the correlation attack, many algorithms have been proposed to avoid the exhaustive search of the initial state, and they are called as “fast correlation attack.” The seminal work was proposed by Meier and Staffelbach [2], where the noise \(e_t\) is efficiently removed from \(z_t\) by using parity-check equations, and \(s_t\) is recovered. Several improvements of the original fast correlation attack have been proposed [3,4,5,6,7,8], but they have limitations such as the number of taps in the LFSR is significantly small or the bias of the noise is significantly high. Therefore, their applications are limited to experimental ciphers, and they have not been applied to modern concrete stream ciphers.

Another approach of the fast correlation attack is the so-called one-pass algorithm [9, 10], and it has been successfully applied to modern concrete stream ciphers [11,12,13]. Similarly to the original correlation attack, we guess the initial state and recover the correct one by using parity-check equations. To avoid exhaustive search over the initial state, several methods have been proposed to decrease the number of secret bits in the initial state involved by parity-check equations [14, 15]. In the most successful method, the number of involved secret bits decreases by XORing two different parity-check equations. Let \(e_t = \langle s^{(0)}, a_t \rangle \oplus z_t\) be the parity-check equation, where \(\langle s^{(0)}, a_t \rangle \) denotes an inner product between \(s^{(0)}\) and \(a_t\), and we assume that \(e_t\) is highly biased. Without loss of generality, we first detect a set of pairs \((j_1, j_2)\) such that the first \(\ell \) bits in \(a_{j_1} \oplus a_{j_2}\) are 0, where such a set of pairs is efficiently detected from the birthday paradox. Then, \(\langle s^{(0)}, a_{j_1} \oplus a_{j_2} \rangle \oplus z_{j_1} \oplus z_{j_2}\) is also highly biased, and the number of involved secret bits decreases from n to \(n-\ell \). Later, this method is generalized by the generalized birthday problem [16]. Moreover, an efficient algorithm was proposed to accelerate the one-pass algorithm [14]. They showed that the guess and evaluation procedure can be regarded as a Walsh-Hadamard transform, and the fast Walsh-Hadamard transform (FWHT) can be applied to accelerate the one-pass algorithm. While the naive algorithm for the correlation attack requires \(N 2^{n}\), the FWHT enables us to evaluate it with the time complexity of \(N + n 2^n\). When the number of involved bits decreases from n to \(n-\ell \), the time complexity also decreases to \(N + (n-\ell )2^{n-\ell }\). The drawback of the one-pass algorithm with the birthday paradox is the increase of the noise. Let p be the probability that \(e_t=1\), and the correlation denoted by c is defined as \(c=1-2p\). If we use the XOR of parity-check equations to reduce the number of involved secret bits, the correlation of the modified equations drops to \(c^2\). The increase of the noise causes the increase of the data complexity.

Revisiting Fast Correlation Attack. In this paper, we revisit the fast correlation attack. We first review the structure of parity-check equations from a new point of view based on a finite field, and the new viewpoint brings a new property for the fast correlation attack. A multiplication between \(n \times n\) matrices and an n-bit fixed vector is generally used to construct parity-check equations. Our important observation is to show that this multiplication is “commutative” via the finite field, and it brings the new property for the fast correlation attack.

We first review the traditional wrong-key hypothesis, i.e., we observe correlation 0 when incorrect initial state is guessed. The new property implies that we need to reconsider the wrong-key hypothesis more carefully. Specifically, assuming that there are multiple high-biased linear masks, the traditional wrong-key hypothesis does not hold. We then show a modified wrong-key hypothesis.

The new property is directly useful to improve the efficiency of the fast correlation attack when there are multiple high-biased linear masks. In the previous fast correlation attack, the multiple approximations are only useful to reduce the data complexity but are not useful to reduce the time complexity [11]. We propose a new algorithm that reduces both time and data complexities. Our new algorithm is a kind of the one-pass algorithm, but the technique to avoid the exhaustive search of the initial state is completely different from previous ones. The multiple linear masks are directly exploited to avoid the exhaustive search.

Table 1. Summary of results, where the key-stream generator and initialization are denoted as ksg and init, respectively.

Applications. We apply our new algorithm to the Grain family, where there are three well-known stream ciphers: Grain-128a [20], Grain-128 [21], and Grain-v1 [22]. The Grain family is amongst the most attractive stream ciphers, and especially Grain-v1 is in the eSTREAM portfolio and Grain-128a is standardized by ISO/IEC [23]. Moreover the structure is recently used to design a lightweight hash function [24] and stream ciphers [25, 26].

Our new algorithm breaks each of full Grain-128a, Grain-128, and Grain-v1. Among them, this is the first cryptanalysis against full Grain-128aFootnote 1. Regarding full Grain-128, our algorithm is the first attack against the key-stream generator. Regarding full Grain-v1, our algorithm is more efficient than the previous attack [19], and it breaks Grain-v1 obviously faster than the brute-force attack.

To realize the fast correlation attack against all of the full Grain family, we introduce novel linear approximate representations. They well exploit their structure and reveal a new important vulnerability of the Grain family (Table 1).

Comparisons with Previous Attacks Against Grain Family. To understand this paper, it is not necessary to understand previous attacks, but we summarize previous attacks against the Grain family.

Before Grain-v1, there is an original Grain denoted by Grain-v0 [27], and it was broken by the fast correlation attack [11]. Grain-v1 is tweaked to remove the vulnerability of Grain-v0. Nevertheless, our new fast correlation attack can break full Grain-v1 thanks to the new property.

The near collision attack is the important previous attack against Grain-v1 [28], and very recently, an improvement called the fast near collision attack was proposed [19], where the authors claimed that the time complexity is \(2^{75.7}\). However, this estimation is controversial because the unit of the time complexity is “1 update function of reference code on software implementation,” and they estimated 1 update function to be \(2^{10.4}\) cycles. Therefore, the pure time complexity is rather \(2^{75.7+10.4}=2^{86.1}\) cycles, which is greater than \(2^{80}\). On the other hand, the time complexity of the fast correlation attack is \(2^{76.7}\), where the unit of the (dominant) time complexity is at most one multiplication with fixed values over the finite field. It is obviously faster than the brute-force attack, but it requires more data than the fast near collision attack.

Grain-128 is more aggressively designed than Grain-v1, where a quadratic function is adopted for the nonlinear feedback polynomial of the NFSR. Unfortunately, this low degree causes vulnerability against the dynamic cube attack [29]. While the initial work by Dinur and Shamir is a weak-key attack, it was then extended to the single-key attack [17] and recently improved [18]. The dynamic cube attack breaks the initialization, and the fast correlation attack breaks the key-stream generator. Note that different countermeasures are required for attacks against the key-stream generator and initialization. For example, we can avoid the dynamic cube attack by increasing the number of rounds in the initialization, but such countermeasure does not prevent the attack against the key-stream generator.

Grain-128a was designed to avoid the dynamic cube attack. The degree of the nonlinear feedback polynomial is higher than in Grain-128. No security flaws have been reported on full Grain-128a, but there are attacks against Grain-128a whose number of rounds in the initialization is reduced [30,31,32].

2 Preliminaries

2.1 LFSR-Based Stream Ciphers

The target of the fast correlation attack is LFSR-based stream ciphers, which are modeled as Fig. 1 simply. The LFSR generates an N-bit output sequence as \(\{s_0,s_1,\ldots ,s_{N-1}\}\), and the corresponding key stream \(\{z_0,z_1,\ldots ,z_{N-1}\}\) is computed as \(z_t = s_t \oplus e_t\), where \(e_t\) is a binary noise.

Let

$$\begin{aligned} f(x) = c_0 + c_1 x^1 + c_2 x^2 + \cdots + c_{n-1} x^{n-1} + x^{n} \end{aligned}$$

be the feedback polynomial of the LFSR and \(s^{(t)} = (s_t, s_{t+1}, \ldots , s_{t+n-1})\) be an n-bit internal state of the LFSR at time t. Then, the LFSR outputs \(s_t\), and the state is updated to \(s^{(t+1)}\) as

$$\begin{aligned} s^{(t+1)} = s^{(t)} \times F= s^{(t)} \times \begin{pmatrix} 0 &{} \cdots &{} 0 &{} 0 &{} c_0 \\ 1 &{} \cdots &{} 0 &{} 0 &{} c_{1} \\ \vdots &{} \ddots &{} \vdots &{} \vdots &{} \vdots \\ 0 &{} \cdots &{} 1 &{} 0 &{} c_{n-2} \\ 0 &{} \cdots &{} 0 &{} 1 &{} c_{n-1} \end{pmatrix}, \end{aligned}$$

where \(F\) is an \(n \times n\) binary matrix that represents the feedback polynomial f(x). In concrete LFSR-based stream ciphers, the binary noise \(e_t\) is nonlinearly generated from the internal state or another internal state.

2.2 Fast Correlation Attack

The fast correlation attack (FCA) exploits high correlation between the internal state of the LFSR and corresponding key stream [1, 2]. We first show the most simple model, where we assume that \(e_t\) itself is highly biased. Let p be the probability of \(e_t=1\), and the correlation c is defined as \(c = 1 - 2p\). We guess the initial internal state \(s^{(0)}\), calculate \(\{s_0,s_1,\ldots ,s_{N-1}\}\) from the guessed \(s^{(0)}\), and evaluate \(\sum _{t=0}^{N-1} (-1)^{s_t \oplus z_t}\), where the sum is computed over the set of integers. If the correct initial state is guessed, the sum is equal to \(\sum _{t=0}^{N-1} (-1)^{e_t}\) and follows a normal distribution \(\mathcal{N}(Nc, N)\)Footnote 2. On the other hand, we assume that the sum behaves at random when an incorrect initial state is guessed. Then, it follows a normal distribution \(\mathcal{N}(0, N)\). To distinguish the two distributions, we need to collect \(N \approx O(1/c^2)\) bits of the key stream.

The FCA can be regarded as a kind of a linear cryptanalysis [33]. The output \(s_t\) is linearly computed from \(s^{(0)}\) as \(s_t = \langle s^{(0)}, A_{t} \rangle \), where \(A_t\) is the 1st row vector in the transpose of \(F^t\) denoted by \({}^\mathrm {T}\!{F^t}\). In other words, \(A_{t}\) is used as linear masks, and the aim of attackers is to find \(s^{(0)}\) such that \(\sum _{t=0}^{N-1} (-1)^{\langle s^{(0)}, A_{t} \rangle }\) is far from N / 2.

Usually, the binary noise \(e_t\) is not highly biased in modern stream ciphers, but we may be able to observe high correlation by summing optimally chosen linear masks. In other words, we can execute the FCA if

$$\begin{aligned} e'_t = \bigoplus _{i \in \mathbb {T}_s}\langle s^{(t+i)}, \varGamma _i \rangle \oplus \bigoplus _{i \in \mathbb {T}_z} z_{t+i} \end{aligned}$$

is highly biased by optimally choosing \(\mathbb {T}_s\), \(\mathbb {T}_z\), and \(\varGamma _i\), where \(s^{(t+i)}\) and \(\varGamma _i\) are n-bit vectors. Recall \(s^{(t)} = s^{(0)} \times F^t\), and then, \(e_t'\) is rewritten as

$$\begin{aligned} e_t'&= \bigoplus _{i \in \mathbb {T}_s} \left\langle s^{(t+i)}, \varGamma _i \right\rangle \oplus \bigoplus _{i \in \mathbb {T}_z} z_{t+i} \\&= \bigoplus _{i \in \mathbb {T}_s} \left\langle s^{(0)} \times F^{t+i}, \varGamma _i \right\rangle \oplus \bigoplus _{i \in \mathbb {T}_z} z_{t+i}\\&= \left\langle s^{(0)}, \left( \bigoplus _{i \in \mathbb {T}_s} (\varGamma _i \times {}^\mathrm {T}\!{F^{i}}) \right) \times {}^\mathrm {T}\!{F^t} \right\rangle \oplus \bigoplus _{i \in \mathbb {T}_z} z_{t+i}. \end{aligned}$$

For simplicity, we introduce \(\varGamma \) denoted by \(\varGamma = \bigoplus _{i \in \mathbb {T}_s} (\varGamma _i \times {}^\mathrm {T}\!{F^{i}} )\). Then, we can introduce the following parity-check equations as

$$\begin{aligned} e'_t&= \left\langle s^{(0)}, \varGamma \times {}^\mathrm {T}\!{F^t} \right\rangle \oplus \bigoplus _{i \in \mathbb {T}_z} z_{t+i}. \end{aligned}$$
(1)

We redefine p as the probability satisfying \(e_t'=1\) for all possible t, and the correlation c is also redefined from the corresponding p. Then, we can execute the FCA by using Eq. (1). Assuming that N parity-check equations are collected, we first guess \(s^{(0)}\) and evaluate \(\sum _{t=0}^{N-1} (-1)^{e'_t}\). While the sum follows a normal distribution \(\mathcal{N}(0, N)\) in the random case, it follows \(\mathcal{N}(Nc, N)\) if the correct \(s^{(0)}\) is guessed.

The most straightforward algorithm requires the time complexity of \(O(N2^n)\). Chose et al. showed that the guess and evaluation procedure can be regarded as a Walsh-Hadamard transform [14]. The fast Walsh-Hadamard transform (FWHT) can be successfully applied to accelerate the algorithm, and it reduces the time complexity to \(O(N+n2^n)\).

Definition 1

(Walsh-Hadamard Transform (WHT)). Given a function \(w: \{0,1\}^n \rightarrow \mathbb {Z}\), the WHT of w is defined as \(\hat{w}(s) = \sum _{x \in \{0,1\}^n} w(x) (-1)^{\langle s, x \rangle }\).

When we guess \(s \in \{0,1\}^n\), the empirical correlation \(\sum _{t=0}^{N-1} (-1)^{e'_t}\) is rewritten as

$$\begin{aligned} \sum _{t=0}^{N-1} (-1)^{e'_t}&= \sum _{t=0}^{N-1} (-1)^{\langle s, \varGamma \times {}^\mathrm {T}\!{F^t} \rangle \oplus \bigoplus _{i \in \mathbb {T}_z} z_{t+i}} \\&= \sum _{x \in \{0,1\}^n} \left( \sum _{t \in \{0,1,\ldots ,N-1 | \varGamma \times {}^\mathrm {T}\!{F^t} = x\}} (-1)^{\langle s, x \rangle \oplus \bigoplus _{i \in \mathbb {T}_z} z_{t+i}} \right) \\&= \sum _{x \in \{0,1\}^n} \left( \sum _{t \in \{0,1,\ldots ,N-1 | \varGamma \times {}^\mathrm {T}\!{F^t} = x\}} (-1)^{\bigoplus _{i \in \mathbb {T}_z} z_{t+i}} \right) (-1)^{\langle s, x \rangle }. \end{aligned}$$

Therefore, from the following public function w as

$$\begin{aligned} w(x):= \sum _{t \in \{0,1,\ldots ,N-1 | \varGamma \times {}^\mathrm {T}\!{F^t} = x\}} (-1)^{\bigoplus _{i \in \mathbb {T}_z} z_{t+i}}, \end{aligned}$$

we get \(\hat{w}\) by using the FWHT, where \(\hat{w}(s)\) is the empirical correlation when s is guessed.

3 Revisiting Fast Correlation Attack

We first review the structure of the parity-check equation by using a finite field and show that \(\varGamma \times {}^\mathrm {T}\!{F^t}\) is “commutative.” This new observation brings a new property for the FCA, and it is very important when there are multiple linear masks. As a result, we need to reconsider the wrong-key hypothesis carefully, i.e., there is a case that the most simple and commonly used hypothesis does not hold. Moreover, we propose a new algorithm that successfully exploits the new property to reduce the data and time complexities in the next section.

3.1 Reviewing Parity-Check Equations with Finite Field

We review \(\varGamma \times {}^\mathrm {T}\!{F^t}\) by using a finite field \(\mathrm {GF}(2^n)\), where the primitive polynomial is the feedback polynomial of the LFSR.

Recall the notation of \(A_{t} \in \{0,1\}^n\), which was defined as the 1st row vector in \({}^\mathrm {T}\!{F^t}\), and then, the ith row vector of \({}^\mathrm {T}\!{F^t}\) is represented as \(A_{t+i-1}\). Let \(\alpha \) be a element as \(f(\alpha )=0\) and it is a primitive element of \(\mathrm {GF}(2^n)\). We notice that \(\alpha ^t\) becomes natural conversion of \(A_{t} \in \{0,1\}^n\). We naturally convert \(\varGamma \in \{0,1\}^n\) to \(\gamma \in \mathrm {GF}(2^n)\). The important observation is that \(\varGamma \times {}^\mathrm {T}\!{F}\) also becomes natural conversion of \(\gamma \alpha \in \mathrm {GF}(2^n)\) because of

$$\begin{aligned} \varGamma \times {}^\mathrm {T}\!{F} = \varGamma \times \begin{pmatrix} 0 &{} 1 &{} \cdots &{} 0 &{} 0 \\ \vdots &{} \vdots &{} \ddots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} \cdots &{} 1 &{} 0 \\ 0 &{} 0 &{} \cdots &{} 0 &{} 1 \\ c_0 &{} c_1 &{} \cdots &{} c_{n-2} &{} c_{n-1} \end{pmatrix}. \end{aligned}$$

This trivially derives that \(\varGamma \times {}^\mathrm {T}\!{F^t}\) is also natural conversion of \(\gamma \alpha ^t \in \mathrm {GF}(2^n)\), and of course, the multiplication is commutative, i.e., \(\gamma \alpha ^t = \alpha ^t \gamma \). We finally consider a matrix multiplication corresponding to \(\alpha ^t \gamma \). Let \(M_\gamma \) be an \(n \times n\) binary matrix, where the ith row vector of \({}^\mathrm {T}\!{M_\gamma }\) is defined as the natural conversion of \(\gamma \alpha ^{i-1}\). Then, \(\alpha ^t \gamma \) is the natural conversion of \(A_{t} \times {}^\mathrm {T}\!{M_\gamma }\), and we acquire \(\varGamma \times {}^\mathrm {T}\!{F^t} = A_{t} \times {}^\mathrm {T}\!{M_\gamma }\). The following shows an example to understand this relationship.

Example 1

Let us consider a finite field \(\mathrm {GF}(2^8)=\mathrm {GF}(2)[x]/(x^8+x^4+x^3+x^2+1)\). When \(\varGamma =01011011\), the transpose matrix of the corresponding binary matrix \(M_\gamma \) is represented as

$$\begin{aligned} {}^\mathrm {T}\!{M_\gamma } = \begin{pmatrix} 0 &{} 1 &{} 0 &{} 1 &{} 1 &{} 0 &{} 1 &{} 1 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 1 &{} 0 &{} 1 \\ 1 &{} 1 &{} 1 &{} 1 &{} 0 &{} 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 1 &{} 1 &{} 1 &{} 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \end{pmatrix}, \end{aligned}$$

where the first row coincides with \(\varGamma \) and the second row is natural conversion of \(\gamma \alpha \). Then, \(\varGamma \times {}^\mathrm {T}\!{F^t} = A_{t} \times {}^\mathrm {T}\!{M_\gamma }\), and for example, when \(t=10\),

$$\begin{aligned}&\quad \varGamma \times {}^\mathrm {T}\!{F^{10}} = A_{10} \times {}^\mathrm {T}\!{M_\gamma },\\&\Leftrightarrow \begin{pmatrix} 0 &{} 1 &{} 0 &{} 1 &{} 1 &{} 0 &{} 1 &{} 1 \\ \end{pmatrix} \times \begin{pmatrix} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 1 &{} 1 &{} 1 &{} 0 &{} 0 &{} 0 \end{pmatrix}^{10} = \begin{pmatrix} 0&0&1&0&1&1&1&0 \end{pmatrix} \times \begin{pmatrix} 0 &{} 1 &{} 0 &{} 1 &{} 1 &{} 0 &{} 1 &{} 1 \\ 1 &{} 0 &{} 0 &{} 1 &{} 0 &{} 1 &{} 0 &{} 1 \\ 1 &{} 1 &{} 1 &{} 1 &{} 0 &{} 0 &{} 1 &{} 0 \\ 0 &{} 1 &{} 1 &{} 1 &{} 1 &{} 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 0 &{} 0 \\ 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 \\ 1 &{} 0 &{} 1 &{} 0 &{} 1 &{} 0 &{} 0 &{} 0 \end{pmatrix}, \end{aligned}$$

and the result is 00010101.

We review Eq. (1) by using the “commutative” feature as

$$\begin{aligned} \left\langle s^{(0)}, \varGamma \times {}^\mathrm {T}\!{F^t} \right\rangle&= \left\langle s^{(0)}, A_{t} \times {}^\mathrm {T}\!{M_\gamma } \right\rangle = \left\langle s^{(0)} \times M_\gamma , A_{t} \right\rangle , \end{aligned}$$

and Eq. (1) is equivalently rewritten as

$$\begin{aligned} e'_t = \left\langle s^{(0)} \times M_\gamma , A_{t} \right\rangle \oplus \bigoplus _{i \in \mathbb {T}_z} z_{t+i}. \end{aligned}$$

The equation above implies the following new property.

Property 1

We assume that we can observe high correlation when we guess \(s^{(0)}\) and parity-check equations are generated from \(\varGamma \times {}^\mathrm {T}\!{F^t}\). Then, we can observe exactly the same high correlation even if we guess \(s^{(0)} \times M_\gamma \) and parity-check equations are generated from \(A_{t}\) instead of \(\varGamma \times {}^\mathrm {T}\!{F^t}\).

Hereinafter, \(\gamma \in \mathrm {GF}(2^n)\) is not distinguished from \(\varGamma \in \{0,1\}^n\), and we use \(\gamma \) as a linear mask for simplicity.

3.2 New Wrong-Key Hypothesis

We review the traditional and commonly used wrong-key hypothesis, where we assume that the empirical correlation behaves as random when an incorrect initial state is guessed. However, Property 1 implies that we need to consider this hypothesis more carefully.

We assume that the use of a linear mask \(\varGamma \) leads to high correlation, and we simply call such linear masks highly biased linear masks. When we generate parity-check equations from \(\varGamma \times {}^\mathrm {T}\!{F{^t}}\), let us consider the case that we guess incorrect initial state \(s'^{(0)} = s^{(0)} \times M_{\gamma '}\). From Property 1

$$\begin{aligned} \left\langle s'^{(0)}, \varGamma \times {}^\mathrm {T}\!{F^t} \right\rangle = \left\langle s^{(0)} \times M_{\gamma '}, A_{t} \times {}^\mathrm {T}\!{M_\gamma } \right\rangle = \left\langle s^{(0)}, A_{t} \times {}^\mathrm {T}\!{M_{\gamma \gamma '}} \right\rangle \end{aligned}$$

In other words, it is equivalent to the case that \(\gamma \gamma '\) is used as a linear mask instead of \(\gamma \). If both \(\gamma \) and \(\gamma \gamma '\) are highly biased linear masks, we also observe high correlation when we guess \(s^{(0)} \times M_{\gamma '}\). Therefore, assuming that the target stream cipher has multiple linear masks with high correlation, the entire corresponding guessing brings high correlation.

We introduce a new wrong-key hypothesis based on Property 1. Assuming that there are m linear masks whose correlation is high and the others are correlation zero, we newly introduce the following wrong-key hypothesis.

Hypothesis 1

(New Wrong-Key Hypothesis). Assume that there are m highly biased linear masks as \(\gamma _1, \gamma _2, \ldots , \gamma _m\), and parity-check equations are generated from \(A_{t}\). Then, we observe high correlation when we guess \(s^{(0)} \times M_{\gamma _i}\) for any \(i \in \{1,2,\ldots ,m\}\). Otherwise, we assume that it behaves at random, i.e., the correlation becomes 0.

The new wrong-key hypothesis is a kind of extension from the traditional wrong-key hypothesis.

4 New Algorithm Exploiting New Property

Overview. We first show the overview before we detail our new attack algorithm. In this section, let n be the size of the LFSR in the target LFSR-based stream cipher, and we assume that there are \(m~({\ll }2^n)\) highly biased linear masks denoted by \(\gamma _1, \gamma _2, \ldots , \gamma _m\). The procedure consists of three parts: constructing parity-check equations, FWHT, and removing \(\gamma \).

  • We first construct parity-check equations. Parity-check equations of the traditional FCA are constructed from \(\varGamma \times {}^\mathrm {T}\!{F^t}\) and \(\bigoplus _{i \in \mathbb {T}_z} z_{t+i}\). In our new algorithm, we construct parity-check equations from \(A_{t}\) instead of \(\varGamma \times {}^\mathrm {T}\!{F^t}\).

  • We use the fast Walsh-Hadamard transform (FWHT) to get solutions with high correlation. In other words, we evaluate s such that \(\langle s, A_{t} \rangle \oplus \bigoplus _{i \in \mathbb {T}_z} z_{t+i}\) is highly biased. As we explained in Sect. 3.1, we then observe high correlation when \(s = s^{(0)} \times M_{\gamma _i}\), and there are m solutions with high correlation. Unfortunately, even if FWHT is applied, we have to guess n bits and it requires \(n2^n\) time complexity. It is less efficient than the exhaustive search when the size of the LFSR is greater than or equal to the security level. To overcome this issue, we bypass some bits out of n bits by exploiting m linear masks. Specifically, we bypass \(\beta \) bits, i.e., we guess only \((n-\beta )\) bits and \(\beta \) bits are fixed to constant (e.g., 0). Even if \(\beta \) bits are bypassed, there are \(m2^{-\beta }\) solutions with high correlation in average. Therefore, \(m > 2^\beta \) is a necessary condition.

  • We pick solutions whose empirical correlation is greater than a threshold, where some of solutions are represented as \(s=s^{(0)} \times M_{\gamma _i}\). To remove \(M_{\gamma _i}\), we exhaustively guess the applied \(\gamma _i\) and recover \(s^{(0)}\). Assuming that \(N_p\) solutions are picked, the time complexity is \(N_p \times m\). If the expected number of occurrences that the correct \(s^{(0)}\) appears is significantly greater than that for incorrect ones, we can uniquely determine \(s^{(0)}\). We simulate them by using the Poisson distribution in detail.

4.1 Detailed Algorithm

Let n be the state size of the LFSR and \(\kappa \) be the security level. We assume that there are \(m_p\,(\ll 2^n)\) linear masks \(\gamma _{1}, \gamma _{2}, \ldots , \gamma _{m_p}\) with positive correlation that is greater than a given c. Moreover we assume that there are \(m_m\,(\ll 2^n)\) linear masks \(\rho _1, \rho _2, \ldots , \rho _{m_m}\) with negative correlation that is smaller than \(-c\). Note that c is close to 0, and \(m = m_p + m_m\).

Constructing Parity-Check Equations. We first construct parity-check equations from \(A_{t}\) and \(\bigoplus _{i \in \mathbb {T}_z} z_{t+i}\) for \(t=0,1,\ldots ,N-1\), and the time complexity is N. The empirical correlation follows \(\mathcal{N}(Nc,N)\) and \(\mathcal{N}(-Nc,N)\) when we guess one of \(s^{(0)} \times M_{\gamma _i}\) and \(s^{(0)} \times M_{\rho _i}\), respectively Footnote 3. Otherwise we assume that the empirical correlation follows \(\mathcal{N}(0,N)\).

FWHT with Bypassing Technique. We next pick \(s \in \{0,1\}^n\) such that \(|\frac{\sum _{t=0}^{N-1} (-1)^{e'_t}}{N}| \ge th\), where \(e'_t = \langle s, A_{t} \rangle \oplus \bigoplus _{i \in \mathbb {T}_z} z_{t+i}\) and \(th~(>0)\) is a threshold. Let \(\epsilon _1\) be the probability that values following \(\mathcal{N}(0,N)\) is greater than th, and let \(\epsilon _2\) be the probability that values following \(\mathcal{N}(Nc,N)\) is greater than th. Namely,

$$\begin{aligned} \epsilon _1 = \frac{1}{\sqrt{2 \pi N}} \int _{th}^{\infty } \exp \left( -\frac{x^2}{2 N} \right) dx, ~~ \epsilon _2 = \frac{1}{\sqrt{2 \pi N}} \int _{th}^{\infty } \exp \left( -\frac{(x-Nc)^2}{2 N} \right) dx. \end{aligned}$$

Note that the probability that values following \(\mathcal{N}(0,N)\) is smaller than \(-th\) is also \(\epsilon _1\) and the probability that values following \(\mathcal{N}(-Nc,N)\) is smaller than \(-th\) is also \(\epsilon _2\). Let \(\mathbb {S}_p\) and \(\mathbb {S}_m\) be the set of picked solutions with positive and negative correlation, respectively. The expected size of \(\mathbb {S}_p\) and \(\mathbb {S}_m\) is \((2^n \epsilon _1 + m_p \epsilon _2)\) and \((2^n \epsilon _1 + m_m \epsilon _2)\), respectively, when the whole of n-bit s is guessed.

Unfortunately, if we guess the whole of n-bit s, the time complexity of FWHT is \(n 2^n\) and it is less efficient than the exhaustive search when \(n \ge \kappa \). To reduce the time complexity, we assume multiple solutions. Instead of guessing the whole of s, we guess its partial \((n-\beta )\) bits, where bypassed \(\beta \) bits are fixed to constants, e.g., all 0. Then, the time complexity of the FWHT is reduced from \(n 2^n\) to \((n-\beta )2^{n-\beta }\). Even if \(\beta \) bits are bypassed, \(m_p 2^{-\beta } \epsilon _2\) (resp. \(m_m 2^{-\beta } \epsilon _2\)) solutions represented as \(s^{(0)} \times M_{\gamma _i}\) (resp. \(s^{(0)} \times M_{\rho _i}\)) remain. Moreover, the size of \(\mathbb {S}_p\) and \(\mathbb {S}_m\) also decreases to \((2^{n-\beta }\epsilon _1 + m_p2^{-\beta }\epsilon _2)\) and \((2^{n-\beta }\epsilon _1 + m_m2^{-\beta }\epsilon _2)\), respectively.

Removing \(\varvec{\gamma }\). For all \(s \in \mathbb {S}_p\) and all \(j \in \{1,2,\ldots ,m_p\}\), we compute \(s \times M_{\gamma _j}^{-1}\). It computes \(s^{(0)} \times M_{\gamma _i} \times M_{\gamma _j}^{-1}\) and becomes \(s^{(0)}\) when \(i=j\). Since there are \(m_p 2^{-\beta } \epsilon _2\) solutions represented as \(s^{(0)} \times M_{\gamma _i}\) in \(\mathbb {S}_p\), the correct \(s^{(0)}\) appears \(m_p 2^{-\beta } \epsilon _2\) times. On the other hand, every incorrect initial state appears about \(m_p(2^{n-\beta } \epsilon _1 + m_p 2^{-\beta } \epsilon _2)2^{-n}\) times when we assume uniformly random behavior. In total, every incorrect initial state appears about

$$\begin{aligned} \lambda _1&= m_p(2^{n-\beta } \epsilon _1 + m_p 2^{-\beta } \epsilon _2)2^{-n} + m_m(2^{n-\beta } \epsilon _1 + m_m 2^{-\beta } \epsilon _2)2^{-n} \\&= (m 2^{n-\beta } \epsilon _1 + (m_p^2+m_m^2) 2^{-\beta } \epsilon _2) 2^{-n} \end{aligned}$$

times when we assume uniformly random behavior. On the other hand, the correct \(s^{(0)}\) appears

$$\begin{aligned} \lambda _2 = (m_p + m_m) 2^{-\beta } \epsilon _2 = m 2^{-\beta } \epsilon _2 \end{aligned}$$

times.

The number of occurrences that every incorrect initial state appears follows the Poisson distribution with parameter \(\lambda _1\), and the number of occurrences that the correct \(s^{(0)}\) appears follows the Poisson distribution with parameter \(\lambda _2\). To recover the unique correct \(s^{(0)}\), we introduce a threshold \(th_p\) as

$$\begin{aligned} \sum _{k=th_p}^{\infty } \frac{\lambda _1^k e^{-\lambda _1}}{k!} < 2^{-n}. \end{aligned}$$

The probability that the number of occurrences that \(s^{(0)}\) appears is greater than \(th_p\) is estimated as \(\sum _{k=th_p}^{\infty } \frac{\lambda _2^k e^{-\lambda _2}}{k!}\). Therefore, if the probability is close to one, we can uniquely recover \(s^{(0)}\) with high probability.

4.2 Estimation of Time and Data Complexities

The procedure consists of three parts: constructing parity-check equations, FWHT, and removing \(\gamma \). The first step requires the time complexity N, where the unit of the time complexity is a multiplication by \(\alpha \) over \(\mathrm {GF}(2^n)\) and \(\bigoplus _{i \in \mathbb {T}_z} z_{t+i}\). The second step requires the time complexity \((n-\beta ) 2^{n-\beta }\), where the unit of the time complexity is an addition or subtractionFootnote 4. The final step requires the time complexity \((m 2^{n-\beta } \epsilon _1 + (m_p^2 + m_m^2) 2^{-\beta } \epsilon _2)\), where the unit of the time complexity is a multiplication by fixed values over \(\mathrm {GF}(2^n)\). These units of the time complexity are not equivalent, but at least, they are more efficient than the unit given by the initialization of stream ciphers. Therefore, for simplicity, we regard them as equivalent, and the total time complexity is estimated as

$$\begin{aligned} N + (n-\beta ) 2^{n-\beta } + m 2^{n-\beta } \epsilon _1 + (m_p^2 + m_m^2) 2^{-\beta } \epsilon _2. \end{aligned}$$

Proposition 1

Let n be the size of the LFSR in an LFSR-based stream cipher. We assume that there are m linear masks whose absolute value of correlation is greater than c. When the size of bypassed bits is \(\beta \), we can recover the initial state of the LFSR with time complexity \(3(n-\beta )2^{n-\beta }\) and the required number of parity-check equations is \(N=(n-\beta )2^{n-\beta }\), where the success probability is \(\sum _{k=th_p}^{\infty } \frac{\lambda _2^k e^{-\lambda _2}}{k!}\), where \(th_p\) is the minimum value satisfying

$$\begin{aligned} \sum _{k=th_p}^{\infty } \frac{N^k e^{-N}}{k!} < 2^{-n}, \end{aligned}$$

and

$$\begin{aligned} \lambda _2&= \frac{m 2^{-\beta }}{\sqrt{2 \pi N}} \int _{th}^{\infty } \exp \left( -\frac{(x-Nc)^2}{2 N} \right) dx, \\ th&= \sqrt{2N} \times \mathrm{erfc}^{-1} \left( \frac{2(n- \beta )}{m} \right) . \end{aligned}$$

Proof

The total time complexity is estimated as

$$\begin{aligned} N + (n-\beta ) 2^{n-\beta } + m 2^{n-\beta } \epsilon _1 + (m_p^2 + m_m^2) 2^{-\beta } \epsilon _2. \end{aligned}$$

In the useful attack parameter, since \( (m_p^2 + m_m^2) 2^{-\beta } \epsilon _2\) is significantly smaller than the others, we regard it as negligible. We consider the case that other three terms are balanced, i.e.,

$$\begin{aligned} N = (n-\beta )2^{n-\beta } = m2^{n-\beta } \epsilon _1, \end{aligned}$$

where \(\epsilon _1\) is estimated as

$$\begin{aligned} \epsilon _1 = \frac{1}{\sqrt{2 \pi N}} \int _{th}^{\infty } \exp \left( -\frac{x^2}{2 N} \right) dx = \frac{1}{2} \times \mathrm{erfc}\left( \frac{th}{\sqrt{2N}} \right) = \frac{n- \beta }{m}. \end{aligned}$$

Thus, when th is

$$\begin{aligned} th = \sqrt{2N} \times \mathrm{erfc}^{-1} \left( \frac{2(n- \beta )}{m} \right) , \end{aligned}$$

complexities of the three terms are balanced. We finally evaluate the probability that the initial state of the LFSR is uniquely recovered. The number of occurrences that each incorrect value appears follows the Poisson distribution with parameter \(\lambda _1 = N 2^{-n}\). To discard all \(2^n-1\) incorrect values, recall \(th_p\) satisfying \(\sum _{k=th_p}^{\infty } \frac{\lambda _1^k e^{-\lambda _1}}{k!} < 2^{-n}\). Then, the success probability is \(\sum _{k=th_p}^{\infty } \frac{\lambda _2^k e^{-\lambda _2}}{k!}\) where \(\lambda _2\) is

$$\begin{aligned} \lambda _2&= m 2^{-\beta } \epsilon _2 = \frac{m 2^{-\beta }}{\sqrt{2 \pi N}} \int _{th}^{\infty } \exp \left( -\frac{(x-Nc)^2}{2 N} \right) dx \end{aligned}$$

   \(\square \)

Fig. 2.
figure 2

Theoretical estimation for Example 2.

Example 2

Let us consider an attack against an LFSR-based stream cipher with 80-bit LFSR. We assume that there are \(2^{14}\) linear masks whose correlation is greater than \(2^{-36}\). For \(\beta =9\), we use \(N=(80-9) \times 2^{80-9} \approx 2^{77.1498}\) parity-check equations. The left figure of Fig. 2 shows two normal distributions: random and biased cases. If we use a following threshold

$$\begin{aligned} th=\sqrt{2N} \times \mathrm{erfc}^{-1}\left( \frac{2(n-\beta )}{m} \right) \approx 2^{39.9672}, \end{aligned}$$

\(\epsilon _1 = (n-\beta )/m \approx 2^{-7.8503}\) and \(\epsilon _2 = 0.99957\). The expected number of picked solutions is \(2^{80-9} \epsilon _1 + 2^{14-9} \epsilon _2 \approx 2^{63.1498} + 31.98627 \approx 2^{63.1498}\). We apply \(2^{14}\) inverse linear masks to the picked solutions and recover \(s^{(0)}\), and the time complexity is \(2^{63.1498 + 14} = 2^{77.1498}\).

The number of occurrences that each incorrect value appears follows the Poisson distribution with parameter \(\lambda _1 = 2^{77.1498-80} = 2^{-2.8502}\). On the other hand, the number of occurrences that \(s^{(0)}\) appears follows the Poisson distribution with parameter \(\lambda _2 = 2^{14-9} \times 0.99957 \approx 31.98627\). The right figure of Fig. 2 shows two Poisson distributions. For example, when \(th_p=15\) is used, the probability that an incorrect value appears at least 15 is smaller than \(2^{-80}\). However, the corresponding probability for \(s^{(0)}\) is 99.9%. As a result, the total time complexity is \(3 \times 2^{77.1498} \approx 2^{78.7348}\).

5 Application to Grain-128a

We apply the new algorithm to the stream cipher Grain-128a [20], which has two modes of operations: stream cipher mode and authenticated encryption mode. We assume that all output sequences of the pre-output function can be observed. Under the known-plaintext scenario, this assumption is naturally realized for the stream cipher mode because the output is directly used as a key stream. On the other hand, this assumption is very strong for the authenticated encryption mode because only even-clock output is used as the key stream. Therefore, we do not claim that the authenticated encryption mode can be broken.

Fig. 3.
figure 3

Specification of Grain-128a

5.1 Specification of Grain-128a

Let \(s^{(t)}\) and \(b^{(t)}\) be 128-bit internal states of the LFSR and NFSR at time t, respectively, and \(s^{(t)}\) and \(b^{(t)}\) are represented as \(s^{(t)} = (s_{t}, s_{t+1}, \ldots , s_{t+127})\) and \(b^{(t)} = (b_{t}, b_{t+1}, \ldots , b_{t+127})\). Let \(y_t\) be an output of the pre-output function at time t, and it is computed as

$$\begin{aligned} y_{t}&=h(s^{(t)}, b^{(t)}) \oplus s_{t+93} \oplus \bigoplus _{j \in \mathbb {A}} b_{t+j}, \end{aligned}$$
(2)

where \(\mathbb {A} = \{2,15,36,45,64,73,89\}\), and \(h(s^{(t)}, b^{(t)})\) is defined as

$$\begin{aligned} h(s^{(t)}, b^{(t)})&= h(b_{t+12}, s_{t+8}, s_{t+13}, s_{t+20}, b_{t+95}, s_{t+42}, s_{t+60}, s_{t+79}, s_{t+94})\\&= b_{t+12}s_{t+8} \oplus s_{t+13}s_{t+20} \oplus b_{t+95}s_{t+42} \oplus s_{t+60}s_{t+79} \oplus b_{t+12}b_{t+95}s_{t+94}. \end{aligned}$$

Moreover, \(s_{t+128}\) and \(b_{t+128}\) are computed by

$$\begin{aligned} s_{t+128}&=s_t \oplus s_{t+7} \oplus s_{t+38} \oplus s_{t+70} \oplus s_{t+81} \oplus s_{t+96},\\ b_{t+128}&=s_t \oplus b_t \oplus b_{t+26} \oplus b_{t+56} \oplus b_{t+91} \oplus b_{t+96} \oplus b_{t+3}b_{t+67} \oplus b_{t+11}b_{t+13} \\&\quad \oplus b_{t+17}b_{t+18} \oplus b_{t+27}b_{t+59} \oplus b_{t+40}b_{t+48} \oplus b_{t+61}b_{t+65} \oplus b_{t+68}b_{t+84} \\&\quad \oplus b_{t+88}b_{t+92}b_{t+93}b_{t+95} \oplus b_{t+22}b_{t+24}b_{t+25} \oplus b_{t+70}b_{t+78}b_{t+82}. \end{aligned}$$

Let \(z_t\) be the key stream at time t, and \(z_t=y_t\) in the stream cipher mode. On the other hand, in the authenticated encryption mode, \(z_t = y_{2w+2i}\), where w is the tag size. Figure 3 shows the specification of Grain-128a.

5.2 Linear Approximate Representation for Grain-128a

If there are multiple linear masks with high correlation, the new algorithm can be applied. In this section, we show that Grain-128a has many linear approximate representations, and they produce many linear masks.

Fig. 4.
figure 4

Linear Approximate Representation for Grain-128a

Figure 4 shows the high-level view of the linear approximate representation. It involves from tth to \((t+n+1)\)th rounds, where \(b^{(t)}\) and \(b^{(t+n+1)}\) must be linearly inactive to avoid involving the state of NFSR. Moreover, \(y_{t+i}\) is linearly active for \(i \in \mathbb {T}_z\), and the linear mask of the input of the \((t+i)\)-round h function denoted by \(\varLambda _{i}\) must be nonzero for \(i \in \mathbb {T}_z\). Otherwise, it must be zero.

We focus on the structure of the h function, where the input consists of 7 bits from the LFSR and 2 bits from the NFSR. Then, non-zero \(\varLambda _{i}\) can take several values, and specifically, \(\varLambda _{i}\) can take 64 possible values (see Table 2) under the condition that a linear mask for 2 bits from NFSR is fixed. Since the sum of \(y_{t+i}\) for \(i\in \mathbb {T}_z\) is used, it implies that there are \(64^{|\mathbb {T}_z|}\) linear approximate representations. These many possible representations are obtained by exploiting the structure of the h function, and this structure is common for all ciphers in the Grain family. In other words, this is a new potential vulnerability of the Grain family.

We first consider \(\mathbb {T}_z\) to construct the linear approximate representation, but it is difficult to find an optimal \(\mathbb {T}_z\). Our strategy is heuristic and does not guarantee the optimality, but the found \(\mathbb {T}_z\) is enough to break full Grain-128a. Once \(\mathbb {T}_z\) is determined, we first evaluate the correlation of a linear approximate representation on fixed \(\varLambda _{i}\) for \(i \in \{0,1,\ldots ,n\}\). The high-biased linear mask \(\gamma \) used in our new algorithm is constructed by \(\varLambda _{i}\), and the correlation of \(\gamma \) is estimated from the correlation of \(\varLambda _{i}\).

Table 2. Correlation of the h function. The horizontal axis shows \(\varLambda _{h,i}[1-3]\), the vertical axis shows \(\varLambda _{h,i}[5-8]\), and \(512 \times cor_{h,i}\) is shown in every cell.

Finding Linear Masks with High Correlation. We focus on the sum of key stream bits, i.e., \(\bigoplus _{i \in \mathbb {T}_z} y_{t+i}\). From Eq. (2), the sum is represented as

$$\begin{aligned} \bigoplus _{i \in \mathbb {T}_z} y_{t+i}&= \bigoplus _{i \in \mathbb {T}_z} \left( h(s^{(t+i)}, b^{(t+i)}) \oplus s_{t+i+93} \oplus \bigoplus _{j \in \mathbb {A}} b_{t+i+j} \right) \\&= \bigoplus _{i \in \mathbb {T}_z} \left( h(s^{(t+i)}, b^{(t+i)}) \oplus s_{t+i+93} \right) \oplus \bigoplus _{j \in \mathbb {A}} \left( \bigoplus _{i \in \mathbb {T}_z} b_{t+j+i} \right) . \end{aligned}$$

We first consider an appropriate set \(\mathbb {T}_z\). We focus on \(\bigoplus _{i \in \mathbb {T}_z} b_{t+j+i}\) and choose \(\mathbb {T}_z\) such that \(\bigoplus _{i \in \mathbb {T}_z} b_{t+j+i}\) is highly biased. Concretely, we tap 6 bits whose index corresponds to linearly tapped bits in the g function, i.e., \(\mathbb {T}_z= \{0,26,56,91,96,128\}\). Then, for any j,

$$\begin{aligned} \bigoplus _{i \in \mathbb {T}_z} b_{t+j+i}&= b_{t+j} \oplus b_{t+j+26} \oplus b_{t+j+56} \oplus b_{t+j+91} \oplus b_{t+j+96} \oplus b_{t+j+128} \\&= s_{t+j} \oplus g'(b^{(t+j)}), \end{aligned}$$

where

$$\begin{aligned} g'(b^{(t)})&= b_{t+3}b_{t+67} \oplus b_{t+11}b_{t+13} \oplus b_{t+17}b_{t+18} \oplus b_{t+27}b_{t+59} \oplus b_{t+40}b_{t+48} \\&\quad \oplus b_{t+61}b_{t+65} \oplus b_{t+68}b_{t+84} \oplus b_{t+88}b_{t+92}b_{t+93}b_{t+95} \\&\quad \oplus b_{t+22}b_{t+24}b_{t+25} \oplus b_{t+70}b_{t+78}b_{t+82}. \end{aligned}$$

Note that all bits in \(g'(b^{(t)})\) are nonlinearly involved, and the correlation may be high. Then

$$\begin{aligned} \bigoplus _{i \in \mathbb {T}_z} y_{t+i}&= \bigoplus _{i \in \mathbb {T}_z} \left( h(s^{(t+i)}, b^{(t+i)}) \oplus s_{t+i+93} \right) \oplus \bigoplus _{j \in \mathbb {A}} \left( s_{t+j} \oplus g'(b^{(t+j)}) \right) \\&=\bigoplus _{i \in \mathbb {T}_z} s_{t+i+93} \oplus \bigoplus _{j \in \mathbb {A}} s_{t+j} \oplus \bigoplus _{i \in \mathbb {T}_z} h(s^{(t+i)}, b^{(t+i)}) \oplus \bigoplus _{j \in \mathbb {A}} g'(b^{(t+j)}). \end{aligned}$$

We next consider a linear approximate representation of \(h(s^{(t+i)}, b^{(t+i)})\). Let \(\varLambda _{i} \in \{0,1\}^9\) be the input linear mask for the h function at time \(t+i\), and \(\varLambda _{i} = (\varLambda _{i}[0], \varLambda _{i}[1], \ldots , \varLambda _{i}[8])\). Then,

$$\begin{aligned}&h(s^{(t+i)}, b^{(t+i)})\\&\approx \varLambda _{i}[0] b_{t+i+12} \oplus \varLambda _{i}[4] b_{t+i+95} \oplus \langle \varLambda _{i}[1-3], (s_{t+i+8}, s_{t+i+13}, s_{t+i+20}) \rangle \\&\qquad \oplus \langle \varLambda _{i}[5-8], (s_{t+i+42}, s_{t+i+60}, s_{t+i+79}, s_{t+i+94}) \rangle , \end{aligned}$$

where \(\varLambda _{i}[x-y]\) denotes a sub vector indexed from xth bit to yth bit. Let \(cor_{h,i}(\varLambda _{i})\) be the correlation of the h function at time \(t+i\), and Table 2 summarizes them. From Table 2, \(cor_{h,i}(\varLambda _{i})\) is 0 or \(\pm 2^{-4}\). We have 6 active h functions because \(|\mathbb {T}_z|=6\), and let \(\varLambda _{\mathbb {T}_z} \in \{0,1\}^{9 \times |\mathbb {T}_z|}\) be the concatenated linear mask, i.e., \(\varLambda _{\mathbb {T}_z} = (\varLambda _{0}, \varLambda _{26}, \varLambda _{56}, \varLambda _{91}, \varLambda _{96}, \varLambda _{128})\). The total correlation from all active h functions depends on \(\varLambda _{\mathbb {T}_z}\), and it is computed as \(cor_h(\varLambda _{\mathbb {T}_z}) = (-1)^{|\mathbb {T}_z|+1} \prod _{i \in \mathbb {T}_z} cor_{h,i}(\varLambda _{i})\) because of the piling-up lemma. Therefore, if \(\varLambda _{i}\) with correlation 0 is used for any \(i \in \mathbb {T}_z\), \(cor_h(\varLambda _{\mathbb {T}_z})=0\). Otherwise, \(cor_h(\varLambda _{\mathbb {T}_z})=\pm 2^{-24}\).

We guess all terms involved in the internal state of the LFSR in the FCA. Under the correlation \(\pm 2^{-24}\), we get

$$\begin{aligned} \bigoplus _{i \in \mathbb {T}_z} y_{t+i}&\approx (\text{ term } \text{ by } \text{ guessing } s^{(t)})\\&\quad \oplus \bigoplus _{i \in \mathbb {T}_z} \left( \varLambda _{i}[0] b_{t+i+12} \oplus \varLambda _{i}[4] b_{t+i+95} \right) \oplus \bigoplus _{j \in \mathbb {A}} \left( g'(b^{(t+j)}) \right) . \end{aligned}$$

Therefore, if

$$\begin{aligned} cor_g(\varLambda _{\mathbb {T}_z})&= \Pr \left[ \bigoplus _{i \in \mathbb {T}_z} \left( \varLambda _{i}[0] b_{t+i+12} \oplus \varLambda _{i}[4] b_{t+i+95} \right) \oplus \bigoplus _{j \in \mathbb {A}} \left( g'(b^{(t+j)}) \right) =0\right] \\&\quad - \Pr \left[ \bigoplus _{i \in \mathbb {T}_z} \left( \varLambda _{i}[0] b_{t+i+12} \oplus \varLambda _{i}[4] b_{t+i+95} \right) \oplus \bigoplus _{j \in \mathbb {A}} \left( g'(b^{(t+j)}) \right) =1\right] \end{aligned}$$

is high, the FCA can be successfully applied. Note that \(cor_g(\varLambda _{\mathbb {T}_z})\) is independent of \(\varLambda _{i}[1-3,5-8]\) for any \(i \in \mathbb {T}_z\). To evaluate its correlation, we divide \(\bigoplus _{j \in \mathbb {A}} \left( g'(b^{(t+j)}) \right) \) into 20 terms such that \(b_{t+67}\) and \(b_{t+137}\) are involved by multiple terms. Then, we try out 4 possible values of \((b_{t+67}, b_{t+137})\) and evaluate correlation independently. As a result, when \((b_{t+67}, b_{t+137})=(0,0)\) and \((b_{t+67}, b_{t+137})=(0,1)\), the correlation is \(-2^{-33.1875}\) and \(-2^{-33.4505}\), respectively. On the other hand, the correlation is 0 when \(b_{t+67}=1\). Therefore

$$\begin{aligned} cor_g(\varLambda _{\mathbb {T}_z}) = \frac{-2^{-33.1875}-2^{-33.4505}}{4} =-2^{-34.313} \end{aligned}$$

when \(\varLambda _{i}[0,4]=0\) for all \(i \in \mathbb {T}_z\).

Table 3. Summary of correlations when \(\varLambda _{i}[0,4]\) is fixed. Let \(*\) be arbitrary bit.

We similarly evaluate \(cor_g(\varLambda _{\mathbb {T}_z})\) when \(\varLambda _{i}[0,4]\ne 0\) for any \(i \in \mathbb {T}_z\). If one of \(\varLambda _{0}[0]\), \(\varLambda _{26}[0]\), \(\varLambda _{56}[0]\), \(\varLambda _{91}[4]\), \(\varLambda _{96}[4]\), and \(\varLambda _{128}[4]\) is 1, the correlation is always 0 because \(b_{t+12}\), \(b_{t+38}\), \(b_{t+68}\), \(b_{t+186}\), \(b_{t+191}\), and \(b_{t+223}\) are not involved to \(\bigoplus _{j \in \mathbb {A}} \left( g'(b^{(t+j)}) \right) \). Table 3 summarizes \(cor_g(\varLambda _{\mathbb {T}_z})\) when \(\varLambda _{0}[0]\), \(\varLambda _{26}[0]\), \(\varLambda _{56}[0]\), \(\varLambda _{91}[4]\), \(\varLambda _{96}[4]\), and \(\varLambda _{128}[4]\) are 0.

For any fixed \(\varLambda _{i}\), we can get the following linear approximate representation

$$\begin{aligned} \bigoplus _{i \in \mathbb {T}_z} y_{t+i}&\approx \bigoplus _{i \in \mathbb {T}_z} s_{t+i+93} \oplus \bigoplus _{j \in \mathbb {A}} s_{t+j} \oplus \bigoplus _{i \in \mathbb {T}_z} \langle \varLambda _{i}[1-3], (s_{t+i+8}, s_{t+i+13}, s_{t+i+20}) \rangle \nonumber \\&\quad \oplus \bigoplus _{i \in \mathbb {T}_z} \langle \varLambda _{i}[5-8], (s_{t+i+42}, s_{t+i+60}, s_{t+i+79}, s_{t+i+94}) \rangle . \end{aligned}$$
(3)

From the piling-up lemma, the correlation is computed as

$$\begin{aligned} -cor_g(\varLambda _{\mathbb {T}_z}) \times cor_h(\varLambda _{\mathbb {T}_z}), \end{aligned}$$

where \(cor_g(\mathbb {T}_z)\) is summarized in Table 3 and \(cor_h(\varLambda _{\mathbb {T}_z})=(-1)^{|\mathbb {T}_z|+1} \prod _{i \in \mathbb {T}_z}\) \(cor_{h,i}(\varLambda _{i})\).

How to Find Multiple \(\varvec{\gamma }\). The correlation of the linear approximate representation on fixed \(\varLambda _{i}\) was estimated in the paragraph above. The linear mask \(\gamma \) used in the FCA directly is represented as

$$\begin{aligned} \gamma =&\sum _{i \in \mathbb {T}_z} \big (\varLambda _{i}[1]\alpha ^{i+8} +\varLambda _{i}[2]\alpha ^{i+13} + \varLambda _{i}[3]\alpha ^{i+20} + \varLambda _{i}[5]\alpha ^{i+42} \\&\quad + \varLambda _{i}[6]\alpha ^{i+60} + \varLambda _{i}[7]\alpha ^{i+79} + \varLambda _{i}[8]\alpha ^{i+94} + \alpha ^{i+93} \big ) + \sum _{j \in \mathbb {A}} \alpha ^{j}. \end{aligned}$$

If different \(\varLambda _{\mathbb {T}_z}\)s derive the same \(\gamma \), we need to sum up corresponding correlations.

Clearly, since this linear approximate representation does not involve \(\varLambda _i[0,4]\) for \(i \in \mathbb {T}_z\), we need to sum up \(2^{2 \times |\mathbb {T}_z|}=2^{12}\) correlations, where \(\varLambda _i[1-3,5-8]\) is identical and only \(\varLambda _i[0,4]\) varies for \(i \in \mathbb {T}_z\). Let V be a linear span whose basis is 12 corresponding unit vectors.

Moreover, there are special relationships. When we focus on \(\varLambda _{56}[6]\) and \(\varLambda _{96}[3]\), corresponding elements over \(\mathrm {GF}(2^{128})\) are identical because \(\alpha ^{56+60}=\alpha ^{96+20}=\alpha ^{116}\). In other words, \((\varLambda _{56}[6], \varLambda _{96}[3]) = (0,0)\) and \((\varLambda _{56}[6], \varLambda _{96}[3]) = (1,1)\) derive the same \(\gamma \), and \((\varLambda _{56}[6], \varLambda _{96}[3]) = (1,0)\) and \((\varLambda _{56}[6], \varLambda _{96}[3]) = (0,1)\) also derive the same \(\gamma \). We have 3 such relationships as follows.

  • \(\varLambda _{56}[6]\) and \(\varLambda _{96}[3]\). Then, \(\alpha ^{56+60}=\alpha ^{96+20}=\alpha ^{116}\).

  • \(\varLambda _{91}[2]\) and \(\varLambda _{96}[1]\). Then, \(\alpha ^{91+13}=\alpha ^{96+8}=\alpha ^{104}\).

  • \(\varLambda _{91}[7]\) and \(\varLambda _{128}[5]\). Then, \(\alpha ^{91+79}=\alpha ^{128+42}=\alpha ^{170}\).

Therefore, from following three vectors

$$\begin{aligned} w1(\delta [0])&= (0^9 , 0^9 , 000000100 , 000000000 , 000\overline{\delta [0]}00000 , 000000000),\\ w2(\delta [1])&= (0^9 , 0^9 , 000000000 , 001000000 , 0\overline{\delta [1]}0000000 , 000000000 ),\\ w3(\delta [2])&= (0^9 , 0^9 , 000000000 , 000000010 , ~~~000000000 , 00000\overline{\delta [2]}000), \end{aligned}$$

a linear span \(W(\delta ) = \mathrm{span}(w1(\delta [0]),w2(\delta [1]),w3(\delta [2]))\) is defined, where \(\overline{\delta [i]}=\delta [i] \oplus 1\). As a result, the correlation for \(\gamma \) denoted by \(cor_\gamma \) is estimated as

$$\begin{aligned} cor_\gamma = \sum _{w \in W(\delta )} \sum _{v \in V} -cor_g(\varLambda _{\mathbb {T}_z} \oplus v) \times cor_h(\varLambda _{\mathbb {T}_z} \oplus v \oplus w). \end{aligned}$$

Note that \(cor_g\) is independent of \(w \in W(\delta )\).

We heuristically evaluated \(\gamma \) with high correlation. As shown in Table 2, the number of possible \(\varLambda _{i}\) is at most 64. Otherwise, \(cor_h\) is always 0. Therefore, the search space is reduced from \(2^{54}\) to \(2^{36}\). Moreover, \(\varLambda _{0}\) is not involved in \(W(\delta )\), and the absolute value of \(cor_\gamma \) is invariable as far as we use \(\varLambda _{0}\) satisfying \(cor_{h,0}=\pm 2^{-4}\). Therefore, we do not need to evaluate \(\varLambda _{0}\) anymore, and the search space is further reduced from \(2^{36}\) to \(2^{30}\). While \(\varLambda _{26}\) is also not involved to \(W(\delta )\), we have non-zero correlation for both cases as \(\varLambda _{26}[4]=0\) and 1 (see Table 3). If the sign of \(cor_{h,26}\) for \(\varLambda _{26}[4]=0\) is different from that for \(\varLambda _{26}[4]=1\), they cancel each other out. Therefore, we should use \(\varLambda _{26}\) such that the sign of correlation of \(\varLambda _{26}\) is equal to that of \(\varLambda _{26} \oplus (000010000)\), and the number of such candidates is 32. Then, we do not need to evaluate \(\varLambda _{26}\) anymore, and the search space is further reduced from \(2^{30}\) to \(2^{24}\). We finally evaluated \(2^{24}\) \(\varLambda _{\mathbb {T}_z}\) exhaustively. As a result, we found \(49152 \times 64 \times 32 \approx 2^{26.58}\) \(\gamma \) whose absolute value of correlation is greater than \(2^{-54.2381}\).

Fig. 5.
figure 5

Time complexity and success probability. FCA against Grain-128a.

Estimation of Attack Complexity and Success Probability We apply the attack algorithm described in Sect. 3, and Proposition 1 is used to estimate the attack complexity and success probability. Figure 5 shows the relationship between the time complexity, success probability, and the size of bypassed bits, where \((n,m,c)=(128,49152 \times 64 \times 32,\pm 2^{-54.2381})\) is used. From Fig. 5, \(\beta = 21\) is preferable. The time complexity is \(3 \times (128-21) \times 2^{128-21} \approx 2^{115.3264}\) and the corresponding success probability is almost 100%. Moreover when \(\beta = 22\), the time complexity is \(2^{114.3129}\) and the success probability is 60.95%.

The estimation above only evaluates the time complexity to recover the initial state of the LFSR. To recover the secret key, we need to recover the whole of the initial state. Our next goal is to recover the initial state of the NFSR under the condition that the initial state of the LFSR is uniquely determined, but it is not difficult. We have several methods to recover the initial state and explain the most simple method.

The key stream is generated as Eq. (2). We focus on \((y_0, \ldots , y_{34})\), which involves 128 bits as \((b_{2}, \ldots , b_{129})\). We first guess 93 bits, and the remaining 35 bits are recovered by using corresponding Eq. (2). Specifically, we first guess \((b_{33}, \ldots , b_{75}, b_{80}, \ldots , b_{129})\). Then, \((b_{76}, \ldots , b_{79})\) are uniquely determined by using \((y_{31}, \ldots , y_{34})\). Similarly, we can uniquely determine the remaining 31 bits step by step. While we need to guess 93 bits, the time complexity is negligible compared with that for the FCA.

5.3 Application to Grain-128

We also applied our technique to Grain-128, but we briefly show the result due to the page limitation. Since Grain-128 is very similar to Grain-128a, we can use the same \(\mathbb {T}_z\). Then, \(cor_g = -2^{-32}\), where \(\varLambda _{26}[4]\) and \(\varLambda _{91}[0]\) can be chosen arbitrary but the others must be 0.

We heuristically evaluated \(\gamma \) with high correlation, and we used the same strategy as the case of Grain-128a. As a result, we found \(2^{15} \times 64 \times 32 = 2^{26}\) \(\gamma \) with correlation \(\pm 2^{-51}\). We apply the attack algorithm described in Sect. 3, and Proposition 1 is used to estimate the attack complexity and success probability. As a result, \(\beta = 22\) is preferable, and the time complexity is \(3 \times (128-22) \times 2^{128-22} \approx 2^{114.3129}\) and the corresponding success probability is 99.0%.

6 Application to Grain-v1

6.1 Specification of Grain-v1

Let \(s^{(t)}\) and \(b^{(t)}\) be 80-bit internal states of the LFSR and NFSR at time t, respectively, and \(s^{(t)}\) and \(b^{(t)}\) are represented as \(s^{(t)} = (s_{t}, s_{t+1}, \ldots , s_{t+79})\) and \(b^{(t)} = (b_{t}, b_{t+1}, \ldots , b_{t+79})\), respectively. Then, let \(z_t\) be a key stream at time t, and it is computed as

$$\begin{aligned} z_{t}&=h(s^{(t)}, b^{(t)}) \oplus \bigoplus _{j \in \mathbb {A}} b_{t+j}, \end{aligned}$$
(4)

where \(\mathbb {A} = \{1,2,4,10,31,43,56\}\) and \(h(s^{(t)}, b^{(t)})\) is defined as

$$\begin{aligned} h(s^{(t)}, b^{(t)})&= h(s_{t+3}, s_{t+25}, s_{t+46}, s_{t+64}, b_{t+63})\\&= s_{t+25} \oplus b_{t+63} \oplus s_{t+3}s_{t+64} \oplus s_{t+46}s_{t+64} \oplus s_{t+64}b_{t+63} \\&\quad \oplus s_{t+3}s_{t+25}s_{t+46} \oplus s_{t+3}s_{t+46}s_{t+64}\oplus s_{t+3}s_{t+46}b_{t+63} \\&\quad \oplus s_{t+25}s_{t+46}b_{t+63} \oplus s_{t+46}s_{t+64}b_{t+63}. \end{aligned}$$

Moreover, \(s_{t+80}\) and \(b_{t+80}\) are computed by

$$\begin{aligned} s_{t+80}&=s_t \oplus s_{t+13} \oplus s_{t+23} \oplus s_{t+38} \oplus s_{t+51} \oplus s_{t+62},\\ b_{t+80}&= s_{t} \oplus b_{t+62} \oplus b_{t+60} \oplus b_{t+52} \oplus b_{t+45} \oplus b_{t+37} \oplus b_{t+33} \oplus b_{t+28} \oplus b_{t+21} \\&\quad \oplus b_{t+14} \oplus b_{t+9} \oplus b_{t} \oplus b_{t+63}b_{t+60} \oplus b_{t+37}b_{t+33} \oplus b_{t+15}b_{t+9}\\&\quad \oplus b_{t+60}b_{t+52}b_{t+45} \oplus b_{t+33}b_{t+28}b_{t+21} \oplus b_{t+63}b_{t+45}b_{t+28}b_{t+9}\\&\quad \oplus b_{t+60}b_{t+52}b_{t+37}b_{t+33} \oplus b_{t+63}b_{t+60}b_{t+21}b_{t+15}\\&\quad \oplus b_{t+63}b_{t+60}b_{t+52}b_{t+45}b_{t+37} \oplus b_{t+33}b_{t+28}b_{t+21}b_{t+15}b_{t+9}\\&\quad \oplus b_{t+52}b_{t+45}b_{t+37}b_{t+33}b_{t+28}b_{t+21}. \end{aligned}$$

6.2 Fast Correlation Attack Against Grain-v1

When we use \(\mathbb {T}_z=\{0,14,21,28,37,45,52,60,62,80\}\), we focus on the sum of the key stream bits, i.e., \(z_{t+0} \oplus z_{t+14} \oplus z_{t+21} \oplus z_{t+28} \oplus z_{t+37} \oplus z_{t+45} \oplus z_{t+52} \oplus z_{t+60} \oplus z_{t+62} \oplus z_{t+80}\).

$$\begin{aligned} \bigoplus _{i \in \mathbb {T}_z} z_{t+i}&= \bigoplus _{i \in \mathbb {T}_z} h(s^{(t+i)}, b^{(t+i)}) \oplus \bigoplus _{j \in \mathbb {A}} \left( \bigoplus _{i \in \mathbb {T}_z} b_{t+j+i} \right) . \end{aligned}$$

For any j,

$$\begin{aligned} \bigoplus _{i \in \mathbb {T}_z} b_{t+j+i} = s_{t+j} \oplus g'(b^{(t+j)}), \end{aligned}$$

where \(g'(b^{(t)})\) is defined as

$$\begin{aligned} g'(b^{(t)})&= b_{t+33} \oplus b_{t+9} \oplus b_{t+63}b_{t+60} \oplus b_{t+37}b_{t+33} \oplus b_{t+15}b_{t+9} \oplus b_{t+60}b_{t+52}b_{t+45} \\&\quad \oplus b_{t+33}b_{t+28}b_{t+21} \oplus b_{t+63}b_{t+45}b_{t+28}b_{t+9} \oplus b_{t+60}b_{t+52}b_{t+37}b_{t+33}\\&\quad \oplus b_{t+63}b_{t+60}b_{t+21}b_{t+15} \oplus b_{t+63}b_{t+60}b_{t+52}b_{t+45}b_{t+37} \\&\quad \oplus b_{t+33}b_{t+28}b_{t+21}b_{t+15}b_{t+9} \oplus b_{t+52}b_{t+45}b_{t+37}b_{t+33}b_{t+28}b_{t+21}. \end{aligned}$$

Then

$$\begin{aligned} \bigoplus _{i \in \mathbb {T}_z} z_{t+i}&= \bigoplus _{i \in \mathbb {T}_z} h(s^{(t+i)}, b^{(t+i)}) \oplus \bigoplus _{j \in \mathbb {A}} \left( s_{t+j} \oplus g'(b^{(t+j)}) \right) \\&=\bigoplus _{j \in \mathbb {A}} s_{t+j} \oplus \bigoplus _{i \in \mathbb {T}_z} h(s^{(t+i)}, b^{(t+i)}) \oplus \bigoplus _{j \in \mathbb {A}} g'(b^{(t+j)}). \end{aligned}$$
Table 4. Correlation of the h function, where \(32 \times cor_{h,i}\) is shown in every cell.

We next consider a linear approximate representation of \(h(s^{(t+i)}, b^{(t+i)})\). Let \(\varLambda _{i}\) be the input linear mask for the h function at time \(t+i\). Then

$$\begin{aligned}&h(s^{(t+i)}, b^{(t+i)}) \\&\approx \varLambda _{i}[4] b_{t+i+63} \oplus \langle \varLambda _{i}[0-3], (s_{t+i+3}, s_{t+i+25}, s_{t+i+46}, s_{t+i+64}) \rangle . \end{aligned}$$

Let \(cor_{h,i}(\varLambda _{i})\) be the correlation of the h function at time \(t+i\), and Table 4 summarizes them. From Table 4, \(cor_{h,i}(\varLambda _{i})\) is 0 or \(\pm 2^{-2}\). Since we have \(|\mathbb {T}_z|=10\) active h functions, the total correlation from all active h functions is computed as \((-1)^{|\mathbb {T}_z|+1} \prod _{i \in \mathbb {T}_z} cor_{h,i}(\varLambda _{i}) = \pm 2^{-20}\) because of the piling-up lemma. Note that \(\varLambda _{i}[0-3]\) is independent from the state of the NFSR.

All terms involved in the internal state of the LFSR can be guessed in the FCA. Therefore, under the correlation \(\pm 2^{-20}\), we get

$$\begin{aligned} \bigoplus _{i \in \mathbb {T}_z} z_{t+i}&= (\text{ term } \text{ by } \text{ guessing }) \oplus \bigoplus _{i \in \mathbb {T}_z} \left( \varLambda _{i}[4] b_{t+i+63} \right) \oplus \bigoplus _{j \in \mathbb {A}} \left( g'(b^{(t+j)}) \right) . \end{aligned}$$

Therefore, if

$$\begin{aligned} cor_g(\varLambda _{\mathbb {T}_z})&= \Pr \left[ \bigoplus _{i \in \mathbb {T}_z} \left( \varLambda _{i}[4] b_{t+i+63} \right) \oplus \bigoplus _{j \in \mathbb {A}} \left( g'(b^{(t+j)}) \right) =0\right] \\&\quad - \Pr \left[ \bigoplus _{i \in \mathbb {T}_z} \left( \varLambda _{i}[4] b_{t+i+63} \right) \oplus \bigoplus _{j \in \mathbb {A}} \left( g'(b^{(t+j)}) \right) =1\right] \end{aligned}$$

is high, the FCA can be successfully applied.

Table 5. Summary of correlations when \(\varLambda _{i}[4]\) is fixed.

Similarly to the case of Grain-128a, we evaluate \(cor_g(\varLambda _{\mathbb {T}_z})\). If one of \(\varLambda _{0}[4]\), \(\varLambda _{37}[4]\), \(\varLambda _{52}[4]\), \(\varLambda _{60}[4]\), \(\varLambda _{62}[4]\), and \(\varLambda _{80}[4]\) is 1, the correlation is always 0 because \(b_{t+63}\), \(b_{t+100}\), \(b_{t+115}\), \(b_{t+123}\), \(b_{t+125}\), and \(b_{t+143}\) are not involved in \(\bigoplus _{j \in \mathbb {A}} \left( g'(b^{(t+j)}) \right) \). Table 5 summarizes \(cor_g(\varLambda _{\mathbb {T}_z})\) when \(\varLambda _{i}[4]=0\) for \(i \in \{0,37,52,60,62,80\}\).

For any fixed \(\varLambda _{i}\), we can get the following linear approximate representation

$$\begin{aligned} \bigoplus _{i \in \mathbb {T}_z} z_{t+i}&\approx \bigoplus _{j \in \mathbb {A}} s_{t+j} \oplus \bigoplus _{i \in \mathbb {T}_z} \langle \varLambda _{i}[0-3], (s_{t+i+3}, s_{t+i+25}, s_{t+i+46}, s_{t+i+64}) \rangle . \end{aligned}$$
(5)

From the piling-up lemma, the correlation is computed as \(-cor_g(\varLambda _{\mathbb {T}_z}) \times cor_h(\varLambda _{\mathbb {T}_z})\).

How to Find Multiple \(\varvec{\gamma }\). The correlation of the linear approximate representation on fixed \(\varLambda _{i}\) was estimated in the paragraph above. The linear mask \(\gamma \) used in the FCA directly is represented as

$$\begin{aligned} \gamma =&\sum _{i \in \mathbb {T}_z} \big (\varLambda _{i}[0]\alpha ^{i+3} +\varLambda _{i}[1]\alpha ^{i+25} +\varLambda _{i}[2]\alpha ^{i+46} + \varLambda _{i}[3]\alpha ^{i+64} \big ) + \sum _{j \in \mathbb {A}} \alpha ^{j}. \end{aligned}$$

If different \(\varLambda _h\) have the same \(\gamma \), we need to sum up corresponding correlations.

This linear approximate representation does not use \(\varLambda _i[4]\) for \(i \in \mathbb {T}_z\). Therefore, we need to sum up \(2^{|\mathbb {T}_z|}=2^{10}\) correlations, where \(\varLambda _i[0-3]\) is identical and only \(\varLambda _i[5]\) varies for \(i \in \mathbb {T}_z\). Let V be a linear span whose basis is 12 corresponding unit vectors.

Moreover, there are special relationships similar to the case of Grain-128a, and we have four such relationships as

  • \(\varLambda _{37}[2]\) and \(\varLambda _{80}[0]\). Then, \(\alpha ^{37+46}=\alpha ^{80+3}=\alpha ^{83}\).

  • \(\varLambda _{62}[3]\) and \(\varLambda _{80}[2]\). Then, \(\alpha ^{62+64}=\alpha ^{80+46}=\alpha ^{126}\).

  • \(\varLambda _{0}[2]\) and \(\varLambda _{21}[1]\). Then, \(\alpha ^{0+46}=\alpha ^{21+25}=\alpha ^{46}\).

  • \(\varLambda _{21}[3]\) and \(\varLambda _{60}[1]\). Then, \(\alpha ^{21+64}=\alpha ^{60+25}=\alpha ^{85}\).

Therefore, from following four vectors

$$\begin{aligned} w1(\delta [0])&= (00000,0^5,~~~00000,0^5,00100,0^5,0^5, 00000,~~~00000,\overline{\delta [0]}0000), \\ w2(\delta [1])&= (00000,0^5,~~~00000,0^5,00000,0^5,0^5, 00000,~~~00010,00\overline{\delta [1]}00), \\ w3(\delta [2])&= (00100,0^5,0\overline{\delta [2]}000,0^5, 00000,0^5, 0^5, 00000,~~~00000, 00000), \\ w4(\delta [3])&= (00000,0^5,~~~00010, 0^5, 00000, 0^5, 0^5, 0\overline{\delta [3]}000, 00000, 00000), \end{aligned}$$

a linear span \(W(\delta ) = \mathrm{span}(w1(\delta [0]),w2(\delta [1]),w3(\delta [2]),w4(\delta [3]))\) is defined, where \(\overline{\delta [i]}=\delta [i] \oplus 1\). Then, let \(cor_\gamma \) be the correlation of \(\gamma \), and

$$\begin{aligned} cor_\gamma = \sum _{w \in W(\delta )} \sum _{v \in V} -cor_g(\varLambda _{\mathbb {T}_z} \oplus v) \times cor_h(\varLambda _{\mathbb {T}_z} \oplus v \oplus w). \end{aligned}$$

We heuristically evaluated \(\gamma \) with high correlation. For every element in \(\mathbb {T}_z\), since the subset \(\{14,28,45,52\}\) is independent of the special relationship, we first focus on the subset. Since \(b_{t+63+52}\) is not involved in \(\bigoplus _{j \in \mathbb {A}} \left( g'(b^{(t+j)}) \right) \), \(\varLambda _{52}[4]\) must be 0. Therefore, \(\varLambda _{52}[0-3]\) should be chosen as

$$\begin{aligned} \varLambda _{52}[0-3] \in \{0101,0111,1001,1011,1100,1101,1110,1111\}, \end{aligned}$$

and \(cor_\gamma \) is invariable as far as we use \(\varLambda _{52}\) satisfying \(cor_{h,52}=\pm 2^{-2}\). We do not need to evaluate \(\varLambda _{52}\) anymore, and the search space is reduced from \(2^{40}\) to \(2^{36}\). For \(i \in \{14,28,45\}\), corresponding masks should be chosen as

$$\begin{aligned} \varLambda _{i}[0-3] \in \{0101,0111,1001,1011,1100,1101,1110,1111\} \end{aligned}$$

because \(cor_g(\varLambda _{\mathbb {T}_z})\) is high when \((\varLambda _{14}[4], \varLambda _{21}[4], \varLambda _{28}[4], \varLambda _{45}[4])\) is 0010 or 0000. Let us focus on Table 5. We have three-type linear masks as

  • \(\varLambda _{i}[0-3] \in \{1001,1011,1100,1110\}\), where \(cor_{h,i} = \pm 2^{-2}\) for \(\varLambda _{i}[4]=0\) but \(cor_{h,i}=0\) for \(\varLambda _{i}[4]=1\).

  • \(\varLambda _{i}[0-3] \in \{0111,1101\}\), where the sign of \(cor_{h,i}\) is different in each case of \(\varLambda _{i}[4]=0\) or 1.

  • \(\varLambda _{i}[0-3] \in \{0101,1111\}\), where the sign of \(cor_{h,i}\) is the same in both cases of \(\varLambda _{i}[4]=0\) and 1.

Since \(cor_\gamma \) is invariable in each case, it is enough to evaluate one from each case. Therefore, the search space is reduced from \(2^{36}\) to \(3^3 \times 2^{24}\). We finally evaluated \(9 \times 2^{24}\) \(\varLambda _{\mathbb {T}_z}\) exhaustively. As a result, we found about 442368 \(\gamma \) whose absolute value of correlation is greater than \(2^{-36}\).

Estimating Attack Complexity and Success Probability. We apply the attack algorithm described in Sect. 3, and Proposition 1 is used to estimate the attack complexity and success probability. Figure 6 shows the relationship between the time complexity, success probability, and the size of bypassed bits, where \((n,m,c)=(80, 442368,\pm 2^{-36})\) is used. From Fig. 6, \(\beta = 11\) is preferable, and the time complexity is \(3 \times (80-11) \times 2^{80-11} \approx 2^{76.6935}\) and the corresponding success probability is almost 100%.

Fig. 6.
figure 6

Time complexity and success probability. FCA against Grain-v1.

7 Verifications, Observations, and Countermeasures

7.1 Experimental Verification

We verify our algorithm by applying it to a toy Grain-like cipher, where the sizes of the LFSR and NFSR are 24 bits, and \(s_{t+24}\), \(b_{t+24}\), and \(z_t\) are computed as

$$\begin{aligned} s_{t+24}&= s_t \oplus s_{t+1} \oplus s_{t+2} \oplus s_{t+7}, \\ b_{t+24}&= s_t \oplus b_t \oplus b_{t+5} \oplus b_{t+14} \oplus b_{t+20} b_{t+21} \oplus b_{t+11} b_{t+13} b_{t+15},\\ z_t&= h(s_{t+3}, s_{t+7}, s_{t+15}, s_{t+19}, b_{t+17}) \oplus \bigoplus _{j \in \{1,3,8\}} b_{t+j}, \end{aligned}$$

where the h function is as the one used in Grain-v1.

Fig. 7.
figure 7

Comparison between the theoretical and experimental estimations.

Similarly to the case of Grain-128a, \(\mathbb {T}_z\) is used by tapping linear part of the feedback polynomial of NFSR, i.e., \(\mathbb {T}_z=\{0,5,14,24\}\). Then, the sum of the key stream is

$$\begin{aligned} \bigoplus _{i \in \mathbb {T}_z} z_{t+i} = \bigoplus _{i \in \mathbb {T}_z} h(s^{(t+i)}, b^{(t+i)}) \oplus \bigoplus _{j \in \{1,3,8\}} \left( s_{t+j} + g'(b^{(t+j)}) \right) , \end{aligned}$$

where \(g'(b^{(t)}) = b_{t+20} b_{t+21} \oplus b_{t+11} b_{t+13} b_{t+15}\). The ANF of the h function involves \(b_{t+17}\), \(b_{t+22}\), \(b_{t+31}\), and \(b_{t+41}\). If \(\varLambda _{i}[4]=1\) is used for \(i \in \{0,14,24\}\), the correlation is always 0 because \(\bigoplus _{j \in \{1,3,8\}} g'(b^{(t+j)})\) does not involve \(b_{t+17}\), \(b_{t+31}\), and \(b_{t+41}\). Only \(b_{t+22}\) is involved to \(\bigoplus _{j \in \{1,3,8\}} g'(b^{(t+j)})\). Therefore, we evaluated correlations of \(\bigoplus _{j \in \{1,3,8\}} g'(b^{(t+j)})\) and \(\bigoplus _{j \in \{1,3,8\}} g'(b^{(t+j)}) \oplus b_{t+22}\), and they have the correlation \(2^{-3.41504}\). For \(i \in \{0,14,24\}\), we have 8 possible linear masks. Moreover, we should use 0101 and 1111 for the linear mask \(\varLambda _{14}[0-3]\) because the sign of the correlation is the same in either case of \(\varLambda _{14}[4]=0\) and \(\varLambda _{14}[4]=1\). As a result, we have \(8\times 8\times 8\times 2=1024\) linear masks whose absolute value of correlations is \(2 \times 2^{-8-3.41504}=2^{-10.41504}\), where the factor 2 is derived from the sum of correlations for \(\varLambda _{14}[4]=0\) and \(\varLambda _{14}[4]=1\).

For example, when \(\beta = 5\), the data complexity is \((24-5)\times 2^{24-5} \approx 2^{23.25}\). From Proposition 1, when we use \(th=6579\) as the threshold for the normal distribution, the complexities for three steps of the attack algorithm are balanced. Moreover, when we use \(th_p=9\) as the threshold for the Poisson distribution, the probability that incorrect initial state appears at least \(th_p\) times is \(2^{-26} < 2^{-24}\).

We randomly choose the initial state and repeat the attack algorithm 1000 times. Figure 7 shows the comparison of the Poisson distributions between the theoretical and experimental ones. From this figure, our experimental results almost follow the theoretical one.

7.2 Another View to Find Preferable \(\mathbb {T}_z\)

In our strategy, we first searched for \(\mathbb {T}_z\), which brings the best linear characteristic. A mixed integer linear programming (MILP) is often applied to search for the best linear characteristics of block ciphers [34, 35], and this method is naturally applied to search for the best linear characteristic of the fast correlation attack. We first generate an MILP model to represent linear trail with specific number of rounds R. Then, we maximize the probability of the linear characteristic under the condition that \(b^{(0)}\) and \(b^{(R)}\) are linearly inactive.

We used \(\mathbb {T}_z=\{0,26,56,91,96,128\}\) and \(\mathbb {T}_z=\{0,14,21,28,37,45,52,60,\) \(62,80\}\) for Grain-128a and Grain-v1, respectively, and they bring the best linear characteristic. For Grain-128a and Grain-v1, the correlation of the linear characteristic are \(\pm 2^{-80.159}\) and \(\pm 2^{-38.497}\), respectively. It is not enough to estimate the correlation only from the best characteristic because we need to take into account of the effect by multiple characteristics. For example, assuming that there are two characteristics whose absolute values of correlations are the same but their signs are different, these two characteristics cancel each other. On the other hand, if their signs are the same, we can observe double correlations. Especially, it is very interesting that Grain-128a has significant gain from the best linear characteristic. While the MILP is useful to find the best characteristic, there is no method to find multiple linear characteristics without repeating MILPs. Therefore, we used the MILP only to detect a preferable \(\mathbb {T}_z\), and the corresponding correlation is estimated as explained in Sects. 5 and 6.

7.3 Possible Countermeasure Against Our New Attack

The simplest countermeasure is to suppress the output at every second position when the key stream is output. For example, the authenticated encryption mode of Grain-128a has such structure, where the key stream is output only in the even clock. When we attack Grain-128a, we want to use \(\mathbb {T}_z=\{0,26,56,91,96,128\}\), but we cannot tap 91. As far as we search, we cannot detect a preferable \(\mathbb {T}_z\) under the condition that the tapped indices are only even numbers. On the other hand, this countermeasure leads to low throughput.

Another countermeasure would be to limit the length of the key stream for each pair of secret key and iv. It would become difficult to collect enough parity-check equations to execute the FCA. Lightweight stream ciphers often have such restriction, e.g., Plantlet outputs only \(2^{30}\)-bit key stream for each pair of secret key and iv [26]. On the other hand, the advantage of stream ciphers can keep high performance once the initialization finishes, and such restriction does not use the advantage very well.