
1 Introduction

Side-channel attacks are powerful tools for extracting secret information from hardware devices, such as the cryptographic microcontrollers used in banking smartcards. These attacks apply a divide-and-conquer strategy, targeting each subkey byte of a cryptographic algorithm independently. This may allow an attacker to mount a practical side-channel attack on a block cipher such as AES with a 128-bit or 256-bit key (16 or 32 bytes, respectively), by targeting each of the 16 or 32 key bytes independently, whereas a purely brute-force search on the full key is computationally infeasible.

Recent advances in side-channel attacks have focused on the problem of estimating the rank of the full key of a cryptographic algorithm, after obtaining sorted lists of probabilities for the different subkeys that compose the full key (e.g. lists for the 16 subkey bytes of AES, when used with a 128-bit key).

These recent algorithms are very useful tools for security evaluators that need to estimate the security of a given device. The algorithm proposed by Veyrat-Charvillon et al. [7] was the first method that could estimate the rank of a full 128-bit key, albeit with a considerable error margin. More recent algorithms [11,12,13] have reduced the bounds of this estimation to within one bit for 128-bit keys and run within seconds, once given the lists of sorted probabilities for the individual subkeys.

However, none of these algorithms scales to large keys composed of more than 256 bytes (e.g. a 2048-bit or 4096-bit RSA key) while at the same time providing tight bounds. Furthermore, even for smaller key sizes (e.g. a 128-bit AES key), existing approaches can deviate from the actual security metric.

In this paper, we present sound, mathematically derived bounds for the guessing entropy, which allow us to evaluate the security of devices using arbitrarily large keys (even 512 bytes or more). These bounds have no memory requirements, can be computed instantaneously and stay within a few bits of each other.

2 Background: Side-Channel Attacks and Key Enumeration

Given a physical device (e.g. a smartcard) that implements a cryptographic algorithm, such as AES, we may record side-channel traces (power consumption or electromagnetic emissions) using an oscilloscope. In this case, for each encryption of a plaintext \(p_i\) with a key \(k^\star \), we can obtain a leakage trace \(\mathbf {x}_i\) that contains some information about the encryption operation.

For the particular case of AES and other similar block ciphers that use a substitution box (S-box), a common target for side-channel attacks is the S-box operation \(v = \text {S-box}(k^\star \oplus p)\) from the first round of the block cipher. Since this operation is performed separately for each subkey \(k^\star \) (for AES each subkey has only 8 bits), we can attack each of the subkeys independently. By using information from the leakage traces, a side-channel attack such as DPA [1], CPA [3] or Template Attacks [2] can assign higher probabilities to the correct subkeys, enabling a very powerful brute-force search on each subkey.

After obtaining the lists of probabilities for each subkey, we may need to combine these lists in some way in order to determine the most likely values for the full cryptographic key. One important motivation for this is that secure devices, such as the microcontrollers used in EMV cards, need to obtain a Common Criteria certification at some assurance level (e.g. EAL4+). To provide such certification, evaluation laboratories may need to verify the security of devices against side-channel attacks also for the case of full-key recovery attacks, in particular where some subkeys may leak considerably differently than others.

For the particular case of AES, we need to combine the probability lists of 16 subkey bytes (128-bit key) up to 32 subkey bytes (256-bit key). If the target device leaks enough information and sufficient measurements are taken, then the attack may assign a probability close to one to the correct subkey value, while assigning very small probabilities to the other candidate subkey values. In this case, the combination is trivial, as we only need to use the most likely value for each subkey. However, in practice, due to noise in the measurements and various countermeasures in secured devices, the correct value of each subkey may be ranked anywhere between the first and the last position. In this case, a trivial direct combination of all the lists of probabilities is not computationally feasible. Note that this problem arises in any scenario where we need to combine multiple lists of probabilities, not just in the case of AES, as we shall show below.

To deal with this combination problem in the context of side-channel attacks, two kinds of combination algorithms have emerged in recent years: key enumeration and rank estimation algorithms. Key enumeration algorithms [5, 14] provide a method to output full keys in decreasing order of likelihood, such that we can minimize the number of keys we try until finding the correct one (which is typically verified by comparing the encryption of a known plaintext/ciphertext pair).

The other kind of algorithms, which are directly related to our paper, are the rank estimation algorithms. These algorithms provide an estimate of the full key rank, i.e. the number of keys we should try until finding the correct one if we were to apply an approach similar to key enumeration. The great advantage of rank estimation algorithms is that we can estimate the key rank even if this rank is very high (e.g. \(2^{80}\) or larger), whereas enumerating such a large number of keys is computationally infeasible. For security evaluations, this was until now probably the most convenient tool, since it can quickly estimate the security of a device. However, it is important that these rank estimation algorithms provide some guarantee on their bounds, since otherwise their output can be misleading.

Veyrat-Charvillon et al. [7] proposed the first efficient rank estimation algorithm for 128-bit keys, which runs in between 5 and 900 s. The main drawbacks of this algorithm are that the estimated bounds can be up to 20–30 bits apart from the real key rank and that the time required to tighten the bounds increases exponentially. More recently, new algorithms [11,12,13] have improved the speed and tightness of the rank estimation. Among these, the histogram-based approach of Glowacz et al. [11] is probably the fastest and scales well even up to keys composed of 128 bytes (e.g. a 1024-bit RSA key).

Nevertheless, none of these recent algorithms can scale efficiently to larger cryptographic keys, e.g. 2048-bit (256 bytes) or 4096-bit (512 bytes) keys, such as common RSA keys used for public key encryption. We have tested the C implementation of Glowacz et al. [11] on 256 subkey bytes and it took about 64 seconds per iteration (using the default \(N = 2048\) bins and the merge parameter set to two, i.e. doing a pre-computation step where the lists of subkey probabilities are first combined two by two; for \(\text {merge}=3\) the memory requirements killed the program), while for 512 subkey bytes (\(\text {merge}=2\)) the memory requirements again killed the program. The algorithm of Martin et al. [13] is also prohibitive for large keys, since it runs in \(O(m^2 n \log n)\), where m is the number of subkeys and n is the number of possible values per subkey. Similarly, the PRO algorithm of Bernstein et al. [12] (the faster of the two proposed by the authors) took about 5 h for 256 subkey bytes and made the evaluation platform run out of swap memory (according to their results).

In contrast, our methods presented in the following sections allow us to obtain tight bounds instantaneously for arbitrarily large keys. This is the first fully scalable security evaluation method proposed to date.

A possible scenario where such scalable methods are required is the evaluation of side-channel attacks against the key loading operation, that is, side-channel attacks which target the transfer of keys from memory to registers rather than the cryptographic algorithm itself. This was the case, for example, in the attacks of Oswald and Paar against the commercial Mifare DESFire MF3ICD40 [6] or the attacks of Choudary and Kuhn [8] against the AVR XMEGA. Recent secure devices, such as the A7101CGTK2 secure authentication microcontroller [23], support RSA encryption with keys of up to 4096 bits (512 bytes). Hence, in order to evaluate the security of these devices against full-key recovery side-channel attacks during the key loading operation, we need scalable rank estimation algorithms.

Furthermore, our methods are generally applicable, so they can be used in any other scenario where probability lists need to be combined to determine the approximate security of some system.

3 Experimental Data

In order to present and demonstrate our results, we used two distinct datasets, one from a hardware AES implementation and the other from MATLAB simulated data. The first dataset consists of \(2^{20} \approx 1\) M power-supply traces of the AES engine inside an AVR XMEGA microcontroller, obtained while the cryptographic engine was encrypting different uniformly distributed plaintexts. The traces correspond to the S-box lookup of the first round. Each trace contains \(m=5000\) oscilloscope samples recorded at 500 MS/s, using a Tektronix TDS7054 oscilloscope, configured at 250 MHz bandwidth in HIRES mode with Fastframe and 10 mV/div vertical resolution, using DC coupling. The XMEGA microcontroller was powered at 3.3 V from batteries and was run by a 2 MHz sinewave clock. We shall refer to this as the real dataset.

The second dataset consists of simulated data, generated using MATLAB. The data contains unidimensional leakage samples \({\mathbf {x}}_{i}\) produced as the Hamming weight of the AES S-box output mixed with Gaussian noise, i.e.

$$\begin{aligned} {\mathbf {x}}_{i}= \text {HW}(\text {S-box}(k\oplus p_i)) + r_i, \end{aligned}$$
(1)

where \(k\) is the target subkey byte, \(p_i\) is the plaintext byte corresponding to this trace, and \(r_i\) represents the Gaussian noise (variance 10). We shall refer to this as the simulated dataset.
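For concreteness, the following Python sketch reproduces the leakage model of (1). Note that, purely as an assumption of this illustration, it uses a stand-in byte permutation instead of the real AES S-box table, and all function names are ours:

```python
# Minimal sketch of Eq. (1): x_i = HW(S-box(k XOR p_i)) + r_i.
import numpy as np

rng = np.random.default_rng(0)

def sbox(v):
    # Placeholder byte permutation (v*167 + 29 mod 256 is a bijection, since
    # 167 is odd); NOT the AES S-box, which should be substituted here.
    return (v * 167 + 29) % 256

def hamming_weight(v):
    return bin(v).count("1")

def simulate_traces(k, n_traces, noise_var=10.0):
    """Return plaintext bytes p_i and leakage samples x_i as in Eq. (1)."""
    p = rng.integers(0, 256, size=n_traces)
    hw = np.array([hamming_weight(sbox(k ^ int(pi))) for pi in p], dtype=float)
    x = hw + rng.normal(0.0, np.sqrt(noise_var), size=n_traces)
    return p, x

p, x = simulate_traces(k=0xAB, n_traces=1000)
```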

3.1 Template Attacks

To use our datasets with the methods evaluated in this paper, we need to obtain lists of probabilities for the possible values of the 16 subkeys used with our AES implementations. To do this we use template attacks (TA) [2, 8] on each subkey during the S-box lookup of the first AES round.

After executing a side-channel attack using a vector \(\mathbf {X}\) of leakage traces (e.g. the real or simulated traces in our case), we obtain a vector of scores or probabilities \(\mathbf {\mathrm {d}}(k| \mathbf {X}) \in \mathbb {R}^{|\mathcal {S}|}\) over the possible key byte values \(k\in \{1, \ldots , |\mathcal {S}|\}\), where \(|\mathcal {S}|\) is the number of possible values (typically \(|\mathcal {S}|=256\) for one AES subkey byte). In the case of template attacks we obtain actual probabilities and we shall often write \( P(k| \mathbf {X}) = \mathbf {\mathrm {d}}(k| \mathbf {X}) \).

After obtaining the probabilities \(P_i(k| \mathbf {X})\) for each subkey byte i, we can compute the security metrics and apply the rank estimation methods presented below.

4 Security Metrics

To evaluate the security of a device against different side-channel attacks, an evaluator will typically use some evaluation metric. Standaert et al. [4] presented several such metrics for the case of attacks that target a single subkey at a time. Among these, we present below the guessing entropy and the conditional entropy. Afterwards we show how to derive scalable and tight bounds for these metrics. These allow us to obtain very efficient methods for estimating the security of devices against full-key recovery side-channel attacks.

4.1 Guessing Entropy

In 1994, James L. Massey proposed a metric [16], known as the Guessing Entropy, to measure the average number of guesses that an attacker needs to find a secret after a cryptanalytic attack (such as our side-channel attacks).

Given the probability vectors \(P(k| \mathbf {X})\) for each subkey obtained after a side channel attack, we can compute Massey’s guessing entropy as follows. First, sort all the probability values \(P(k| \mathbf {X})\), obtaining the sorted probability vector \(\mathbf {p} = \{p_1, p_2, \dots , p_{|\mathcal {S}|}\}\), where \(p_1 = \max _k P(k| \mathbf {X})\), \(p_2\) is the second largest probability and so on. Then, compute Massey’s guessing entropy (\(\text {GM}\)) as:

$$\begin{aligned} \text {GM}= \sum _{i=1}^{|\mathcal {S}|} i \cdot p_i. \end{aligned}$$
(2)

Massey’s guessing entropy represents the statistical expectation of the position of the correct key in the sorted vector of conditional probabilities. A similar measure is the actual guessing entropy (\(\text {GE}\)) [4], which provides the actual position of the correct key in the sorted vector of conditional probabilities. The \(\text {GE}\) is computed as follows: given the vector of sorted probabilities (or scores) \(\mathbf {p} = \{p_1, p_2, \dots , p_{|\mathcal {S}|}\}\), return the position of the probability corresponding to the correct key \(k^\star \):

$$\begin{aligned} \text {GE}= i, \quad \text {where } p_i = P(k^\star | \mathbf {X}). \end{aligned}$$
(3)

As we can see from their definitions, both measures are computed from the posterior probabilities of the keys given a set of leakage traces, but the GM computes the expected position of the correct key, while the GE computes its actual position. For this reason, the GE requires knowledge of the correct key, while the GM does not. Furthermore, by averaging the GE over many experiments we approximate the GM. Therefore, if we had exact probabilities, the GM would be the expected value of the GE.
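As a minimal illustration of the two definitions, the following Python sketch computes the \(\text {GM}\) of (2) and the \(\text {GE}\) of (3) from a single subkey's probability vector (key candidates are 0-based indices here, an implementation convenience rather than the paper's 1-based notation):

```python
# Sketch of Eq. (2) (Massey's GM) and Eq. (3) (actual GE) for one subkey.
# probs[k] = P(k | X); key candidates are 0-based indices.
import numpy as np

def gm(probs):
    p_sorted = np.sort(probs)[::-1]              # p_1 >= p_2 >= ... >= p_|S|
    ranks = np.arange(1, len(p_sorted) + 1)
    return float(np.sum(ranks * p_sorted))       # GM = sum_i i * p_i

def ge(probs, correct_key):
    order = np.argsort(probs)[::-1]              # candidate keys, best first
    return int(np.where(order == correct_key)[0][0]) + 1  # 1-based position
```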

In terms of usage, the GE is the most used measure in side-channel evaluations published so far, mainly because it represents the actual position of the correct key and also because it can be computed even with score-based attacks which do not output probabilities for each key (e.g. by sorting the keys according to their correlation after a correlation power analysis attack and selecting the position of the correct key).

However, if we can obtain good probabilities for the key candidates (e.g. by using template attacks), then the GM can be a better evaluation tool, because, as we said, the GM represents the expected value of the GE, but also because it is less affected by minor differences between probabilities. That is, when \(p_1\) is much larger than the other probabilities, both measures will return 1 (or close to 1). On the other hand, in all scenarios in which the key is not easy to detect and the probabilities \(p_1, p_2, \ldots , p_{|\mathcal {S}|}\) are very close to each other, any minor variation in the probabilities (e.g. due to measurement errors) can lead to large variations of GE, while GM will provide the correct result, i.e. the expected value (which should be around \((|\mathcal {S}|+1)/2\) if all the probabilities are very close).

Furthermore, the GM will allow us to derive the fast, scalable and tight bounds that we present in the following sections.

In our results, we shall show the logarithm (base 2) of the guessing entropy.

4.2 Conditional Entropy

In information theory, the mutual information \(I(X, Y)\) between two random variables \(X\) and \(Y\) is defined as:

$$\begin{aligned} I(X, Y) = H(X) - H(X | Y), \end{aligned}$$
(4)

where

$$\begin{aligned} H(X) = -\mathbb {E} \log _2 P(X) = -\sum _{x \in \mathcal {X}} P(x) \cdot \log _2 P(x) \end{aligned}$$
(5)

represents Shannon’s entropy for the random variable \(X\), and

$$\begin{aligned} H(X | Y) = \sum _{y \in \mathcal {Y}} P(y) H(X | Y = y) = -\sum _{y \in \mathcal {Y}} P(y) \sum _{x \in \mathcal {X}} P(x | y) \cdot \log _2 P(x | y) \end{aligned}$$
(6)

represents the conditional entropy of \(X\) given \(Y\). In short, the conditional entropy shows how much entropy (uncertainty) remains about the variable \(X\) when the variable \(Y\) is given.

As before, we are interested in knowing how much uncertainty (entropy) remains about the random variable \(K\) (representing the secret key byte \(k\)), when a set of leakage traces (represented by the variable \(L\)) is given; this can be quantified using the conditional entropy defined above. If \(K\) represents one key byte, as in our setup, then \(H(K) = 8\). Using this notation we obtain the conditional entropy

$$\begin{aligned} H(K | L) = \sum _{\mathbf {X}\in \mathcal {L}} P(\mathbf {X}) H(K | L = \mathbf {X}) = -\sum _{\mathbf {X}\in \mathcal {L}} P(\mathbf {X}) \sum _{k \in \mathcal {K}} P(k | \mathbf {X}) \cdot \log _2 P(k | \mathbf {X}). \end{aligned}$$
(7)

In practice, we can compute the conditional entropy from (7) using one of the following options:

  1. Compute an integral over the full leakage space, leading to the computationally-intensive form:

    $$\begin{aligned} H(K | L) = -\int _{\mathbf {X}\in \mathcal {L}} P(\mathbf {X}) \sum _{k \in \mathcal {K}} P(k | \mathbf {X}) \cdot \log _2 P(k | \mathbf {X}) d\mathbf {X}. \end{aligned}$$
    (8)
  2. Use Monte Carlo sampling from a limited subset of \(N\) traces:

    $$\begin{aligned} H(K | L) = -\frac{1}{N} \sum _{i=1}^N\sum _{k \in \mathcal {K}} P(k | \mathbf {X}_i) \cdot \log _2 P(k | \mathbf {X}_i). \end{aligned}$$
    (9)

The first form is computationally intensive, since for multi-dimensional leakage traces the integral in (8) must be computed over a multi-dimensional space. Therefore, in our experiments we used the second form, where \(N\) is the number of iterations (usually \(N=100\)) over which we computed the inner summation, and where the probabilities \(P(k | \mathbf {X}_i)\) were obtained from template attacks in each iteration.
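A minimal Python sketch of the Monte Carlo estimate (9) could look as follows; the \(N \times |\mathcal {S}|\) input layout and the small epsilon guarding against \(\log 0\) are assumptions of this illustration:

```python
# Sketch of Eq. (9): Monte Carlo estimate of H(K|L).
# prob_lists is an N x |S| array whose i-th row holds P(k | X_i),
# e.g. as output by a template attack on the i-th iteration.
import numpy as np

def conditional_entropy_mc(prob_lists, eps=1e-300):
    p = np.asarray(prob_lists, dtype=float)
    # H(K|L) ~ -(1/N) sum_i sum_k P(k|X_i) * log2 P(k|X_i); eps guards log(0).
    return float(-np.mean(np.sum(p * np.log2(p + eps), axis=1)))
```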

5 Tight Bounds for Guessing Entropy

In this section, we explain how to adapt several known bounds (lower and upper) on the guessing entropy (\(\text {GM}\)) to the context of side-channel attacks, when we deal with a single list of probabilities (e.g. targeting a single subkey byte). These bounds can be used as a fast approximation of the \(\text {GM}\) (they run in linear time, since they do not require the sorting operation needed to compute the \(\text {GM}\)), but their great advantage appears in the context of multiple lists of probabilities (see the next section).

5.1 Bounds for Massey’s Guessing Entropy from Probabilities

Arikan [19] presented a lower and an upper bound for \(\text {GM}\). We can adapt these bounds to our side-channel context using the notation from previous sections, as follows:

$$\begin{aligned} \frac{1}{1+\ln |\mathcal {S}|} \left[ \sum _{k=1}^{|\mathcal {S}|}{p_k^{1/2}}\right] ^2 \le \text {GM}\le \left[ \sum _{k=1}^{|\mathcal {S}|}{p_k^{1/2}}\right] ^2, \end{aligned}$$
(10)

with the important remark that the individual probabilities \(p_k = P(k|\mathbf {X})\) do not need to be sorted in either bound. This means that both bounds can be computed in \(O(|\mathcal {S}|)\). Arikan's upper bound has been improved by Theorem 3 of [18], yielding:

$$ \text {GM}\le \frac{1}{2}\left[ \sum _{k=1}^{|\mathcal {S}|}{p_k^{1/2}}\right] ^2+\frac{1}{2} \le \left[ \sum _{k=1}^{|\mathcal {S}|}{p_k^{1/2}}\right] ^2. $$

Combining this with (10), we obtain the tighter relation:

$$\begin{aligned} \frac{1}{1+\ln |\mathcal {S}|} \left[ \sum _{k=1}^{|\mathcal {S}|}{p_k^{1/2}}\right] ^2 \le \text {GM}\le \frac{1}{2}\left[ \sum _{k=1}^{|\mathcal {S}|}{p_k^{1/2}}\right] ^2+\frac{1}{2}. \end{aligned}$$
(11)

We shall refer to these lower and upper bounds as \(\text {LB}_{\text {GM}}\) and \(\text {UB}_{\text {GM}}\), respectively.
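A minimal Python sketch of (11) follows; since no sorting is required, the computation is linear in \(|\mathcal {S}|\):

```python
# Sketch of Eq. (11): LB_GM and UB_GM for one subkey, computed in O(|S|)
# from an (unsorted) probability vector.
import numpy as np

def gm_bounds(probs):
    s = float(np.sum(np.sqrt(probs))) ** 2   # [sum_k sqrt(p_k)]^2
    n = len(probs)                           # |S|, e.g. 256
    lb = s / (1.0 + np.log(n))               # Arikan's lower bound (natural log)
    ub = 0.5 * s + 0.5                       # improved upper bound from [18]
    return lb, ub
```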

Fig. 1. \(\text {GM}\), \(\text {GE}\) and \(\text {GM}\) bounds from probabilities for the simulated (left) and real (right) datasets, when targeting a single subkey byte. These are averaged results over 100 experiments.

We show the results of using these bounds on the simulated (left) and real (right) datasets in Fig. 1. We can make several observations. Firstly, in both cases the bounds stay within 1–2 bits of each other for all values of the guessing entropy. Secondly, we see that for the simulated dataset the \(\text {GM}\) is very close to the \(\text {GE}\), but for the real dataset the \(\text {GE}\) deviates considerably and even goes outside the upper bound of the \(\text {GM}\). In all our experiments, we observed that the \(\text {GM}\) stays either close to or below the \(\text {GE}\). This can be explained by the fact that even if many probabilities are close to each other in value, small ordering errors can have a higher impact on \(\text {GE}\) (which depends only on the order) than on \(\text {GM}\) (which depends only on the probability values).

As we shall show later, previous rank estimation algorithms, such as the one of Glowacz et al. [11], also tend to follow the \(\text {GM}\) rather than the \(\text {GE}\), because they also rely on the values of the probabilities rather than the exact position of the correct key, even though such algorithms also use the value of the correct key in order to position their bounds closer to its actual position. Nevertheless, both measures can be useful. If we need the exact position of the key for a particular set of measurements, then \(\text {GE}\) is the best tool. However, the \(\text {GE}\) cannot be computed for a large number of target subkeys and is also subject to the particular measurements, i.e. noise can cause the correct subkey value to be ranked very badly even though its probability is very close to those at the top, leading to a very high \(\text {GE}\), while the \(\text {GM}\) will show a lower value. Hence, in such a scenario the \(\text {GM}\) may actually provide better intuition, since with a new set of traces (e.g. the attacker's), the correct subkey value could be ranked better, leading to a smaller \(\text {GE}\). Furthermore, the fact that the \(\text {GM}\) will in general be below the \(\text {GE}\) (or very close to it when slightly above) means that relying on the \(\text {GM}\) provides a safer conclusion from a designer's perspective. That is, if the resulting \(\text {GM}\) is above a desired security level for some scenario, then we can be confident that the \(\text {GE}\) will either be very close or above.

From the figure, we also see that \(\text {GM}\) always stays within the bounds. This is guaranteed by the mathematical derivation. As we shall see, the algorithmic approaches can introduce estimation errors and provide erroneous results that are neither between our bounds nor close to the expected \(\text {GE}\).

Besides the above differences between \(\text {GM}\) and \(\text {GE}\), what is most important in our context is that we can obtain very fast and scalable bounds for \(\text {GM}\).

Finally, we mention another important difference between \(\text {GM}\) and \(\text {GE}\): for the computation of \(\text {GE}\) we need knowledge of the real key (so we can compute its position), while for the \(\text {GM}\) we do not. Hence, our \(\text {GM}\) bounds allow anyone to estimate the security of a device, while previous rank estimations could only be used by evaluators having access to the real key (target) values.

5.2 Bounds from Conditional Entropy

We now show how to bound Massey's guessing entropy as a function of the conditional entropy, using Massey's inequality [16] and the McEliece–Yu inequality [17]. This allows us to obtain a general relation between the guessing entropy and the conditional entropy in the context of side-channel attacks.

Let \(H(K|L=\mathbf {X})\) be the conditional entropy obtained for the set of leakage traces \(\mathbf {X}\). Applying Massey’s inequality [16] to \(\text {GM}\) and \(H(K|L=\mathbf {X})\), we obtain the following upper bound for the conditional entropy:

$$\begin{aligned} 2+\log (\text {GM}-1)\ge H(K|L=\mathbf {X}). \end{aligned}$$
(12)

Then, applying McEliece and Yu’s inequality [17], we obtain a lower bound for the conditional entropy as:

$$\begin{aligned} H(K|L=\mathbf {X})\ge \frac{2\log |\mathcal {S}|}{|\mathcal {S}|-1}(\text {GM}-1). \end{aligned}$$
(13)

Using (12) and (13), we obtain lower and upper bounds for \(\text {GM}\) as a function of the conditional entropy:

$$\begin{aligned} 2^{H(K|L=\mathbf {X})-2} + 1 \le \text {GM}\le \frac{|\mathcal {S}|-1}{2\log |\mathcal {S}|}H(K|L=\mathbf {X}) + 1. \end{aligned}$$
(14)

We refer to these as \(\text {LB}_{\text {GMHK}}\) and \(\text {UB}_{\text {GMHK}}\), respectively.

Remark 1

The left inequality in (14) is true when \(H(K|L=\mathbf {X})\) is greater than 2 bits.
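A minimal Python sketch of (14), taking the logarithm in (13) as base 2 (consistent with entropies measured in bits):

```python
# Sketch of Eq. (14): LB_GMHK and UB_GMHK as a function of the conditional
# entropy h = H(K|L=X) of one subkey (in bits). Per Remark 1, the lower
# bound only holds for h > 2 bits.
import numpy as np

def gm_bounds_from_entropy(h, n_values=256):
    lb = 2.0 ** (h - 2.0) + 1.0                                # Massey [16]
    ub = (n_values - 1) / (2.0 * np.log2(n_values)) * h + 1.0  # McEliece-Yu [17]
    return lb, ub
```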

Fig. 2. \(\text {GM}\), \(\text {GE}\) and \(\text {GM}\) bounds from probabilities and conditional entropy H(K|L) for the simulated (left) and real (right) datasets, when targeting a single subkey byte. These are averaged results over 100 experiments.

We show these bounds in Fig. 2, along with the previous bounds, for both the simulated (left) and real (right) datasets. We can see that in both cases the lower bound \(\text {LB}_{\text {GMHK}}\) stays within 1 bit of \(\text {GM}\) for all values of \(\text {GM}\), while the upper bound \(\text {UB}_{\text {GMHK}}\) deviates substantially, by more than 3 bits from the \(\text {GM}\). Secondly, in both results we see that \(\text {UB}_{\text {GM}}\) is a much better upper bound than \(\text {UB}_{\text {GMHK}}\). We observed this in all our experiments. Combining the best of these bounds: for the lower bound we should use the maximum of \(\text {LB}_{\text {GMHK}}\) and \(\text {LB}_{\text {GM}}\), while for the upper bound we should use \(\text {UB}_{\text {GM}}\).

6 Impressive Scaling: Scalable Bounds for Guessing Entropy

We now show how to scale the bounds presented in the previous section to arbitrarily many lists of probabilities, so they can be used to estimate the security of a full AES key (16–32 subkey bytes) or even RSA key (128–512 subkey bytes), while being computable in time that increases linearly with the number of subkeys targeted.

In the following, we shall use the notation \(\text {GM}^f\) to refer to the \(\text {GM}\) for the full key, \(n_{\mathrm {s}}\) for the number of subkeys composing the full target key, and \({|\mathcal {S}|}^{n_{\mathrm {s}}}\) for the number of possible full key values (e.g. \(n_{\mathrm {s}}=16, {|\mathcal {S}|}^{n_{\mathrm {s}}} = 2^{128}\) for AES-128).

6.1 Using Bounds of \(\text {GM}\) for Evaluation of Full Key

In Sect. 5.1, we showed how to derive tight bounds for \(\text {GM}\) from probabilities in the case of a single subkey byte. Given the form of the summation involved in (11), we need a way to avoid computing all the cross-product probabilities from the full key space. Factoring the full sum into groups of partial sums leads to our main result:

Theorem 1

(\(\text {LB}_{\text {GM}}\) and \(\text {UB}_{\text {GM}}\) for full key). Let \(p_1^i,p_2^i,\ldots,p_{|\mathcal {S}|}^i\) be the probabilities for the i-th target subkey, \(i=1,2,\ldots,n_s\). Then we have

$$ \frac{1}{1+\ln |\mathcal {S}|^{n_{\mathrm {s}}}}\prod _{i=1}^{n_{\mathrm {s}}} \left[ \sum _{k=1}^{|\mathcal {S}|}{\sqrt{p_k^{i}}}\right] ^2 \le {\text {GM}}^f\le \frac{1}{2}\prod _{i=1}^{n_{\mathrm {s}}} \left[ \sum _{k=1}^{|\mathcal {S}|}{\sqrt{p_k^{i}}}\right] ^2 +\frac{1}{2}. $$

Proof

Considering the \(\text {LB}_{\text {GM}}\) and \(\text {UB}_{\text {GM}}\) bounds for the full key, we have

$$ \frac{1}{1+\ln {|\mathcal {S}|^{n_{\mathrm {s}}}}} \left[ \sum _{k=1}^{|\mathcal {S}|^{n_{\mathrm {s}}}}{\sqrt{p_k^f}}\right] ^2 \le {\text {GM}}^f\le \frac{1}{2}\left[ \sum _{k=1}^{|\mathcal {S}|^{n_{\mathrm {s}}}}{\sqrt{p_k^f}}\right] ^2+\frac{1}{2}. $$

Then, using the fact that the full-key probabilities are products of \(n_{\mathrm {s}}\) subkey probabilities, i.e. \(p_k^f=\prod _{i=1}^{n_{\mathrm {s}}} p_j^i\), with \(j=j(k,i)\in \{1,2,\ldots,|\mathcal {S}|\}\), and factoring accordingly, we obtain that

$$ \sum _{k=1}^{|\mathcal {S}|^{n_{\mathrm {s}}}}{\sqrt{p_k^f}}= \left[ \sum _{k=1}^{|\mathcal {S}|}{\sqrt{p_k^{1}}}\right] \cdot \left[ \sum _{k=1}^{|\mathcal {S}|}{\sqrt{p_k^{2}}}\right] \cdot \ldots \cdot \left[ \sum _{k=1}^{|\mathcal {S}|}{\sqrt{p_k^{n_{\mathrm {s}}}}}\right] $$

i.e.

$$ \sum _{k=1}^{|\mathcal {S}|^{n_{\mathrm {s}}}}{\sqrt{p_k^f}}=\prod _{i=1}^{n_{\mathrm {s}}} \left[ \sum _{k=1}^{|\mathcal {S}|}{\sqrt{p_k^{i}}}\right] $$

and we are done.

For a single subkey, \(\text {LB}_{\text {GM}}\) and \(\text {UB}_{\text {GM}}\) run in \(O(|\mathcal {S}|)\), and for the full key they run in \(O(n_{\mathrm {s}}\cdot |\mathcal {S}|)\), i.e. the computation time increases only linearly with the number of subkey bytes.
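Since we report the base-2 logarithm of the guessing entropy, a practical implementation can evaluate Theorem 1 entirely in the log domain, which avoids numerical overflow for many subkeys. The following Python sketch (our own formulation) does this, dropping the additive 1/2 of the upper bound, which is negligible at these magnitudes:

```python
# Sketch of Theorem 1 in log2 domain, numerically stable for arbitrarily
# many subkeys. prob_lists holds one probability vector per target subkey.
import numpy as np

def full_key_gm_bounds_log2(prob_lists):
    # log2( prod_i [sum_k sqrt(p_k^i)]^2 ) = 2 * sum_i log2( sum_k sqrt(p_k^i) )
    log_s = 2.0 * sum(np.log2(np.sum(np.sqrt(p))) for p in prob_lists)
    n_s = len(prob_lists)
    n_values = len(prob_lists[0])                 # |S|, e.g. 256
    log_lb = log_s - np.log2(1.0 + n_s * np.log(n_values))
    log_ub = log_s - 1.0    # log2(S/2); the additive 1/2 is dropped here
    return log_lb, log_ub
```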

Remark 2

We can estimate the number of bits \(\delta \) between our \(\text {LB}_{\text {GM}}\) and \(\text {UB}_{\text {GM}}\) bounds for the full key as \(\delta = \log _2(\text {UB}_{\text {GM}}) - \log _2(\text {LB}_{\text {GM}}) = \log _2(\text {UB}_{\text {GM}}/\text {LB}_{\text {GM}})\). Ignoring the additive 1/2 term (which is negligible as the number of subkeys increases), we obtain the following approximation:

$$ \delta \approx \log _2 \left( \frac{1+\ln {|\mathcal {S}|^{n_{\mathrm {s}}}}}{2} \right) = \log _2 \left( \frac{1 + n_{\mathrm {s}}\cdot \ln {|\mathcal {S}|}}{2} \right) \ \text {bits}. $$
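For example, with \(|\mathcal {S}| = 256\) this approximation gives about 5.5 bits for a 16-byte AES key and about 11.5 bits for the 1024-byte key of Fig. 6:

```python
# Worked check of Remark 2 for |S| = 256.
import numpy as np

def delta_bits(n_s, n_values=256):
    return np.log2((1.0 + n_s * np.log(n_values)) / 2.0)

print(delta_bits(16))    # ~5.5 bits (16-byte AES key)
print(delta_bits(1024))  # ~11.5 bits (1024-byte key, cf. Fig. 6)
```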

6.2 Using Bounds of H(K|L) for Evaluation of Full Key

Assuming independence between the target subkeys and applying the bounds presented in (14) to the full key space yields

Theorem 2

(\(\text {LB}_{\text {GMHK}}\) and \(\text {UB}_{\text {GMHK}}\) for full key). Let \(H(K|L=\mathbf {X}_i)\) be the conditional entropy for the i-th target subkey, \(i=1,2,\ldots,n_s\). Then

$$ 2^{\sum _{i=1}^{n_s}H(K|L=\mathbf {X}_i)-2} + 1 \le {\text {GM}}^f \le \frac{|\mathcal {S}|^{n_s}-1}{2\log |\mathcal {S}|^{n_s}}\sum _{i=1}^{n_s}H(K|L=\mathbf {X}_i) + 1. $$

Proof

Applying (14) to the full key space yields

$$ 2^{H(K^f|L^f=\mathbf {X})-2} + 1 \le {\text {GM}}^f \le \frac{|\mathcal {S}|^{n_s}-1}{2\log |\mathcal {S}|^{n_s}}H(K^f|L^f=\mathbf {X}) + 1, $$

where \(H(K^f|L^f=\mathbf {X})\) is the joint conditional entropy for all \(n_s\) target subkeys. Because of the assumed independence between the target subkeys, it follows from [20, Theorem 2.6.6] that

$$H(K^f|L^f=\mathbf {X})=\sum _{i=1}^{n_s}H(K|L=\mathbf {X}_i),$$

which gives the desired result.

Again, both bounds \(\text {LB}_{\text {GMHK}}\) and \(\text {UB}_{\text {GMHK}}\) run in \(O(|\mathcal {S}|)\) for a single subkey and in \(O(n_{\mathrm {s}}\cdot |\mathcal {S}|)\) for the full key, i.e. they both scale linearly with the number of target subkeys.
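As with Theorem 1, these bounds can be evaluated in the log domain for large keys. The following Python sketch (our own formulation) drops the additive 1 terms and approximates \(|\mathcal {S}|^{n_s}-1\) by \(|\mathcal {S}|^{n_s}\), both negligible at these magnitudes:

```python
# Sketch of Theorem 2 in log2 domain. entropies[i] = H(K|L=X_i) in bits for
# the i-th target subkey; the subkeys are assumed independent.
import numpy as np

def full_key_gmhk_bounds_log2(entropies, n_values=256):
    h_sum = float(np.sum(entropies))
    n_s = len(entropies)
    log_lb = h_sum - 2.0   # log2(2^(sum h_i - 2)); the +1 is dropped
    # log2( |S|^n_s / (2 log2 |S|^n_s) * sum h_i ), with |S|^n_s - 1 ~ |S|^n_s:
    log_ub = (n_s * np.log2(n_values)
              - np.log2(2.0 * n_s * np.log2(n_values))
              + np.log2(h_sum))
    return log_lb, log_ub
```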

Fig. 3. \(\text {GM}\), \(\text {GE}\) and \(\text {GM}\) bounds for the simulated (left) and real (right) datasets, when targeting two subkey bytes. These are averaged results over 100 experiments.

In Fig. 3, we show our scaled bounds for \({\text {GM}}^f\) for the case of targeting two subkey bytes. We computed both \({\text {GM}}^f\) and \({\text {GE}}^f\) by first obtaining the cross-product of probabilities between the first two subkeys in the datasets. We see again that our bounds hold for \({\text {GM}}^f\), while \({\text {GE}}^f\) deviates slightly from \({\text {GM}}^f\) for the real dataset (see Sects. 4.1 and 5.1 for an explanation). \(\text {LB}_{\text {GM}}\) and \(\text {UB}_{\text {GM}}\) stay within about 2 bits of each other in both the simulated and real experiments from Fig. 3. \(\text {UB}_{\text {GMHK}}\) again stays far from \({\text {GM}}^f\), but \(\text {LB}_{\text {GMHK}}\) is tighter than \(\text {LB}_{\text {GM}}\) for higher values of \({\text {GM}}^f\), as we also saw in the case of a single subkey byte. This confirms that for the lower bound we should use the maximum of \(\text {LB}_{\text {GM}}\) and \(\text {LB}_{\text {GMHK}}\), while for the upper bound we should use \(\text {UB}_{\text {GM}}\).

6.3 \(\text {GM}\) Bounds from Element Positioning

Considering the computational advantage of working with scalable bounds for \(\text {GM}\), the authors of [21, 22] present new scalable bounds for \(\text {GM}\), based on an inequality related to positioning an element in a sorted matrix:

$$ \prod _{i=1}^{n_{\mathrm {s}}}{\text {GM}}_i\le {\text {GM}}^f\le |\mathcal {S}|^{n_{\mathrm {s}}}-\prod _{i=1}^{n_{\mathrm {s}}}\left( |\mathcal {S}|-{\text {GM}}_i\right) , $$

where \({\text {GM}}_i\) is the guessing entropy of the i-th target subkey, \(i=1,2,\ldots,n_s\).
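For comparison, a direct Python sketch of these element-positioning bounds (computed naively, so only usable for small \(n_s\) before the products overflow):

```python
# Sketch of the element-positioning bounds of [21, 22], from the per-subkey
# guessing entropies GM_i. Naive products: fine for a few subkeys, but they
# overflow for large n_s (unlike the log-domain bounds of Theorem 1).
import numpy as np

def positioning_bounds(gms, n_values=256):
    gms = np.asarray(gms, dtype=float)
    lb = float(np.prod(gms))
    ub = float(n_values) ** len(gms) - float(np.prod(n_values - gms))
    return lb, ub
```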

The authors left the improvement of these bounds as an open question; we accepted the challenge and refined both bounds. However, because we observed (see Fig. 7 in the Appendix) that the improved bounds are still much weaker than the \(\text {LB}_{\text {GM}}\) and \(\text {UB}_{\text {GM}}\) bounds, we leave the results and proofs of this part to the Appendix.

Fig. 4. \(\text {GM}\), \(\text {GE}\), \(\text {GM}\) bounds and FSE15 bounds for the simulated (left) and real (right) datasets, when targeting two subkey bytes. These are averaged results over 100 experiments.

Fig. 5. \(\text {GM}\) bounds and FSE15 bounds for the simulated (left) and real (right) datasets, when targeting 16 subkey bytes. These are averaged results over 100 experiments.

6.4 \(\text {GM}\) Bounds Versus the FSE 2015 Rank Estimation

As mentioned in Sect. 2, several algorithms [7, 11,12,13] have been proposed in recent years to estimate the rank of the correct full key. Among them, the rank estimation of Glowacz et al. [11], to which we shall refer as FSE15 from now on, is probably the fastest and scales well for keys of up to 128 bytes (e.g. a 1024-bit RSA key). For this reason, in Fig. 4 we compare our \(\text {GM}\) bounds to the results of FSE15 (using their C implementation) for the case of two subkeys, for both the simulated (left) and real (right) datasets. The results show that the FSE15 bounds generally stay within our \(\text {GM}\) bounds on both datasets, but for the real dataset they go slightly beyond our bounds, following the \({\text {GE}}^f\).

In Fig. 5, we compare our \(\text {GM}\) bounds and the FSE15 bounds for the full 16-byte AES key, again for the simulated (left) and real (right) datasets. From this figure, we see that our \(\text {GM}\) bounds are tight even for the full 128-bit AES key (16 subkeys), with \(\text {LB}_{\text {GM}}\) and \(\text {UB}_{\text {GM}}\) staying within 5 bits of each other in both experiments. From the experiments on the real dataset, we also see that \(\text {LB}_{\text {GMHK}}\) fails once the guessing entropy decreases below 70 bits, due to numerical limitations when computing the bound at this point. Comparing our bounds to the FSE15 bounds on the simulated dataset, we can see that for higher values of \({\text {GM}}^f\) the FSE15 bounds stay within our \(\text {GM}\) bounds, but afterwards they start to deviate, due to the deviation of \({\text {GE}}^f\) from \({\text {GM}}^f\). A similar pattern is observed with the real dataset.

From these experiments, we can see that the FSE15 bounds follow the \(\text {GE}\), while our GM bounds follow the \(\text {GM}\), and in general the FSE15 bounds stay within our GM bounds, due to the \(\text {GE}\) being close to the \(\text {GM}\). Depending on the requirements, one may prefer one tool or the other. However, while less tight than the FSE15 bounds, our GM bounds have the advantage of scaling to an arbitrarily large number of subkeys, whereas all previous rank estimation algorithms, including FSE15, are limited by memory and computation time to some maximum key size.

Table 1. Comparing \(\text {GM}\) bounds with rank estimation algorithms.

6.5 \(\text {GM}\) Bounds Versus Rank Estimation Algorithms

Given the development of several rank estimation algorithms in recent years [7, 9,10,11,12,13, 21], we provide in Table 1 a comparison of these algorithms with our \(\text {GM}\) bounds in terms of computation time, memory requirements, tightness and accuracy for different key sizes.

7 Conclusion

In this paper we have presented the first fully scalable, tight, fast and sound method for estimating the guessing entropy from arbitrarily many lists of probabilities. This method, based on mathematically-derived bounds, allows us to estimate within a few bits the guessing entropy for a 128-bit key, but can also be used to estimate the guessing entropy for cryptographic keys of 1024 bytes (8192 bits) and much larger, which is not possible with any of the previous rank estimation algorithms due to memory or running time limitations.

As an illustration of this capability, we show in Fig. 6 the computation of our bounds for a 1024-byte (8192-bit) key. For simplicity and easier reproducibility, we used the simulated dataset, where we replicated the 16 lists of probabilities (one for each target subkey byte) 64 times, obtaining 1024 lists of probabilities. The plot shows our \(\text {LB}_{\text {GM}}\) and \(\text {UB}_{\text {GM}}\) bounds for this case. Note, on the right side, that the margin between our bounds is about 11.5 bits, as expected from Remark 2. We leave this figure as a reference for future methods, as none of the previous ones could be used to obtain this plot.

Fig. 6. \(\text {LB}_{\text {GM}}\) and \(\text {UB}_{\text {GM}}\) bounds for a 1024-byte (8192-bit) key, computed from 1024 lists of probabilities. We used a logarithmic Y-axis, as in the rest of the figures. On the right, we show a zoom for \(n_{\mathrm {a}}=1\) and \(n_{\mathrm {a}}=2\) attack traces only.