1 Introduction

For many years, efficient software implementations of cryptographic algorithms for constrained embedded processors were mainly restricted to symmetric ciphers. In recent years, however, various libraries for elliptic curve cryptography (ECC) have been published that achieve acceptable runtime and code size even on microcontrollers with very limited computational resources, e.g., the 8-bit AVR ATmega series of processors. Notable examples of these ECC implementations are summarized in Table 1.

Table 1. Overview of ECC implementations for embedded AVR processors.

Because an adversary often has physical access to an embedded device performing ECC operations, implementation attacks, and in particular side-channel analysis (SCA), are severe threats in this scenario. Consequently, several libraries comprise countermeasures against SCA, for example, performing computations in constant time or using randomized projective coordinates. The protected implementations are further detailed in Table 1.

Many common SCA countermeasures assume that the adversary needs access to multiple traces (with identical scalar) to recover the secret key, which inherently protects protocols with ephemeral scalars. In this paper, we challenge this assumption and target fundamental building blocks of any ECC implementation, namely conditional moves and loads/stores from/to secret memory addresses. We show that, in both cases, template attacks allow recovering most of the secret scalar from a single trace of elliptic-curve scalar multiplication (ECSM), which in turn renders all currently published ECC implementations for the AVR (and likely other, similar architectures) insecure.

Note that although this paper focuses on implementations of ECC, our attacks also apply to exponentiation algorithms as used in, e.g., RSA, classical Diffie-Hellman, DSA, or ElGamal. We actually expect the attacks to work even better there, because group elements are larger and thus require more loads (or conditional moves). We leave this investigation for future work.

Related work. Carefully combining countermeasures like uniformity of modular operations, (re-)randomization of the projective representation of points, scalar blinding, point blinding, and random field (or curve) isomorphisms prevents classical side-channel attacks like timing [38], SPA [20], DPA [39], CPA [11], or collision attacks [25, 31]. These attacks require a fixed scalar across multiple measured power or electromagnetic traces. The main protection relies on the full randomization of intermediate data, including the input point, the scalar, and the group, during the execution of an ECSM [4, 19, 24]. In this work we consider implementations based on the Montgomery ladder algorithm, protected by scalar randomization (SR) and projective-coordinate randomization.

To overcome the aforementioned countermeasures, two kinds of attacks have emerged: template and horizontal attacks. Although in general template attacks [14] can be used to attack multiple traces that share the same scalar, we need to attack each ECSM trace independently because of the SR. Template attacks combine statistical modeling and power analysis, and consist of two phases. In the first phase, called profiling, the attacker builds templates by executing a sequence of instructions using a fixed scalar (with SR turned off). The second phase is called matching, in which the attacker matches the templates to single attacked traces (with SR turned on). The assumption is that the attacker possesses a profiling device for building templates that behaves the same as the target device and runs the same implementation.

Template attacks on ECC trace back to an attack on ECDSA demonstrated by Medwed and Oswald [44]. However, this attack requires an offline DPA on the ECSM during profiling in order to select the points of interest. Moreover, since the attack exploits data-dependent leakage, it requires profiling with multiple templates (i.e., 33), whereas two templates suffice for our attacks. Furthermore, the attack only needs to recover a few bits of multiple ephemeral scalars and can then employ ECDSA-specific lattice techniques to recover the long-term secret key [10]. This is not possible in the context of our work, since we do not target ECDSA: an attacker has only a single trace from which to recover, using SCA, sufficiently many bits of the randomized scalar to be able to compute the remaining bits.

Another template attack on ECC is presented in [30]. This attack follows a similar approach to ours, but instead of exploiting address-dependent leakage, it exploits register-location-based leakage using a high-resolution inductive EM probe. As a result, the attack is considerably more expensive to execute. A template attack on a wNAF ECC algorithm is presented in [61]. However, this attack is applied to an implementation that is protected with neither scalar randomization nor base-point randomization. Another approach to attacking ECC are the so-called online template attacks [5, 22]. These attacks work if SR is enabled, but not when point randomization is enabled.

The template attack from [16] targets load instructions. However, multiple traces are required in the attack phase; therefore, this attack does not work against implementations protected by SR. The template attack from [28] aims to extract a random multiplicative mask (base blinding) from a single measurement by exploiting data leakage; it is then possible to unmask all intermediate values and run DPA.

Horizontal attacks on RSA [6, 8, 9, 15, 17, 18, 29, 54, 55, 57] and ECC [7, 27] are emerging forms of side-channel attacks on exponentiation-based or scalar-multiplication-based algorithms. Their methodology allows recovering the exponent bits through the analysis of individual traces. Therefore, these attacks are effective against SR even when combined with point and group randomization. The attacks employ different common distinguishers: SPA, horizontal correlation analysis [18], Euclidean distance [57], horizontal collision-correlation [6, 7, 8, 17], horizontal cross-correlation [27], or clustering [29, 55].

An interesting horizontal address-based DPA attack on Montgomery multiplications is presented in [15]. The approach is similar to ours, but that attack exploits Hamming-weight leakage of addresses. Furthermore, the analysis in [15] lacks results for a full modular exponentiation (only a few iterations are attacked) and does not report success rates.

The main issue of horizontal attacks is that extracting leakage from a single unlabeled trace is usually heavily limited by noise. Therefore, we decided to attack our state-of-the-art implementations, which contain scalar and point randomization, using a more powerful attack paradigm from the point of view of the attacker setting, namely template attacks.

Contributions. The main contributions of this paper are threefold:

  1. First, by the example of a protected version of \(\mu \)NaCl, we show that the single-trace leakage of conditional moves within the Montgomery ladder can be exploited to recover the scalar.

  2. Second, we show that a similar attack applies to loads and stores from/to secret-dependent addresses. In doing so, we show that even implementations on embedded devices without cache cannot tolerate secret-dependent memory accesses.

  3. Finally, we generalize the method from [26] to tolerate a certain number of incorrectly recovered scalar bits without relying on normal or side-channel-enhanced exhaustive search. Furthermore, we present experimental results for our algorithm.

Organization of the paper. The remainder of this paper is structured as follows: in Sect. 2, we review the use of conditional moves in scalar-multiplication algorithms, together with possible countermeasures against side-channel analysis. Then, in Sect. 3, we describe the measurement setup and target implementations used for the attacks presented subsequently: while Sect. 4 deals with template attacks on the (arithmetic) conditional swap within the Montgomery ladder, Sect. 5 applies similar methods to recover the scalar by exploiting the leakage of secret load addresses. Section 6 discusses how to tolerate a certain number of incorrectly recovered scalar bits more efficiently than by simple exhaustive search. Finally, we conclude in Sect. 7 with directions for future work, in particular regarding countermeasures.

2 Scalar Multiplication and Conditional Moves

The most basic scalar-multiplication algorithm is the double-and-add algorithm, which scans through the bits of the scalar and performs a double operation for each zero bit and a double-and-add operation for each one bit. This algorithm is well known to be vulnerable to all kinds of side-channel attacks, including power analysis and timing attacks; a sketch making the secret-dependent branch explicit is given below.
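To make the scalar dependence explicit, consider the following minimal Python sketch (our illustration, not taken from any of the cited libraries; plain integers stand in for curve points, so doubling is multiplication by 2 and addition is integer addition):

```python
# Minimal sketch of left-to-right double-and-add; integers stand in for points.
def point_double(p):
    return 2 * p

def point_add(p, q):
    return p + q

def double_and_add(k, p):
    r = 0  # neutral element
    for i in reversed(range(k.bit_length())):
        r = point_double(r)
        if (k >> i) & 1:  # secret-dependent branch: leaks via timing/SPA
            r = point_add(r, p)
    return r

assert double_and_add(25, 3) == 25 * 3
```

The branch on each scalar bit is exactly what leaks: a one bit costs a double and an add, a zero bit only a double.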

The first step towards side-channel protection is to always perform the same sequence of finite-field operations, independent of the scalar. The most common approaches to achieve such a structure are either (fixed-window) double-and-add-always scalar multiplication or ladder-based approaches (typically the Montgomery ladder [45] or, for general Weierstrass curves, the Brier-Joye ladder [12]). Another layer of side-channel protection then adds randomization of the scalar (through one of various blinding methods) and of the internal representation of points (for example, through projective randomization, field isomorphisms, or curve isomorphisms). By re-randomizing before or after each ECSM loop iteration, most horizontal collision or cross-correlation attacks are thwarted.

Interestingly, even with all those countermeasures in place, scalar-multiplication algorithms contain operations that choose one out of two (or more) curve points depending on bit(s) of the scalar. An attacker who learns all of these choices from the side-channel information of just one trace learns all of the scalar bits used in this scalar multiplication and thus obtains the secret key. On microcontrollers with restricted register space, there are essentially two different ways to implement this conditional move (cmov): either by loading from (or storing to) addresses that depend on the secret scalar, or by using arithmetic operations to perform a conditional register-to-register move. The latter approach is very common on large processors with cache, where the former approach leaks through cache-timing information. Essentially, the idea is to replace a computation of the form \(R \leftarrow P[s]\), where s is a secret scalar bit, by a computation of the form \(R \leftarrow sP[1] + (1-s)P[0]\). Note that this approach does not require actual multiplications; it is much easier to expand s to a bit mask of all ones or all zeros and use bit-logical instructions, as the following sketch illustrates.
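As a concrete illustration, here is a hedged Python sketch of such a mask-based selection on 8-bit words (the function name cmov and the word size are our assumptions; real implementations operate limb-wise on complete field elements):

```python
def cmov(p0, p1, s):
    # Expand the secret bit s into an all-zeros/all-ones mask and select with
    # bit-logical operations only: no branch, no secret-dependent address.
    m = (-s) & 0xFF                  # s = 0 -> 0x00, s = 1 -> 0xFF
    return (p1 & m) | (p0 & ~m & 0xFF)

assert cmov(0xAA, 0x55, 0) == 0xAA   # R <- P[0]
assert cmov(0xAA, 0x55, 1) == 0x55   # R <- P[1]
```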

Most implementations of ECSM contain considerably more than just one secretly-indexed load, store, or conditional move. Sometimes this is a choice made by the implementors to improve performance (by avoiding otherwise unnecessary loads and stores); sometimes it is an inherent property of the ECSM algorithm. For example, the Montgomery ladder needs a conditional swap (cswap) of two points instead of a conditional move, which requires significantly more operations that involve the secret scalar bit than a simple cmov (for details, see Sect. 4).

The side-channel attacks described in the remainder of this paper target both implementations that make use of secretly indexed memory accesses (Sect. 5) and implementations that use the arithmetic cmov operation, or more specifically the cswap operation (Sect. 4). The idea of attacking loads from secret positions through side-channel information is not new: it is not only used in various cache-timing attacks (which do not apply to simple architectures such as the AVR), but it is also the underlying principle of address-bit DPA [34]. What is novel is the fact that we need only a single trace. This renders countermeasures such as scalar blinding and address randomization [35, 36] ineffective.

3 Attack Setup

In this section, we describe the targeted implementations, the utilized microcontroller, and our measurement setup. The trace pre-processing steps, frequency filtering and alignment, are described in the full paper [48].

3.1 Target Implementations

We target two protected ECSM implementations based on [49]. Both employ the Montgomery ladder, with the pseudocode given in Algorithm 1. The main difference between the two variants is the realization of the cmov (i.e., the function cswap_coords): The first implementation, described in more detail in Sect. 4.1, applies an arithmetic conditional swap to the respective coordinate values of the working points \(P_1 = (X_1: Z_1)\) and \(P_2 = (X_2: Z_2)\). The second, described in Sect. 5.1, replaces the arithmetic conditional swap by a conditional swap of pointers to the coordinate values. Both implementations utilize projective-coordinate re-randomization as the main side-channel countermeasure: a randomly generated \(\lambda \in \mathbb {F}_p\) is multiplied with the coordinates of \(P_1 = (X_1: Z_1)\) and \(P_2 = (X_2: Z_2)\) at the beginning of every ECSM iteration. The source code of both implementations is publicly available [47].

Algorithm 1. Montgomery ladder ECSM with conditional coordinate swap (cswap_coords).
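Since the pseudocode figure is not reproduced here, the following Python sketch (our illustration; integer arithmetic stands in for the XZ point operations, and the per-iteration re-randomization with \(\lambda \) is omitted) captures the ladder structure and the conditional swap driven by successive scalar bits:

```python
def cswap(a, b, s):
    # Stand-in for cswap_coords; Sect. 4.1 realizes it arithmetically,
    # Sect. 5.1 by swapping pointers.
    return (b, a) if s else (a, b)

def montgomery_ladder(k, p, nbits):
    r0, r1 = 0, p                    # integers stand in for (X:Z) points
    prev = 0
    for i in reversed(range(nbits)):
        bit = (k >> i) & 1
        r0, r1 = cswap(r0, r1, bit ^ prev)  # swap only when the bit changes
        prev = bit
        r0, r1 = 2 * r0, r0 + r1     # ladderstep: double and differential add
    r0, _ = cswap(r0, r1, prev)      # undo the final swap state
    return r0

assert montgomery_ladder(77, 5, 8) == 77 * 5
```

Regardless of the scalar, every iteration executes the same operation sequence; only the cswap condition depends on the secret, which is precisely the leakage our attacks target.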

3.2 Target Device and Measurement Setup

We carried out our experiments with an ATmega328P 8-bit microcontroller placed on the target board of the ChipWhisperer [51] side-channel evaluation platform. While the ChipWhisperer also provides the possibility to capture analog signals (e.g., power consumption or electromagnetic emanation), we used a separate oscilloscope (Picoscope 5203) due to the limited bandwidth, memory, and sample rate of the ChipWhisperer.

The targeted ATmega328P has 32 KB of Flash, 2 KB of SRAM, and 1 KB of EEPROM. The register file contains 32 registers (R0–R31), among which six serve as pointers for indirect 16-bit addressing and have the following aliases: X (R27:R26), Y (R29:R28), and Z (R31:R30). Arithmetic instructions take 1 cycle, with the exception of multiplication instructions, which take 2 cycles. Loads and stores from/to SRAM take 2 cycles; loads from Flash take 3 cycles. More technical details about the target device are given in the full paper [48].

4 Attacking Arithmetic Cswaps

In this section, we describe a template attack on conditional swaps (cswaps) in the Montgomery ladder step. In our case, the cswap is implemented using Boolean and arithmetic operations in constant time.

4.1 Target Implementation

In the Montgomery ladder (Algorithm 1), the function cswap_coords implements the cswap (based on the input bit s) by first creating a mask m, which is 0x00 or 0xFF for \(s = 0\) and \(s = 1\), respectively, by setting \(m = -s\) (assuming m and s are 8-bit values). Then, a (conditional) XOR swap is executed as follows:

Listing 1.1. Conditional XOR swap using the mask m.

In other words, if \(m =\) 0x00 (\(s = 0\)), then \(tt = 0\) and the XORs \(\mathrm {xx = xx}\oplus \mathrm {tt}\) and \(\mathrm {yy = yy}\oplus \mathrm {tt}\) leave the values unchanged. Otherwise, if \(m =\) 0xFF (\(s = 1\)), we have a standard XOR swap, i.e., \(\mathrm {xx = xx}\oplus \mathrm {xx} \oplus \mathrm {yy = yy}\) (and analogously for \(\mathrm {yy}\)). The following sketch renders this logic in Python.
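A sketch assuming 8-bit limbs stored as byte arrays; the AND in the loop body corresponds to the and instruction that our templates target in Sect. 4.2:

```python
def cswap_coords(xx, yy, s):
    # Conditional XOR swap of two byte arrays, mirroring the logic of
    # Listing 1.1; runs the same instruction sequence for s = 0 and s = 1.
    m = (-s) & 0xFF                  # 0x00 or 0xFF
    for i in range(len(xx)):
        tt = (xx[i] ^ yy[i]) & m     # the 'and' targeted by our templates
        xx[i] ^= tt
        yy[i] ^= tt
    return xx, yy

a, b = [1, 2, 3], [4, 5, 6]
assert cswap_coords(a, b, 1) == ([4, 5, 6], [1, 2, 3])
```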

4.2 Template Generation and Matching

We generated templates for the and instruction (line 5 of Listing 1.1), grouping the traces in the profiling set into two sets \(V_0\) and \(V_1\): traces in \(V_0\) are those where \(m =\) 0x00 (i.e., an AND with 0x00), while \(V_1\) contains the traces where \(m =\) 0xFF. Note that the traces were cut to only contain the clock cycle of the targeted and instruction, i.e., each trace is \(64\cdot 67 = 4288\) samples long (cf. Appendix 2 of the full paper [48]). For \(V_i\), \(i = 0,1\), we subsequently computed templates consisting of the pointwise mean vector \(\varvec{\mu }^{(i)}\) and the covariance matrix \({\varvec{\varSigma }}^{(i)}\) [14]. Note that the two possible leakages 0x00 (all bits zero) and 0xFF (all bits one) can be expected to be maximally (or at least to a large degree) different, which should facilitate template attacks in this particular case.

We matched the templates to the traces in the test set with the standard approach, i.e., computing the respective probabilities using the probability density function of the multivariate normal distribution and identifying the template with the highest probability to recover the respective bit of the scalar. A sketch of both phases is given below; the respective success rates with respect to the size of the profiling set are given in Sect. 4.3.
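For concreteness, a minimal NumPy/SciPy sketch of this profiling/matching pipeline (the toy dimensions, synthetic traces, and the small diagonal loading of the covariance estimate are our assumptions for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal

def build_template(traces):
    # traces: (n_traces, n_samples) array of cut single-instruction segments
    mu = traces.mean(axis=0)
    # diagonal loading guards against a singular covariance estimate
    cov = np.cov(traces, rowvar=False) + 1e-9 * np.eye(traces.shape[1])
    return mu, cov

def match(trace, templates):
    # recovered bit = index of the template with the highest probability
    scores = [multivariate_normal.logpdf(trace, mean=mu, cov=cov)
              for mu, cov in templates]
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
T0 = build_template(rng.normal(0.0, 1.0, (100, 16)))  # toy stand-in for V_0
T1 = build_template(rng.normal(2.0, 1.0, (100, 16)))  # toy stand-in for V_1
bit = match(rng.normal(2.0, 1.0, 16), (T0, T1))       # classified as 1
```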

Classification. For each template we computed the Euclidean distance between the sample vector and the template mean vector. The template (\(T_0\) or \(T_1\)) that results in the smallest distance is considered the best match for the sample vector. In this attack, the index of the closest template (0 or 1) corresponds to the swap bit.

Confidence score and confidence level. For the first classification method, we derived a simple confidence score for the recovered bit value based on the distances (\(d_0\) and \(d_1\)) to each template. For a given value of \(d_0 + d_1\), it varies linearly, ranging from 0 (no confidence) to 1 (full confidence):

$$\begin{aligned} \mathrm {conf\_score} = 2 \cdot \left| 0.5 - \frac{\min (d_0, d_1)}{d_0 + d_1}\right| \end{aligned}$$
(1)
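A direct transcription of Eq. (1) as a Python helper (illustrative only):

```python
def conf_score(d0, d1):
    # Eq. (1): 0 when d0 == d1 (no confidence), 1 when one distance vanishes
    return 2 * abs(0.5 - min(d0, d1) / (d0 + d1))

assert conf_score(1.0, 1.0) == 0.0
assert conf_score(0.0, 2.0) == 1.0
```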

We furthermore define the confidence level of a given trace (in the test set) as follows: let us call a recovered bit suspicious if its confidence score is less than the greatest confidence score of any falsely identified bit (where this threshold is determined experimentally in the profiling phase). Then, the confidence level is the percentage of bits that are not suspicious, i.e., that can be unambiguously recovered. Note that the average confidence level (over all traces in the test set) is always less than or equal to the average success rate, since an incorrectly recovered bit is always suspicious.

4.3 Attack Results

Figure 1 shows the average and best-case success rates (computed over all 255 scalar bits), together with the respective confidence levels, over the number of traces used for template generation and matching. Note that each full trace comprises 255 ECSM iterations, all of which were used for generating the templates; in other words, each full trace contributes 255 “effective” traces to the profiling set.

The traces used for template generation and matching were taken from different trace sets (coming from different capture sessions). The same number of traces was used for profiling and testing, i.e., a given value on the horizontal axis of Fig. 1 is the same for profiling and testing.

Fig. 1. Success rates for the template attack on cswap for different numbers of full traces.

Fig. 2. Results for the template attack on loads/stores for different numbers of full traces.

As evident from Fig. 1, already for 10 full traces (i.e., about 2,550 effective traces), the average success rate reaches 96.71%, i.e., we can recover most of the bits of the scalar. Furthermore, the best success rate reaches 99.6% with a confidence level of 76.1%. Increasing the number of traces changes both success rate and confidence level only minimally; due to the strong leakage of the targeted device, most information can already be extracted with a low trace count.

5 Attacking Secret-Dependent Memory Accesses

In general, ECC (and in particular NaCl-derived) implementations avoid loads from secret-dependent addresses altogether due to the possibility of cache-timing attacks. However, for embedded implementations without caches, secret load addresses are sometimes deemed acceptable. In this section, we show that template attacks can be employed to exploit this leakage.

5.1 Target Implementation

The targeted implementation replaces the cswap of the \((X_1:Z_1)\) and \((X_2:Z_2)\) coordinate values used in Algorithm 1 by working with pointers to those coordinates and conditionally swapping these pointers. Besides being slightly faster, this implementation also potentially exhibits less leakage, because it uses the secret-dependent mask m in an AND operation only twice for each pointer cswap, rather than 32 times as in the ECSM implementation based on the arithmetic cswap (cf. Sect. 4.1).

However, in implementations of finite-field operations, both input and output operands are pointers. The values of these pointers are addresses of the memory holding the actual field-element values, and those addresses directly depend on whether the swap occurred or not, which in turn depends on the value of the secret mask bit.

AVR memory access instructions internals. Memory access instructions (loads and stores) on an AVR take 2 clock cycles to execute. According to the ATmega328 datasheet [3], the effective address for such instructions is computed in the first cycle, while during the second cycle, the data word is read (load) or written (store) if the effective address is valid. Our attack focuses on the address leakage of memory access instructions, and any data dependency may negatively impact the attack success rate if not detected and mitigated. We therefore take advantage of this architectural feature and use only the samples from the first clock period of such instructions.

Targeted loads and stores. During each iteration of the Montgomery ladder, the actual field arithmetic occurs in the so-called ladderstep function (cf. Algorithm 1). We target the load and store addresses in the first three field operations in ladderstep, i.e., addition, subtraction, and addition. Each of these operations has two \(\mathbb {F}_p\) inputs (a and b) and one output r.

Finite-field addition and subtraction are implemented with reduction modulo \(2^{256}-38\). The reduction step also executes loads and stores, whose samples are likewise used for template creation and matching. Listing 1.2 shows a small segment of the execution trace containing the loads of the first bytes of the operands and the store of the first byte of the result (before reduction):

Listing 1.2. Segment of the execution trace: loads of the first operand bytes and store of the first result byte.

Our oscilloscope’s memory is divided into 255 segments, each 65 kSamples long; one memory segment holds the samples captured from a single ECSM iteration. Due to this 65-kSample limit per ECSM iteration, we were able to capture the samples from all loads and stores of the first field addition and the first field subtraction, but only half of the loads and stores from the arithmetic part of the second field addition. Note that this memory limitation is due to the relatively low-cost oscilloscope we used; high-end equipment would further facilitate the presented attack.

Table 2 shows the number of executed instructions of each type that are used in the attack. We used a total of 372 instructions, whose samples are concatenated into a single sample vector. After trace preprocessing, 67 power samples are available per clock cycle, and as only the first clock period of a memory access instruction is used, the sample vector per ECSM iteration has \(n_v = 372 \cdot 67 = 24{,}924\) samples.

Table 2. Number of executed instructions of each type that are used in the attack.

5.2 Template Generation

Each load or store instruction accesses at most two possible addresses. If it always accesses the same address, it does not provide leakage relevant for the attack. Considering only those loads and stores that may access two addresses, during any execution of the ladderstep only two distinct sequences of addresses can be accessed: \(A_{\mathrm {noswap}}\), containing the addresses accessed before the first pointer swap has taken place, i.e., in an even state (noswap state); and \(A_{\mathrm {swap}}\), containing the addresses accessed in an odd state (swap state).

First, we grouped the sample vectors into two sets: \(V_0\) consists of the load/store sample vectors for addresses in \(A_{\mathrm {noswap}}\), while \(V_1\) contains those originating from addresses in \(A_{\mathrm {swap}}\). Then, we computed various statistics for each sample index of \(V_i\), \(i = 0,1\): the mean \(\varvec{\mu }^{(i)}\), the standard deviation \(\varvec{\sigma }^{(i)}\), the median \(\varvec{md}^{(i)}\), as well as the lower \(\varvec{l}^{(i)}\) and upper \(\varvec{u}^{(i)}\) percentiles (the actual percentiles used are discussed in Sect. 5.3). The collections of these statistics for \(V_0\) and \(V_1\), called \(T_0\) and \(T_1\), are the two possible templates.

5.3 Point-of-Interest Selection

The POI selection uses the lower and upper percentile vectors \(\varvec{l}^{(i)}\) and \(\varvec{u}^{(i)}\) (\(i=0,1\)) to compute the intersection of the pair of intervals \([ l_j^{(0)}, u_j^{(0)} ]\) and \([ l_j^{(1)}, u_j^{(1)} ]\) for each sample index \(j=1,\cdots ,n_v\). The sample indices where the intersection is empty are selected as POIs.

Intuitively, the sample indices with an empty intersection are good distinguishers for the two templates, because at these points the samples tend to be clustered around the median (and typically also around the mean) of one template, rather than being scattered. A sketch of this interval test is given below.
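A sketch of the interval-intersection test (the function name poi_mask and the synthetic trace sets are our illustrations):

```python
import numpy as np

def poi_mask(V0, V1, lo=12.5, hi=87.5):
    # percentile interval per sample index, for each of the two groups
    l0, u0 = np.percentile(V0, [lo, hi], axis=0)
    l1, u1 = np.percentile(V1, [lo, hi], axis=0)
    # POI <=> the intervals [l0, u0] and [l1, u1] do not intersect
    return (u0 < l1) | (u1 < l0)

rng = np.random.default_rng(1)
V0 = rng.normal(0.0, 0.2, (200, 32))  # toy stand-ins for the two
V1 = rng.normal(1.0, 0.2, (200, 32))  # address-sequence groups
pois = poi_mask(V0, V1)               # Boolean vector reused during matching
```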

Different values for the lower and upper percentiles may give a different number of POIs, which directly affects the success rate and confidence level of the attack. Thus, we tested the attack for different pairs of values for these parameters, ranging from wide, more selective percentiles (12.5, 87.5) to narrow, less selective ones (40, 60). We emphasize that the POI selection is based exclusively on the samples of the traces used for template generation; it does not depend on the samples of the trace being attacked (i.e., the sample vector to classify). In fact, the POIs are represented as a Boolean vector used during template matching to select the samples from the target trace vector to be classified.

POI selection refinements. To improve the confidence level of the attack, we tested two refinements of the POI selection. First, we noticed that when using more selective percentile parameters, the selection method returned sample indices that were clustered in a few instructions, while most of the remaining instructions were not covered by any sample, although they should in theory contribute some leakage. To make the POIs more evenly distributed and exploit leakage from all useful instructions, we forced a minimum of one sample index per instruction to be included in the POI vector; if there was no sample index for a given instruction in the current POI vector, one was randomly selected. Second, also due to the clustering of the POIs in a few instructions, we limited the number of samples per instruction to one; in the case that sample indices had to be removed, we likewise selected them randomly.

5.4 Template Matching

At first, without any POI selection, we tried to use the standard multivariate Gaussian model, taking advantage of both the mean vector and the covariance matrix computed from \(V_0\) and \(V_1\) (also known as complete templates), similar to the approach of Sect. 4. However, in contrast to Sect. 4, the sample vectors to be classified and the template mean vectors are relatively long (24,924 samples) and relatively similar to each other (i.e., their Euclidean distance is very small); as a consequence, numerical instability issues due to almost-singular matrices arose during the computation of the probability density function. For these reasons, we decided to use reduced templates instead, which consist of the mean vectors only.

After applying POI selection, the matched sample vectors are much smaller, and full templates could then in principle be applied, as the covariance matrices would no longer lead to numerical instability. However, given the high success rates achieved using reduced templates, we decided not to use full templates, avoiding increased storage and computational requirements.

We also evaluated the effect on the attack success rate and confidence level of compressing the sample vector using normal and absolute sums for different window lengths. In addition, we applied a straightforward outlier detection to remove samples that have likely been subject to larger distortions: in the matching phase, we discarded all samples whose distance to the mean trace at the respective point in time exceeds a fixed multiple of the standard deviation. Using reduced templates, template matching boils down to computing the (squared) Euclidean distance between the sample vector to match and the template mean vectors; the lower that distance, the stronger the match. In this setting, other distinguishers can be used in a straightforward way, and thus we also tested the attack using the Pearson correlation coefficient.

Classification methods and confidence score. As a first classification method, we selected the template closest to the sample vector (cf. Sect. 4.2). We also tested majority-voting classification, where each sample is classified individually, also based on its distance to the corresponding element of the template mean vectors, and the majority vote wins. In both cases, as each template directly corresponds to a scalar bit value, the classification output is the recovered bit value. The confidence score was computed in the same way as in Sect. 4.2. Both methods can be sketched as follows.
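A sketch of both methods on reduced templates (V0, V1, and pois as in the sketch of Sect. 5.3; the function names are illustrative):

```python
import numpy as np

def classify_closest(v, mu0, mu1, pois):
    # reduced-template matching: the smaller Euclidean distance wins
    d0 = np.linalg.norm(v[pois] - mu0[pois])
    d1 = np.linalg.norm(v[pois] - mu1[pois])
    return (0 if d0 <= d1 else 1), d0, d1  # the distances feed Eq. (1)

def classify_majority(v, mu0, mu1, pois):
    # each selected sample votes for the template whose mean is closer
    votes1 = np.abs(v[pois] - mu1[pois]) < np.abs(v[pois] - mu0[pois])
    return int(votes1.sum() > votes1.size / 2)

mu0, mu1 = V0.mean(axis=0), V1.mean(axis=0)  # reduced templates (Sect. 5.4)
bit, d0, d1 = classify_closest(V1[0], mu0, mu1, pois)
```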

5.5 Attack Results

Figure 2 depicts the average and best-case success rates for the template attack on secret-dependent memory accesses. Again, as in Sect. 4.3, the trace sets used for template generation and matching were recorded in different capture sessions, and the same number of traces was used for each set. A limited number of profiling traces was again sufficient to reach success rates exceeding 90%; the best success rate reaches 95.3% (only 12 errors) with a confidence level of 78.8% (the 12 errors are included in the 54 suspicious bits). To investigate the effect of various pre-processing steps and attack parameters, we evaluated, using 10 traces, the average success rate and confidence level for various signal-frequency filtering options, POI selection methods, classification and compression methods, outlier filtering, and distinguishers; the results of this investigation are described in the full paper [48]. The best parameters we discovered were used for the main attack described in this section.

6 Error Detection and Correction

Due to noise, data leakage (note that we aim at exploiting the address leakage only), and other aspects that interfere with the side-channel analysis (misalignment, clock jitter, etc.), the scalar recovered from a single trace likely contains errors. If the number of wrong bits is sufficiently small, a brute-force attack may still be feasible. However, the attacker first needs a metric indicating the locations of the possibly wrong bits in the recovered scalar. The notion of suspicious bits (cf. Sect. 4.2) can be used to select the scalar bits for such a brute-force attack.

Let us consider the trace with the smallest number of suspicious bits from the experiment of Sect. 5; for this trace there are \(54\) suspicious bits, which comprise all falsely identified bits. Unfortunately, to recover a full randomized scalar, even in this case the attacker needs \(2^{54}\) operations, which is generally impractical. Note that we consider only the worst-case complexity and not the average case.

To improve upon the brute-force search complexity, there are two options. The first approach is to try to exploit the distribution of confidence scores for incorrectly (red) and correctly (blue) recovered suspicious bits (Fig. 3). While there is a clear trend for incorrect bits to have lower confidence scores, the overlap between correct and incorrect bits is large. Still, it may be possible to exploit the trend with an informed brute-force attack [40], prioritizing bits with the lowest confidence scores. Unfortunately, this attack works well only if the erroneous bits are adjacent to each other, which is not the case in our setting.

Fig. 3. Distribution of confidence scores over all traces for suspicious bits. Red: incorrectly recovered bits, blue: correctly recovered but suspicious bits. (Color figure online)

Alternatively (or combined with the informed brute-force search), we apply the second algorithm from [26], which was originally designed for square-and-multiply chains, to the Montgomery ladder. We describe how the algorithm works using the aforementioned example trace, which contains \(s=54\) suspicious bits. Let us represent the indices of these bits as a list sorted in descending order: \(i_s, \dots, i_1\), where each \(i_j \in \{0, \dots, 254\}\) and \(s \ge j \ge 1\); note that there are 255 bits in total. Let x denote the splitting position, namely \(x = i_{28}\) for the example trace. Let a be the number represented by the bit string corresponding to the left part of the scalar from position x (inclusive), and let b be the number corresponding to the bit string of the (least significant) right part. Furthermore, we know that \(R = [k] P\), where R is the resulting point, k the scalar to be recovered, and P the input point. Then, clearly \(R = [k] P = [a \cdot 2^{x} + b] P = [a] ([2^{x}] P) + [b] P\). If we denote \([2^{x}] P\) by H, then the above equation reduces to

$$\begin{aligned} R - [b] P = [a] H \end{aligned}$$
(2)

We can use Eq. 2 to check the correctness of our guess. Now, following [26], we use a time-memory trade-off technique to speed up the exhaustive search: consider all possible guesses for a. For each guess, we compute [a]H and store the pair (a, [a]H). We then sort all pairs by the value of [a]H and store them in an ordered table.

Next, we make a guess for b and compute \(z = R - [b] P\). If our guess for b is correct, then z is present in the second column of some row of the table we built, and the first column of that row is the corresponding a. Finding such a pair can be done using binary search, as the table is sorted by the second column. If z is present, we are done, since we have determined the scalar; otherwise, we make a new, different guess for b and continue. Since there are approximately \(2^{\frac{s}{2}}\) guesses for each of a and b, the time complexity is \(O(2^{\frac{s}{2}})\) operations; as the table has one entry per guess of a, the space complexity is \(O(2^{\frac{s}{2}})\) points (cf. [26] for a detailed complexity analysis). For the example trace, this amounts to \(2^{27}\) operations. The sketch below illustrates the procedure.
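The following self-contained Python toy illustrates the structure of the trade-off; multiplication modulo a prime q stands in for scalar multiplication (so \([k]P\) becomes \(k \cdot P \bmod q\)), the suspicious-bit positions and the key are made up, and a hash table replaces the sorted table with binary search:

```python
q = 2**31 - 1                       # toy group "order" (illustrative)
P = 7
k_true = 0b101101101101             # 12-bit toy scalar
suspicious = [1, 2, 9, 10]          # unknown bit positions, ascending
known = k_true & ~sum(1 << j for j in suspicious)  # SCA output, gaps zeroed
R = (k_true * P) % q                # the known ECSM result

right, left = suspicious[:2], suspicious[2:]  # split at the middle index
x = left[0]                         # splitting position
H = (pow(2, x) * P) % q             # H = [2^x]P

def with_bits(base, positions, guess):
    # write the bits of 'guess' into 'base' at the given bit positions
    for j, pos in enumerate(positions):
        base |= ((guess >> j) & 1) << pos
    return base

table = {}                          # [a]H -> a (the sorted table of [26])
for ga in range(1 << len(left)):
    a = with_bits(known >> x, [pos - x for pos in left], ga)
    table[(a * H) % q] = a
for gb in range(1 << len(right)):
    b = with_bits(known & ((1 << x) - 1), right, gb)
    z = (R - b * P) % q             # Eq. (2): a hit means R - [b]P = [a]H
    if z in table:
        k = (table[z] << x) | b
        break

assert k == k_true
```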

We do not know which trace contains the smallest number of suspicious bits, since we do not know the maximum confidence score of a falsely identified bit. However, to use the above algorithm, we assume that we know the number of suspicious bits to be brute-forced to recover the correct scalar. This number can be determined by using the templates to attack some traces for which the randomized key is known. Furthermore, note that if the attack fails, we can extend the execution to the next most suspicious bit and reuse the previously obtained data. Based on our experiments, we determined that \(54\) suspicious bits should cover all falsely identified bits for at least one trace. Our complete attack works as follows: we run the above algorithm sequentially for each of the n traces, and we stop as soon as the time-memory trade-off technique succeeds for one trace.

Since we run the attack n times, the complexity of the complete attack is multiplied by n: in total, \(O(n \cdot 2^{\frac{s}{2}})\) operations and \(O(n \cdot 2^{\frac{s}{2}})\) points in memory. For the attack from the previous section, this corresponds to \(100\cdot 2^{27} \approx 2^{34}\) operations. Therefore, we conclude that the scalar can be recovered successfully and efficiently even in the presence of multiple errors and uncertain bits (for experimental results, see Sect. 6.1). Furthermore, we believe that the above technique may be of independent interest, since it can be applied to a commonly used ECSM algorithm, i.e., the Montgomery ladder, even if errors are randomly spread across the scalar recovered by the SCA attack.

6.1 Algorithm Implementation and Experimental Results

The first challenge we faced is how to compute the point subtraction in Eq. 2. Curve25519 is a curve in Montgomery form and, as such, has an efficient formula for differential point addition using XZ coordinates, but, as far as we know, no efficient formula for a standard point addition. For that reason, we decided to perform the point addition in affine coordinates, which costs a field inversion and a few multiplications. However, to use affine coordinates we need to know the y-coordinates y(R) and y([b]P). The attack assumes that x(R) (the ECSM output) is known, but y(R) is not and thus has to be computed. To do so, we use the curve equation directly to compute the two possible values of y(R), at the cost of a field square root, an expensive operation that, however, has to be performed only once for each value of R. For y([b]P), an efficient algorithm by Okeya and Sakurai [52] costs one field inversion.

To generate the table of precomputed points \(A = [a]H\) and to compute \(B = [b]P\) in Eq. (2), the naive approach is to compute a full ECSM for each value of a and b. A more efficient method is to apply Gray coding to the suspicious bits in the scalars a and b. The defining property of such a code is that consecutive code words differ in just a single bit, which means that, in our context, we can generate \([k']P\) from [k]P using a single point addition (if the bit changed from 0 to 1) or point subtraction (if the change is from 1 to 0), where k and \(k'\) are scalars whose unknown bits are represented as Gray code words, and the code word in \(k'\) is the successor of the code word in k. To compute the sequence of points \([k_i]P\) (\(i=0,1,\ldots \)), we first construct the scalar \(k_0\) by setting the unknown bits to zero and the (assumed correct) recovered bits from the output of the SCA attack to their respective values. Then, we apply the full ECSM algorithm to compute \([k_0]P\), and from there we use the aforementioned method to generate the sequence of points \([k_1]P, [k_2]P, \dots \), at the cost of essentially one point addition per computed point. The walk is sketched below.
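The Gray-code walk can be sketched as follows (we track scalars instead of points, so the single point addition or subtraction per step appears as adding or subtracting \(2^j\); the positions and \(k_0\) are illustrative):

```python
def gray(i):
    return i ^ (i >> 1)

suspicious = [1, 2, 9, 10]       # example suspicious-bit positions
k = 0b100000000001               # k_0: suspicious bits set to zero
scalars = {0: k}                 # scalars stand in for the points [k_i]P
for i in range(1, 1 << len(suspicious)):
    t = gray(i - 1) ^ gray(i)    # consecutive code words: exactly one bit set
    w = t.bit_length() - 1       # which code-word bit flipped
    j = suspicious[w]            # corresponding scalar-bit position
    sign = 1 if (gray(i) >> w) & 1 else -1
    k += sign * (1 << j)         # [k']P = [k]P +/- [2^j]P in the real attack
    scalars[gray(i)] = k
```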

We implemented the key-recovery algorithm with the aforementioned arithmetic-level optimizations as a single-threaded program. We tested our implementation on a smaller scale, recovering 40 suspicious bits of a scalar on a PC with 8 GB of RAM in total (of which only 5 GB were available to the program) and an Intel i7-3740QM CPU running at 2.7 GHz. It took 1 h 23 min to recover the correct scalar, where about 1.5 ms is spent to add a single entry to the table and about 3 ms to test a possible value of b. Using these timings as a reference, we estimate that recovering a scalar with 60 suspicious bits using the current implementation would take around 18 days. The source code of the key-recovery implementation is publicly available [46].

7 Conclusions and Possible Countermeasures

In this paper, we show that the single-trace leakage of conditional moves can be exploited to recover the scalar using a template attack. We also show that a similar attack applies to the address leakage of loads and stores from/to secret-dependent addresses. Furthermore, we generalize the method from [26] to tolerate a certain number of incorrectly recovered scalar bits without relying on plain exhaustive search.

Now we discuss possible countermeasures against our attacks; we consider evaluating or improving our attacks against these countermeasures as future work. First of all, note that any countermeasure based on modifying the base point before or during the scalar multiplication does not protect against our attacks, since they exploit address-dependent and cswap leakage. Similarly, scalar blinding or splitting does not affect the attack, since we require only one trace and could hence recover the blinded or split scalar; knowledge of the randomized scalar (or the split scalars) is sufficient to either recover the original scalar or to compute the correct scalar-multiplication result. A potential countermeasure against our attack is presented in [50], which performs online data randomization during the exponentiation to prevent horizontal collision-correlation attacks. The main idea is to split the scalar into two parts and to randomly interleave the two scalar multiplications. However, we believe that our attack might still be mounted if four templates are used to recognize which bit is processed and during which ECSM.

The idea behind the memory-address countermeasure of Itoh et al. [34] is to store sensitive variables at different memory addresses that share the same Hamming weight. We believe that although this would make our attack less effective, the address leakage may still be identified by template matching. Randomizing the memory addresses of the coordinates used in the Montgomery ladder before the ECSM might also make our attack less effective, since the templates are prepared assuming fixed addresses. This countermeasure can be improved by randomizing not only the addresses but also the memory accesses [35, 36, 37].

The countermeasure of [30] protects against localized EM template attacks on the ECC Montgomery ladder. The main idea is to randomly swap the ladder registers at the end of a ladder iteration; the addressing of the registers within the loop is inverted according to whether the registers have been swapped. The countermeasure is uniform in its operation sequence, and hence our template attacks would in principle be infeasible against it. In addition, several randomization techniques protecting the Montgomery ladder are presented in [41]. Similarly to the countermeasure of [30], these techniques generate operation sequences independent of the scalar; we thus assume that our attack would be less effective or ineffective against them. We therefore regard evaluating and improving our attacks with respect to these latter three countermeasures as future work.