
1 Introduction

Problems such as factoring and solving discrete logarithms, believed to be classically intractable, underlie the security of most asymmetric cryptographic primitives in use today. Since Shor found a quantum polynomial-time algorithm for both [44], the cryptographic community has been actively working on replacements, culminating in the ongoing NIST call for post-quantum primitives [37].

One of the families of problems studied concerns elliptic curve isogenies. In this setting, we consider a graph whose vertices are elliptic curves, and whose edges are non-constant morphisms (isogenies). The problem of finding a path between two given curves was first used in the design of the CGL hash function [13], with supersingular isogeny graphs. Afterwards, a key-exchange based on ordinary curves (CRS) was proposed independently by Rostovtsev and Stolbunov [45] and Couveignes [18]. Later, a quantum algorithm was given in [16] that finds an isogeny between two such curves in subexponential time, a problem for which classical algorithms still require exponential time. Although the scheme is not broken in quantum polynomial time, it came to be considered too inefficient with respect to its post-quantum security.

Meanwhile, a key-exchange based on supersingular elliptic curve isogenies was proposed [21], and the candidate SIKE was selected for the second round of the NIST standardization process. The quantum algorithm for finding ordinary isogenies cannot be applied to supersingular graphs, and the best known quantum algorithm for breaking SIKE runs in exponential time.

CSIDH. CSIDH is a new primitive presented at ASIACRYPT 2018 [12]. Its name stands for “commutative supersingular isogeny Diffie-Hellman”, and its goal is to make isogeny-based key exchange efficient in the commutative case, analogous to a regular non-interactive Diffie-Hellman key exchange. CSIDH uses supersingular elliptic curves defined over \(\mathbb {F}_{p}\). In this case, the \(\mathbb {F}_{p}\)-isogeny graph has a structure analogous to the ordinary isogeny graph, and the subexponential quantum attack of [16] also applies. CSIDH aims at an improved balance between efficiency and security with respect to the original CRS scheme. However, it stands in a peculiar situation: to the best of our knowledge, it is the only actively studied post-quantum scheme against which a quantum adversary enjoys more than a polynomial speedup. Schemes based on lattices or codes, as well as SIKE, rely on problems for which the quantum speedup is at best quadratic.

In only two years, CSIDH has been the subject of many publications, reflecting renewed interest in protocols based on commutative elliptic curve isogenies. It has been used in [20] to devise the signature scheme SeaSign. CSIDH and SeaSign were further studied and their efficiency was improved in [22, 26, 35, 36], the last two works published at PQCRYPTO 2019.

Meanwhile, there have been a few works dealing with the security of CSIDH. The asymptotic cost of attacking the scheme, with classical precomputations and a quantum polynomial-space algorithm, was studied in [7]. Also asymptotically, it was shown in [27] that CSIDH (and CRS) could be attacked in polynomial space. Next, a quantum-classical trade-off using Regev’s variant [39] of Kuperberg’s sieve was proposed in [8]. Only two works studied the concrete parameters proposed in [12]: independently from us, Peikert [38] attacked CSIDH-512 using Kuperberg’s collimation sieve [32]. Unlike ours, his attack uses classical memory with quantum random access. Finally, the number of Toffoli gates required to implement a CSIDH-512 key-exchange in constant time has been studied in full detail in [4], published at EUROCRYPT 2019. However, the authors designed an irreversible classical circuit, and the memory usage of an immediate translation to a quantum circuit seems massive (see the appendix of [4]).

Contributions. In this paper, we make a decisive move towards understanding the quantum security of CSIDH. First, we revisit three quantum abelian hidden shift algorithms from the available literature, which can be used to recover the secret key in a CSIDH key-exchange, from the point of view of non-asymptotic cost. We give a wide range of trade-offs between their quantum and classical time and memory complexities. Second, we give quantum circuits for computing the isogenies in CSIDH. Building on [4], with the addition of quantum time-space tradeoffs for reversible computations and refined quantum search, we give a quantum procedure that computes the action of the class group in CSIDH-512 using \(2^{49.8}\) Toffoli gates and less than 40 000 qubits. Putting together our improved query complexities and this new quantum circuit, we are able to attack CSIDH-512, -1024 and -1792 in \(2^{10}\) to \(2^{48}\) times less quantum time than expected, using only tens of thousands of logical qubits.

Paper Outline. Section 2 below presents the context of the CSIDH group action and outlines the attack. We next go into the details of the two building blocks: a quantum black-box hidden shift algorithm, and a quantum procedure to evaluate the class group action. In Sect. 3, we present the three main quantum algorithms for finding abelian hidden shifts. Our contribution here is to give non-asymptotic estimates of them, and to write a simple algorithm for cyclic hidden shift (Algorithm 2), which can be easily simulated. In Sect. 4, we show how to replace the class group action oracle by the CSIDH group action oracle using lattice reduction. We study the latter in Sect. 5. We summarize our complexity analysis in Sect. 6.

2 Preliminaries

In this section, we present the rationale of CSIDH and the main ideas of its quantum attack. Throughout this paper, we use extensively standard notions of quantum computing such as qubits, ancilla qubits, quantum gates, entanglement, uncomputing, quantum Fourier Transform (QFT), CNOT and Toffoli gates. We use the Dirac notation \(|\cdot \rangle \) for quantum states. We analyze quantum algorithms in the quantum circuit model, where the number of qubits represents the quantum space used, including ancilla qubits which are restored to their initial state after the computation. Time is the number of quantum gates in the circuit (we do not consider the metric of circuit depth). We use the standard “Clifford+T” universal gate set for all our benchmarks [25] and focus notably on the T-gate count, as T-gates are usually considered an order of magnitude harder to realize than Clifford gates. It is possible to realize the Toffoli gate with 7 T-gates.

2.1 Context of CSIDH

Let \(p > 3\) be a prime number. In general, supersingular elliptic curves over \(\overline{\mathbb {F}_{p}}\) are defined over a quadratic extension \(\mathbb {F}_{p^2}\). However, the case of supersingular curves defined over \(\mathbb {F}_{p}\) is special. When \(\mathcal {O}\) is an order in an imaginary quadratic field, each supersingular elliptic curve defined over \(\mathbb {F}_{p}\) having \(\mathcal {O}\) as its \(\mathbb {F}_{p}\)-rational endomorphism ring corresponds to an element of \(\mathcal {C}\ell \mathcal {(O)}\), the ideal class group of \(\mathcal {O}\). Moreover, a rational \(\ell \)-isogeny from such a curve corresponds to the class of an ideal of norm \(\ell \) in \(\mathcal {O}\). The (commutative) class group \(\mathcal {C}\ell \mathcal {(O)}\) acts on the set of supersingular elliptic curves with \(\mathbb {F}_{p}\)-rational endomorphism ring \(\mathcal {O}\).

One-way Group Action. All use cases of the CSIDH scheme rest on the definition of a one-way group action (this is also the definition of a hard homogeneous space by Couveignes [18]). A group G acts on a set X. Operations in G, and the action \(g * x\) for \(g \in G, x\in X\), are easy to compute. Recovering g given x and \(x' = g * x\) is hard. In the case of CSIDH, X is a set of Montgomery curves of the form \(E_A : y^2 = x^3 + Ax^2 + x\) for \(A \in \mathbb {F}_{p}\), and the group G is \(\mathcal {C}\ell \mathcal {(O)}\) for \(\mathcal {O} = \mathbb {Z}[\sqrt{-p}]\). Computing \(g * x\) for an element g of \(\mathcal {C}\ell \mathcal {(O)}\) (i.e. an isogeny) and a curve x amounts to computing the image curve of x under this isogeny.
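As a toy analogy (not CSIDH, and offering none of its security guarantees), classical exponentiation already provides a commutative one-way group action on which the same Diffie-Hellman pattern works, with the discrete logarithm playing the role of the recovery problem. All parameter values below are illustrative:

```python
# Toy illustration (NOT CSIDH): a commutative group acting on a set, used
# Diffie-Hellman style. Here exponents act on the subgroup X = <h> of
# F_p^* by a * x = x^a mod p; the action commutes, and recovering a from
# (x, a * x) is the discrete-logarithm problem.
p = 467                       # small prime; q = 233 divides p - 1
q = 233
h = 4                         # element of order q in F_p^*

def act(a, x):                # the group action a * x
    return pow(x, a, p)

alice, bob = 57, 101          # secret exponents
pub_a = act(alice, h)         # Alice publishes alice * h
pub_b = act(bob, h)           # Bob publishes bob * h
shared_a = act(alice, pub_b)  # Alice computes alice * (bob * h)
shared_b = act(bob, pub_a)    # Bob computes bob * (alice * h)
assert shared_a == shared_b   # commutativity yields a shared secret
```

The commutativity of the action is exactly what makes the non-interactive key exchange work; CSIDH replaces exponentiation by the class group action on curves.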

CSIDH and CRS both benefit from this action of the class group, which also exists in the ordinary case. Quantum algorithms for recovering abelian hidden shifts solve exactly this problem of finding g when G is commutative. There exists a family of such algorithms, initiated by Kuperberg. The variant of [16] targets precisely the context of ordinary curves, and it can be applied to CSIDH.

Representation of \(\mathcal {C}\ell \mathcal {(O)}\). The designers choose a prime p of the form: \( p = 4\cdot \ell _1 \cdots \ell _{u} - 1\) where \(\ell _1, \dots , \ell _u\) are small primes. This makes it possible to represent the elements of \(\mathcal {C}\ell \mathcal {(O)}\) (hence, the isogenies) in a way that is now specific to CSIDH, and is the main reason for its efficiency. Indeed, since each of the \(\ell _i\) divides \(-p -1 = \pi ^2 - 1\), the ideal \(\ell _i \mathcal {O}\) splits and \(\mathfrak {l}_i = (\ell _i, \pi - 1)\) is an ideal in \(\mathcal {O}\). The image curves by these ideals can be computed efficiently [12, Section 8].

The designers consider the set \(\{ \prod _{i=1}^{u} [\mathfrak {l}_i]^{e_i}, -m \le e_i \le m \} \subseteq \mathcal {C}\ell \mathcal {(O)}\), where \([\mathfrak {l}_i]\) is the class of \(\mathfrak {l}_i\). If we suppose that these products fall randomly in \(\mathcal {C}\ell \mathcal {(O)}\), which has \(O(\sqrt{p})\) elements, it suffices to take \(2m + 1 \simeq p^{1/(2u)}\) in order to span the group \(\mathcal {C}\ell \mathcal {(O)}\) or almost all of it. Since a greater m yields more isogeny computations, u should be as large as possible. With this constraint in mind, we estimate \(u = 132\) and \(u = 209\) for CSIDH-1024 and CSIDH-1792 respectively (for CSIDH-512, we know that \(u =74\) and the list of primes is given in [12]).

Given an element of \(\mathcal {C}\ell \mathcal {(O)}\) of the form \([\mathfrak {b}] = \prod _{i=1}^{u} [\mathfrak {l}_i]^{e_i}\), we compute \(E' = [\mathfrak {b}] \cdot E\) by applying a sequence of \(\sum _i |e_i|\) isogenies. The CSIDH public keys are curves. The secret keys are isogenies of this form.

CSIDH Original Security Analysis. The problem underlying the security of CSIDH is: given two Montgomery curves \(E_A\) and \(E_B\), recover the isogeny \([\mathfrak {b}] \in \mathcal {C}\ell \mathcal {(O)}\) such that \(E_B = [\mathfrak {b}] \cdot E_A\). Moreover, the ideal \(\mathfrak {b}\) that represents it should be sufficiently “small”, so that the action of \([\mathfrak {b}]\) on a curve can be evaluated. The authors study different ways of recovering \([\mathfrak {b}]\). The complexity of these methods depends on the size of the class group \(N = \#\mathcal {C}\ell \mathcal {(O)}= O(\sqrt{p})\). Classically, the best method seems to be exhaustive key search of \([\mathfrak {b}]\) using a meet-in-the-middle approach, which costs \(O(p^{1/4})\). Quantumly, they use the cost given in [16] for ordinary curves: \(\exp \left( (\sqrt{2} + o(1))\sqrt{\log N \log \log N} \right) \).

Levels of Security. In [12], the CSIDH parameters 512, 1024 and 1792 bits are conjectured secure up to the respective levels 1, 3 and 5 of the NIST call [37]. These levels correspond respectively to a key-recovery on AES-128, on AES-192 and AES-256. A cryptographic scheme, instantiated with some parameter size, matches level 1 if there is no quantum key-recovery running faster than quantum exhaustive search of the key for AES-128, and no classical key-recovery running faster than classical exhaustive search. The NIST call considered the quantum gate counts given in [25]. These were improved later in [33], and we choose to adopt these improvements in this paper. For example, AES-128 key-recovery can be done with Grover search using \(1.47 \cdot 2^{81}\) T-gates and 865 qubits. Hence any algorithm using less than \(1.47 \cdot 2^{81}\) T-gates and \(2^{128}\) classical computations breaks the NIST level 1 security, as it runs below the security level of AES-128.

2.2 Attack Outline

Algorithm 1 outlines a quantum key-recovery on CSIDH. Given \(E_A, E_B\), we find a vector \(\bar{e}\) such that \(E_B = \prod _i [\mathfrak {l}_i]^{e_i} \cdot E_A\). We will not retrieve the exact secret key selected at the beginning, but the output \(\bar{e}\) will have an \(L_1\) norm small enough that it can be used in its place, effectively impersonating the secret key.

[Algorithm 1: outline of the quantum key-recovery on CSIDH]

In order to evaluate the cost of Algorithm 1, we need to study the quantum query complexity of the black-box hidden shift algorithm applied, but also its classical complexity, as it will often contain some quantum-classical trade-off. Afterwards, we need to analyze the quantum gate complexity of an oracle for the action of the ideal class group on Montgomery curves. There will also be classical precomputations.

In [16], in the context of ordinary curves, the authors show how to evaluate \([x] \cdot E\) for any ideal class [x] in superposition, in subexponential time. For CSIDH, in a non-asymptotic setting, it is best to use the structure provided by the scheme (contrary to [7]). We have supposed that the class group is spanned by products of the form \([\mathfrak {l}_1]^{e_1} \ldots [\mathfrak {l}_u]^{e_u}\) with small \(e_i\). If we are able to rewrite any [x] as such a product, then the evaluation of the class group action \([x] \cdot E\) costs no more than the evaluation of the CSIDH group action \(\prod _i [\mathfrak {l}_i]^{e_i} \cdot E\). Here, a technique based on lattice reduction comes into play, following [6, 7, 18].

In general, although the class group is spanned by the products used in the CSIDH key-exchange: \(\left\{ [\mathfrak {l}_1]^{e_1} \ldots [\mathfrak {l}_u]^{e_u}, -m \le e_i \le m \right\} \), we cannot retrieve the shortest representation of a given [x]. There is some approximation overhead, related to the quality of the lattice precomputations. In Sect. 4, we will show that this overhead is minor for the CSIDH original parameters.

3 Quantum Abelian Hidden Shift Algorithms

In this section, we present in detail three quantum algorithms for solving the hidden shift problem in commutative (abelian) groups. For each of them, we give tradeoff formulas and non-asymptotic estimates. The first one (Sect. 3.2) is a new variant of [31] for cyclic groups, whose behavior is easy to simulate. The second is by Regev [39] and Childs, Jao and Soukharev [16]. The third is Kuperberg’s second algorithm [32].

3.1 Context

The hidden shift problem is defined as follows:

Problem 1 (Hidden shift problem)

Let \((\mathbb {G},+)\) be a group and \(f,g : \mathbb {G} \rightarrow \mathbb {G}\) two permutations for which there exists \(s \in \mathbb {G}\) such that \(f(x) = g(x+s)\) for all x. Find s.

Classically, this problem essentially reduces to a collision search, but in the case of commutative groups, there exist subexponential quantum algorithms. The first result on this topic was an algorithm with low query complexity, by Ettinger and Høyer [24], which needs \(O(\log (N))\) queries and O(N) classical computations to solve the hidden shift in \(\mathbb {Z}/{N}\mathbb {Z}\). The first time-efficient algorithms were proposed by Kuperberg in [31]. His Algorithm 3 is shown to have a complexity in quantum queries and memory of \(\widetilde{O} \left( 2^{\sqrt{2\log _2(3)\log _2(N)}} \right) \) for the group \(\mathbb {Z}/N\mathbb {Z}\) with smooth N, and his Algorithm 2 is in \(O\left( 2^{3\sqrt{\log _2(N)}}\right) \) for any N. This was followed by a memory-efficient variant by Regev [39], with a query complexity in \(L_N(1/2,\sqrt{2})\) and a polynomial memory complexity, which Kuperberg generalized in [32] into an algorithm with \(\widetilde{O}\left( 2^{\sqrt{2\log _2(N)}}\right) \) quantum queries and classical memory, and a polynomial quantum memory. Regev’s variant has been generalized to arbitrary commutative groups in the appendix of [16], with the same complexity. A complexity analysis of this algorithm with tighter exponents can be found in [9].

A broad presentation of subexponential-time quantum hidden shift algorithms can be found in [39]. Their common design is to start with a pool of labeled qubits, produced using quantum oracle queries for f and g. Each qubit contains information in the form of a phase shift between the states \(|0\rangle \) and \(|1\rangle \). This phase shift depends on the (known) label \(\ell \) and on the (unknown) hidden shift s. Then, they use a combination procedure that consumes labeled qubits and creates new ones. The goal is to make the label \(\ell \) reach some wanted value (e.g. \(2^{n-1}\)), at which point meaningful information on s (e.g. one bit) can be extracted.

Cyclic Groups and Concrete Estimates. In [10], the authors showed that the polynomial factor in the \(\widetilde{O}\), for a variant of Kuperberg’s original algorithm, is a constant around 1 if N is a power of 2. In the context of CSIDH, the cardinality of the class group \(\mathcal {C}\ell \mathcal {(O)}\) is not a power of 2, but in most cases, its odd part is cyclic, as predicted by the Cohen–Lenstra heuristics [17]. So we choose to approximate the class group as a cyclic group. This is why we propose in what follows a generalization of [10, Algorithm 2] that works for any N, at essentially the same cost. We suppose that an arbitrary representation of the class group is available; one could be obtained with the quantum polynomial-time algorithm of [14], as done in [16].

3.2 A First Hidden Shift Algorithm

In this section, we present a generic hidden shift algorithm for \(\mathbb {Z}/N\mathbb {Z}\), which allows us to have the concrete estimates we need. We suppose an access to the quantum oracle that maps \(|x\rangle |0\rangle |0\rangle \) to \(|x\rangle |0\rangle |f(x)\rangle \), and \(|x\rangle |1\rangle |0\rangle \) to \(|x\rangle |1\rangle |g(x)\rangle \).

Producing the Labeled Qubits. We begin by constructing the uniform superposition over \(\mathbb {Z}/N\mathbb {Z}\times \{0,1\}\): \(\frac{1}{\sqrt{2N}}\sum _{x=0}^{N-1}|x\rangle \left( |0\rangle +|1\rangle \right) |0\rangle \). Then, we apply the quantum oracle, and get

$$\begin{aligned} \frac{1}{\sqrt{2N}}\sum _{x=0}^{N-1}|x\rangle \left( |0\rangle |f(x)\rangle +|1\rangle |g(x)\rangle \right) . \end{aligned}$$

We then measure the final register. We obtain a value \(y = f(x_0) = g(x_0+s)\) for some random \(x_0\). The first two registers collapse to the superposition corresponding to this measured value: \(\frac{1}{\sqrt{2}}\left( |x_0\rangle |0\rangle +|x_0+s\rangle |1\rangle \right) \).

Finally, we apply a Quantum Fourier Transform (QFT) to the first register and measure it, obtaining a label \(\ell \) and the state

$$\begin{aligned} |\psi _{\ell }\rangle = \frac{1}{\sqrt{2}}\left( |0\rangle +\chi \left( s\frac{\ell }{N}\right) |1\rangle \right) , \chi \left( x\right) = \exp \left( 2i\pi x\right) . \end{aligned}$$

The phase \(\chi \left( s\frac{\ell }{N}\right) \), which depends on s and \(\frac{\ell }{N}\), contains information on s. We now apply a combination routine on pairs of labeled qubits \((|\psi _{\ell }\rangle , \ell )\) as follows.

Combination Step. If we have obtained two qubits \(|\psi _{\ell _1}\rangle \) and \(|\psi _{\ell _2}\rangle \) with their corresponding labels \(\ell _1\) and \(\ell _2\), we can write the (disentangled) joint state of \(|\psi _{\ell _1}\rangle \) and \(|\psi _{\ell _2}\rangle \) as:

$$\begin{aligned} |\psi _{\ell _1}\rangle \otimes |\psi _{\ell _2}\rangle = \frac{1}{2} \left( |00\rangle + \chi \left( s\frac{\ell _1}{N}\right) |10\rangle + \chi \left( s\frac{\ell _2}{N}\right) |01\rangle + \chi \left( s\frac{\ell _1 + \ell _2}{N}\right) |11\rangle \right) . \end{aligned}$$

We apply a CNOT gate, which maps \(|00\rangle \) to \(|00\rangle \), \(|01\rangle \) to \(|01\rangle \), \(|10\rangle \) to \(|11\rangle \) and \(|11\rangle \) to \(|10\rangle \). We obtain the state:

$$\begin{aligned} \frac{1}{2} \left( |00\rangle + \chi \left( s\frac{\ell _2}{N}\right) |01\rangle + \chi \left( s\frac{\ell _1 + \ell _2}{N}\right) |10\rangle + \chi \left( s\frac{\ell _1}{N}\right) |11\rangle \right) . \end{aligned}$$

We measure the second qubit. If we measure 0, the first qubit collapses to:

$$\begin{aligned} \frac{1}{\sqrt{2}} \left( |0\rangle + \chi \left( s\frac{\ell _1 + \ell _2}{N}\right) |1\rangle \right) = |\psi _{\ell _1 + \ell _2} \rangle \end{aligned}$$

and if we measure 1, it collapses to:

$$\begin{aligned} \frac{1}{\sqrt{2}} \left( \chi \left( s\frac{\ell _2}{N}\right) |0\rangle + \chi \left( s\frac{\ell _1}{N}\right) |1\rangle \right) = \chi \left( s\frac{\ell _2}{N}\right) |\psi _{\ell _1 - \ell _2} \rangle . \end{aligned}$$

A global phase factor has no effect, so we see that the combination produces either \(|\psi _{\ell _1 + \ell _2} \rangle \) or \(|\psi _{\ell _1 - \ell _2} \rangle \), each with probability \(\frac{1}{2}\). Furthermore, the measurement of the second qubit tells us which of the two labels we have obtained. Although we cannot choose between the two cases, we can perform favorable combinations: we choose \(\ell _1\) and \(\ell _2\) such that \(\ell _1 \pm \ell _2\) has a higher 2-valuation than \(\ell _1\) and \(\ell _2\) themselves.
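The combination step can be checked numerically. The following sketch (with arbitrary toy values for N, s and the labels) builds the joint state, applies the CNOT, and verifies both measurement outcomes:

```python
import numpy as np

# Numeric verification of the combination step: after the CNOT, measuring
# the second qubit as 0 leaves |psi_{l1+l2}>, and measuring 1 leaves
# |psi_{l1-l2}> up to the global phase chi(s*l2/N). Toy values below.
N, s, l1, l2 = 255, 97, 13, 42
chi = lambda x: np.exp(2j * np.pi * x)
psi = lambda l: np.array([1, chi(s * l / N)]) / np.sqrt(2)

state = np.kron(psi(l1), psi(l2))   # basis order |00>, |01>, |10>, |11>
cnot = np.array([[1, 0, 0, 0],      # CNOT with the first qubit as control
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])
state = cnot @ state

out0 = state[[0, 2]] * np.sqrt(2)   # renormalized state after measuring 0
out1 = state[[1, 3]] * np.sqrt(2)   # renormalized state after measuring 1
assert np.allclose(out0, psi(l1 + l2))
assert np.allclose(out1, chi(s * l2 / N) * psi(l1 - l2))
```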

Goal of the Combinations. In order to retrieve s, we want to produce the qubits with label \(2^i\) and apply a Quantum Fourier Transform. Indeed, we have

$$\begin{aligned} QFT\bigotimes _{i=0}^{n-1}|\psi _{2^i}\rangle = \frac{1}{2^{n/2}}QFT\sum _{k=0}^{2^n-1}\chi \left( \frac{ks}{N}\right) |k\rangle \qquad \qquad \qquad \qquad \qquad \\ = \frac{1}{2^{n}} \sum _{t=0}^{2^n-1}\left( \sum _{k=0}^{2^n-1}\chi \left( k\left( \frac{s}{N}+\frac{t}{2^n}\right) \right) \right) |t\rangle . \end{aligned}$$

The amplitude associated with t is \(\frac{1}{2^{n}}\left| \frac{1-\chi \left( 2^n\left( \frac{s}{N}+\frac{t}{2^n}\right) \right) }{1-\chi \left( \frac{s}{N}+\frac{t}{2^n}\right) }\right| \). If we write \(\theta = \frac{s}{N}+\frac{t}{2^n}\), this amplitude is \(\frac{1}{2^{n}}\left| \frac{\sin (2^n\pi \theta )}{\sin (\pi \theta )}\right| \). For \(\theta \in \left[ 0;\frac{1}{2^{n+1}}\right] \), this value decreases from 1 to \(\frac{1}{2^n\sin {\frac{\pi }{2^{n+1}}}}\simeq \frac{2}{\pi }\). Hence, when measuring, we obtain a t such that \(\left| \frac{s}{N}+\frac{t}{2^n}\right| \le \frac{1}{2^{n+1}}\) with probability greater than \(\frac{4}{\pi ^2}\). Such a t always exists, and uniquely determines s if \(n > \log _2(N)\).
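The \(4/\pi^2\) bound can also be checked numerically. Here N = 300 and n = 9 are illustrative toy values satisfying \(n > \log_2(N)\):

```python
import math

# Numeric check of the final measurement: the probability of the best t
# (the one minimizing the distance of s/N + t/2^n to an integer) is at
# least 4/pi^2 ~ 0.405.
def prob_best_t(n, N, s):
    best = 0.0
    for t in range(2 ** n):
        theta = (s / N + t / 2 ** n) % 1.0
        theta = min(theta, 1.0 - theta)   # distance to the nearest integer
        if theta == 0:
            amp = 1.0
        else:
            amp = abs(math.sin(2 ** n * math.pi * theta)
                      / (2 ** n * math.sin(math.pi * theta)))
        best = max(best, amp * amp)
    return best

assert all(prob_best_t(9, 300, s) > 4 / math.pi ** 2 for s in (1, 17, 123))
```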

From \(2^n\) to any N. We want to apply this simple algorithm to any cyclic group, for any N. One solution is to ignore the modulus N when combining labels. We only want combinations such that \(\sum _{k} \pm \ell _k = 2^i\). At each combination step, we expect the 2-valuation of the output label to increase (we collide on the lowest significant bits), but its maximum size can also increase: \(\ell _1 + \ell _2\) is bigger than \(\ell _1\) and \(\ell _2\). However, the size can increase by at most one bit per combination, while the position of the lowest significant 1 increases by about \(\sqrt{n}\) on average. Hence, the algorithm eventually produces the correct value.

We write \(\text {val}_2(x) = \max \{ i : 2^i \mid x \}\) for the 2-valuation of x. The procedure is Algorithm 2. Each label is associated with its corresponding qubit, and the operation ± corresponds to the combination.

[Algorithm 2]
Table 1. Simulation results for Algorithm 2, for \(90\%\) success

Intuitively, the behavior of this algorithm is close to that of [10], as we only have slightly larger values and a few more elements to produce. The number of oracle queries Q is exactly the number of labeled qubits used during the combination step. Empirically, we only need to put 3 elements in R at each step in order to have a good success probability. This algorithm is easily simulated, because we only need to reproduce the combination step, generating at random the new labels obtained at each combination. We estimate the total number of queries to be around \(12\times 2^{1.8\sqrt{n}}\) (Table 1).

For the CSIDH parameters of [4], we have three group sizes (in bits): \(n = 256\), 512 and 896 respectively. We obtain \(2^{33}\), \(2^{45}\) and \(2^{58}\) oracle queries to build the labeled qubits, with \(2^{31}\), \(2^{43}\) and \(2^{56}\) qubits to store in memory. A slight overhead in time stems from the probability of success of \(\frac{4}{\pi ^2}\); the procedure needs to be repeated at most 4 times. In CSIDH, the oracle has a high gate complexity. The number of CNOT quantum gates applied during the combination step (roughly equal to the number of labeled qubits at the beginning) is negligible. Notice also that the production of the labeled qubits can be perfectly parallelized.
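For reference, the estimate of Table 1 can be evaluated directly; the resulting logarithms are within one bit of the query counts quoted above:

```python
import math

# Evaluating the empirical estimate 12 * 2^(1.8*sqrt(n)) for the three
# group sizes n = 256, 512, 896; log2 of the query count in each case.
log_queries = {n: math.log2(12) + 1.8 * math.sqrt(n) for n in (256, 512, 896)}
# log_queries -> roughly {256: 32.4, 512: 44.3, 896: 57.5}
```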

3.3 An Approach Based on Subset-sums

Algorithm 2 is only a variant of the first subexponential algorithm by Kuperberg [31]. We expand here on a later approach used by Regev [39] and by Childs, Jao and Soukharev [16] for odd N.

Subset-sum Combination Routine. This algorithm uses the same labeled qubits as the previous one. The main idea is to combine not 2, but k qubits:

$$\begin{aligned} \bigotimes _{i \le k}|\psi _{\ell _i}\rangle = \frac{1}{2^{k/2}}\sum _{j \in \{0,1\}^k} \chi \left( \frac{ j \cdot (\ell _1,\dots ,\ell _k)}{N} s\right) |j\rangle \end{aligned}$$

and apply \(|x\rangle |0\rangle \mapsto |x\rangle |\lfloor x \cdot (\ell _1,\dots ,\ell _k)/B\rfloor \rangle \) for a given B that controls the cost of the combination routine and depends on the tradeoffs of the complete algorithm. Measuring the second register yields a value \( V = \lfloor x \cdot (\ell _1,\dots ,\ell _k)/B\rfloor \), the state becoming

$$\begin{aligned} \sum _{\lfloor j \cdot (\ell _1,\dots ,\ell _k)/B\rfloor = V} \chi \left( \frac{ j \cdot (\ell _1,\dots ,\ell _k)}{N} s\right) |j\rangle . \end{aligned}$$

In order to get a new labeled qubit, one can simply project onto any pair \((j_1, j_2)\) with \(j_1\) and \(j_2\) taken from this superposition of j. This is easy to do as long as the j are classically known. They can be computed by solving the equation \(\lfloor j \cdot (\ell _1,\dots ,\ell _k)/B\rfloor = V\), which is an instance of the subset-sum problem.
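On toy parameters, this subset-sum instance can be solved by brute force; the real attack uses the algorithm of [2] instead, and all values below are illustrative:

```python
import itertools
import random

# Exhaustively find all j in {0,1}^k with floor(j . (l_1,...,l_k)/B) = V.
random.seed(1)
N, B = 2 ** 12, 2 ** 6
k = 7                                 # chosen so that 2^k * B / N = 2
labels = [random.randrange(N) for _ in range(k)]

def partial_sum(j):                   # the inner product j . (l_1,...,l_k)
    return sum(b * l for b, l in zip(j, labels))

planted = tuple(random.randrange(2) for _ in range(k))
V = partial_sum(planted) // B         # the value observed on measurement
solutions = [j for j in itertools.product((0, 1), repeat=k)
             if partial_sum(j) // B == V]
assert planted in solutions           # ~2 solutions expected on average
```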

The labeled qubit obtained is of the form:

$$\begin{aligned} \chi \left( \frac{ j_1 \cdot (\ell _1,\dots ,\ell _k)}{N} s\right) |j_1\rangle + \chi \left( \frac{ j_2 \cdot (\ell _1,\dots ,\ell _k)}{N} s\right) |j_2\rangle \end{aligned}$$

which, up to a common phase factor, is:

$$\begin{aligned} |j_1\rangle + \chi \left( \frac{ (j_2-j_1) \cdot (\ell _1,\dots ,\ell _k)}{N} s\right) |j_2\rangle . \end{aligned}$$

We observe that the new label in the phase, given by \((j_2-j_1) \cdot (\ell _1,\dots ,\ell _k)\), is less than B in absolute value. If we map \(j_1\) and \(j_2\) respectively to 0 and 1, we obtain a labeled qubit \(|\psi _\ell \rangle \) with \(\ell < B\). Now we can iterate this routine in order to get smaller and smaller labels, until the label 1 is produced. If N is odd, one reaches the other powers of 2 by multiplying all the initial labels by \(2^{-a} \bmod N\) and then running the algorithm as before.
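The rescaling trick for odd N can be checked with a small computation (toy values; `pow` with a negative exponent requires Python 3.8+):

```python
# Toy check of the rescaling trick for odd N: if a +/- combination of the
# labels rescaled by 2^-a mod N equals some value v, the same combination
# of the original labels equals v * 2^a mod N. Producing the rescaled
# label 1 thus produces the original label 2^a.
N, a = 255, 4
inv = pow(2, -a, N)                    # modular inverse of 2^a
labels = [37, 190, 101]                # arbitrary toy labels
signs = [1, -1, 1]                     # an arbitrary +/- combination
rescaled = sum(s * l * inv for s, l in zip(signs, labels)) % N
original = sum(s * l for s, l in zip(signs, labels)) % N
assert original == (rescaled * 2 ** a) % N   # label scales back by 2^a
```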

[Algorithm 3]

There are \(2^k\) sums and N/B possible values, hence we can expect \(2^kB/N\) solutions. If we take \(k \simeq \log _2(N/B) + 1\), we can expect 2 solutions on average. In order to obtain a labeled qubit in the end, we need at least two solutions, and we need to successfully project onto a pair \((j_1,j_2)\) if there are more than two.

The case where a single solution exists cannot happen more than half of the time, as there are twice as many inputs as outputs. We now consider the case where we have strictly more than one index j in the sum. If we have an even number of such indices, we simply split the indices j into a set of pairs, project onto one pair, and map one of its indices to 0 and the other to 1. If we have an odd number of such indices, since it is greater than or equal to 3, we single out a solitary element, and do the projections as in the even case. The probability of falling on this element is at most \(\frac{1}{t} \le \frac{1}{3}\) if there are t solutions, hence the probability of success in this case is more than \(\frac{2}{3}\).

This combination routine can be used recursively to obtain the label we want.

Linear Number of Queries. Algorithm 3 can directly produce the label 1 if we choose \(k = \lceil \log _2(N)\rceil \) and \(B = 2\). In that case, we produce either 1 or 0, each with probability one half, as the input labels are uniformly distributed.

If the group has a component which is a small power of two, the previous routine can be used with \(B = 1\) in order to force the odd cyclic component to zero. Then the algorithms of [10] can be used, with a negligible overhead.

Overall, the routine can generate the label 1 using \(\log _2(N)\) queries, with probability one half. This also requires solving a subset-sum instance, which can be done in only \(\widetilde{O}\left( 2^{0.291\log _2(N)}\right) \) classical time and memory [2].

We need to obtain \(\log _2(N)\) labels, and then we can apply the Quantum Fourier Transform as before to recover s, with success probability \(\frac{4}{\pi ^2}\); so we expect to repeat this final step about 3 times. The total number of queries will be \(8\log _2(N)^2\), with a classical time and memory cost in \(\widetilde{O}\left( 2^{0.291\log _2(N)}\right) \).

We note that this variant is the most efficient in quantum resources, as it limits the quantum queries to a polynomial number. The classical complexity remains exponential, but we replace the complexity of a collision search (an exponent of 0.5) by that of the subset-sum problem (an exponent of 0.291). In the case \(N \simeq 2^{256}\) (CSIDH-512), taking into account the success probability of the final Quantum Fourier Transform, we obtain \(2^{19}\) quantum queries and \(2^{86}\) classical time and memory.
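As a quick arithmetic check of the query count for CSIDH-512:

```python
# 8 * log2(N)^2 queries for N ~ 2^256 is exactly 2^19, matching the
# quantum query count quoted above.
queries = 8 * 256 ** 2
assert queries == 2 ** 19
```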

Time/Query Tradeoffs. There are many possible tradeoffs, as we can adjust the number of steps and their sizes. For example, we can proceed in two steps: the first step will produce labels smaller than \(\sqrt{N}\), and the second will use them to produce the label 1. The subset-sum part of each step, done classically, will cost \(\widetilde{O}\left( 2^{0.291\log _2(N)/2}\right) \) time and memory, and it has to be repeated \(\log (N)^2/4\) times per label. Hence, the total cost in queries is in \(O(\log (N)^3)\), with a classical time and memory cost in \(\widetilde{O}\left( 2^{0.291\log _2(N)/2}\right) \).

For \(N \simeq 2^{256}\), we can use Algorithm 3 to obtain roughly 130 labels that are smaller than \(2^{128}\), and then apply Algorithm 3 on them to obtain the label 1. We can estimate the cost to be roughly \(2^{24}\) quantum queries, \(2^{60}\) classical time and \(2^{45}\) memory.

This method generalizes to any number of steps. If we want a subexponential classical time, then the number of steps has to depend on N. Many tradeoffs are possible, depending on the resources of the quantum attacker (see [9]).

3.4 Kuperberg’s Second Algorithm

This section revisits the algorithm from [32] and builds upon tradeoffs developed in [9]. We remark that the previous labeled qubits \(|\psi _\ell \rangle \) were a particular case of qubit registers of the form

$$\begin{aligned} |\psi _{(\ell _0,\dots ,\ell _{k-1})}\rangle = \frac{1}{\sqrt{k}}\sum _{0 \le i \le k-1}\chi \left( s\frac{\ell _i}{N}\right) |i\rangle . \end{aligned}$$

These multi-labeled qubit registers become the new building blocks. They are not indexed by a label \(\ell \), but by a vector \((\ell _0,\dots ,\ell _{k-1})\). We can remark that if we consider the joint state (tensor product) of j single-label qubits \(|\psi _{\ell _i}\rangle \), we directly obtain a multi-labeled qubit register of this form:

$$\begin{aligned} \bigotimes _{0 \le i \le j-1} |\psi _{\ell _i}\rangle = \left| \psi _{\left( \ell '_0,\dots ,\ell '_{2^j-1}\right) }\right\rangle , \quad \text {with } \ell '_k = \sum _{0 \le i \le j-1} k_i \ell _i \text { where } k = \sum _i k_i 2^i.\end{aligned}$$

These registers can again be combined by computing and measuring a partial sum, as in Algorithm 4. While Algorithm 3 was essentially a subset-sum routine, Algorithm 4 is a 2-list merging routine. Step 4 simply consists in iterating through the sorted lists \((\ell _0,\dots ,\ell _{M-1})\) and \((\ell '_0,\dots ,\ell '_{M'-1})\) to find the matching values (this is exactly a classical 2-list problem). Hence, it costs \(\widetilde{O}(M)\) classical time, with the lists stored in classical memory. The memory cost is \(\max (M,M')\). The quantum cost comes from the computation of the partial sum and from the relabeling. Both can be done sequentially, in \(O(\max (M,M'))\) quantum time.
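The classical matching at the heart of this merging step can be sketched in a few lines (a minimal illustration, not the full routine; the modulus \(2^a\) plays the role of the measured partial-sum bits, and the function name is hypothetical):

```python
def merge_2_lists(labels1, labels2, a, residue=0):
    """Find all pairs (l1, l2) with (l1 + l2) % 2**a == residue.

    After hashing (or sorting) the first list, this runs in time linear
    in the list sizes, as in the classical 2-list problem.
    """
    mod = 1 << a
    # Index the first list by its residue class modulo 2^a.
    by_residue = {}
    for l1 in labels1:
        by_residue.setdefault(l1 % mod, []).append(l1)
    pairs = []
    for l2 in labels2:
        # l1 must satisfy l1 = residue - l2 (mod 2^a).
        for l1 in by_residue.get((residue - l2) % mod, []):
            pairs.append((l1, l2))
    return pairs

print(merge_2_lists([1, 5, 12], [3, 7, 11], a=2))   # all pairs summing to 0 mod 4
```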

This routine can also be generalized to merge more than two lists. The only difference is that at Step 4, we need to apply another list-merging algorithm to find all the matching values. In particular, if we merge 4k lists, we can use the Schroeppel-Shamir algorithm [43] to obtain the solutions in \(O(M^{2k})\) classical time and \(O(M^{k})\) classical memory.

Once we are finished, we project the vector to a pair of values with difference 1, as in Algorithm 3, with the same success probability, better than 1/3.

Complete Algorithm. The complete algorithm uses Algorithm 4 recursively. As before, the final cost depends on the size of the lists, the number of steps and the number of lists we merge at each step. We can thus view the algorithm as a merging tree.

The most time-efficient algorithms use 2-list merging. The merging tree is binary: the number of lists is halved at each level. We can save some time if we allow the lists to double in size after a merging step. In that case, merging two lists of size \(2^{m}\) into one list of size \(2^{m+1}\) allows us to constrain \(m-1\) bitsFootnote 2, at a cost of \(O(2^{m})\) in classical and quantum time and classical memory. If we have e levels in the tree and begin with lists of size \(2^{\ell _0}\), then the quantum query cost is \(\ell _02^e\). The time cost is in \(\widetilde{O}\left( 2^{\ell _0+e}\right) \), as the first step is performed \(2^e\) times, the second \(2^{e-1}\) times, and so on.

Allowing the lists to grow saves some time, but costs more memory. To save memory, we can instead combine lists and force the output lists to be of roughly the same size. Hence, the optimal algorithm doubles the list sizes in the first levels until the maximal memory is reached, after which the list size stays fixed.

Overall, let us omit polynomial factors and denote the classical and quantum time by \(2^t\). We use at most \(2^m\) memory and make \(2^q\) quantum queries, begin with lists of size \(2^{\ell _0}\) and double the list sizes until we reach \(2^m\). Hence, the list sizes per level are distributed as in Fig. 1. Then q equals the number of levels, and t equals the number of levels plus \(\ell _0\). As each level constrains as many bits as the log of its list size, the total number of bits constrained by the algorithm corresponds to the hatched area.

Fig. 1.

Size of the lists as a function of the tree level, in \(\log _2\) scale, annotated with the different parameters.

Hence, with \( \max (m,q) \le t \le m+q\), we can solve the hidden shift problem for \(N < 2^n\) with

$$\begin{aligned} -\frac{1}{2}(t-m-q)^2 + mq = n \end{aligned}$$

We directly obtain the cost of \(\widetilde{O}\left( 2^{\sqrt{2n}}\right) \) from [32] if we consider \(t = m = q\).

Classical/Quantum Tradeoffs. The previous approach has the drawback of using equal classical and quantum times, up to polynomial factors. In practice, we can expect to be allowed more classical operations than quantum gates. We can obtain different tradeoffs by reusing the previous 2-list merging tree and viewing it as a \(2^k\)-list merging tree. That is, we see k levels as one, and merge the \(2^k\) lists at once. This allows us to use the Schroeppel-Shamir algorithm for merging, with a classical time in \(2^{2^k/2}\) and a classical memory in \(2^{2^k/4}\). This operation is purely classical, as we are computing lists of labels, and it does not impact the quantum cost. Moreover, while we used to have a constraint on \(\log (k)m\) bits, we now have a constraint on \((k-1)m\) bits.

For \(k = 2\), omitting polynomial factors, with a classical time of \(2^{2t}\) and quantum time of \(2^t\), a memory of \(2^m\), a number of quantum queries of \(2^q\) and \( \max (m,q) \le t \le m+q\), we can solve the hidden shift problem for \(N < 2^n\) with

$$\begin{aligned} -\frac{1}{2}(t-m-q)^2 + mq = 2n/3. \end{aligned}$$

In particular, if we consider that \(t = m = q\), we obtain an algorithm with a quantum time, quantum query and classical memory complexity of \(\widetilde{O}(2^{2\sqrt{\frac{n}{3}}})\) and a classical time complexity of \(\widetilde{O}(2^{4\sqrt{\frac{n}{3}}})\). If we consider instead that \(t = 2m = 2q\), we obtain a quantum query and classical memory cost in \(\widetilde{O}(2^{\sqrt{\frac{2n}{3}}})\), a classical time in \(\widetilde{O}(2^{4\sqrt{\frac{2n}{3}}})\) and a quantum time in \(\widetilde{O}(2^{2\sqrt{\frac{2n}{3}}})\).
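The closed forms quoted above can be checked directly against the two constraint equations (exponents only, polynomial factors ignored):

```python
import math

def constraint(t, m, q):
    # Left-hand side -(1/2)(t-m-q)^2 + mq of the constraint equations.
    return -0.5 * (t - m - q) ** 2 + m * q

n = 256.0

# k = 1 (2-list merging): t = m = q = sqrt(2n) gives constraint = n.
t = math.sqrt(2 * n)
print(abs(constraint(t, t, t) - n) < 1e-9)                  # True

# k = 2: t = m = q = 2*sqrt(n/3) gives constraint = 2n/3.
t = 2 * math.sqrt(n / 3)
print(abs(constraint(t, t, t) - 2 * n / 3) < 1e-9)          # True

# k = 2 with t = 2m = 2q: t = 2*sqrt(2n/3) gives constraint = 2n/3.
t = 2 * math.sqrt(2 * n / 3)
print(abs(constraint(t, t / 2, t / 2) - 2 * n / 3) < 1e-9)  # True
```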

Concrete Estimates. If we consider \(N \simeq 2^{256}\), with the 2-list merging method we can succeed with \(2^{23}\) initial lists of size 2. We double the list size at each level until we obtain a list of size \(2^{24}\). In that case, we obtain a classical and quantum time cost in \(2^{39}\), a classical memory in \(2^{29}\) and \(2^{34}\) quantum queries.

Using the 4-list merging, we can achieve the same in 10 steps with roughly \(2^{55}\) classical time, \(2^{23}\) classical memory, \(2^{35}\) quantum time, \(2^{31}\) quantum queries.

Other tradeoffs are also possible. We can reduce the number of queries by beginning with larger lists. We can also combine the k-list approach with the subset-sum approach to reduce the quantum time (or the classical memory, if we use a low-memory subset-sum algorithm).

For example, if we consider a 4-level tree, with a 4-list merging, an initial list size of \(2^{24}\) and lists that quadruple in size, the first combination step can constrain \(24\times 3 -2 = 70\) bits, the second \(26\times 3 - 2 = 76\) and the last \(28\times 4-1 = 111\) bits (for the last step, we do not need to end with a large list, but only with an interesting element, hence we can constrain more). We bound the success probability by the success probability of one complete merging (greater than 1/3) times the success probability of the Quantum Fourier Transform (greater than \(4/\pi ^2\)), for a total probability greater than 1/8.

The memory cost is \(2^{30}\), as we store at most 4 lists of size \(2^{28}\). For the number of quantum queries: there are \(4^3 = 64\) initial lists in the tree, and each costs 24 queries (to obtain a list of \(2^{24}\) labels by combining). We have to redo this 256 times to obtain all the labels we want, and to repeat this 8 times due to the probability of success. Hence, the query cost is \(24\times 64\times 256\times 8 \simeq 2^{22}\). The classical time cost is in \(256\times 8\times 3\times 2^{28\times 2} \simeq 2^{69}\). The quantum time cost is in \(256\times 8\times 3\times 2^{28}\simeq 2^{41}\).
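These estimates can be reproduced with elementary arithmetic (log₂ exponents, rounded):

```python
import math

# 4-level tree, 4-list merging, initial lists of size 2^24 quadrupling in size.
initial_lists = 4 ** 3          # 64 leaves in the merging tree
queries_per_list = 24           # queries to build one list of 2^24 labels
labels_needed = 256             # labels to produce overall
repeats = 8                     # accounts for the 1/8 success probability

queries = queries_per_list * initial_lists * labels_needed * repeats
classical_time = labels_needed * repeats * 3 * 2 ** (28 * 2)
quantum_time = labels_needed * repeats * 3 * 2 ** 28
memory = 4 * 2 ** 28            # at most 4 lists of size 2^28

print(round(math.log2(queries)))         # 22
print(round(math.log2(classical_time)))  # 69
print(round(math.log2(quantum_time)))    # 41
print(round(math.log2(memory)))          # 30
```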

We summarize the results of this section in Table 2.

Table 2. Hidden shift costs tradeoffs that will be used in the following sections. Quantum memory is only the inherent cost needed by the algorithm and excludes the oracle cost. \( n = \log _2(N)\).

4 Reduction in the Lattice of Relations

This section reviews the lattice reduction technique that allows us to go from an arbitrary representation of an ideal class [x] to a representation on a basis of arbitrary ideals: \([x] = \prod _i [\mathfrak {l}_i]^{x_i}\), with short exponents \(x_i\). This allows us to turn an oracle for the CSIDH group action, computing \(\prod _i [\mathfrak {l}_i]^{e_i} \cdot E\), into an oracle for the action of \(\mathcal {C}\ell \mathcal {(O)}\).

4.1 The Relation Lattice

Given p and the ideal classes \([\mathfrak {l}_1], \dots , [\mathfrak {l}_u]\), the integer vectors \(\bar{e} = (e_1, \ldots , e_u)\) such that \([\mathfrak {l}_1]^{e_1} \ldots [\mathfrak {l}_u]^{e_u} = \mathbf {1}\) form an integer lattice in \(\mathbb {R}^u\), which we denote \(\mathcal {L}\): the relation lattice. This lattice is ubiquitous in the literature on CRS and CSIDH (see [6], or [27] for a CSIDH context).

The lattice \(\mathcal {L}\) depends only on the prime parameter p, hence all computations involving \(\mathcal {L}\) are precomputations. First, we notice that \(\mathcal {L}\) is the kernel of the map \((e_1, \ldots , e_u) \mapsto [\mathfrak {l}_1]^{e_1} \ldots [\mathfrak {l}_u]^{e_u}\). Finding a basis of \(\mathcal {L}\) is an instance of the Abelian Stabilizer Problem, which Kitaev introduced and solved in quantum polynomial time [28].

Lattice Reduction. Next, we compute an approximate short basis B and its Gram-Schmidt orthogonalization \(B^*\). All this information about \(\mathcal {L}\) will be stored classically. We compute B using the best known algorithm to date, the Block Korkine-Zolotarev (BKZ) algorithm [42]. Its complexity depends on the dimension u and the block size, an additional parameter which determines the quality of the basis. For any dimension u, BKZ gives an approximation factor \(c^u\) for some constant c depending on the block size: \(\left| |b_1|\right| _2 \le c^u \lambda _1(\mathcal {L})\), where \(\lambda _1(\mathcal {L})\) is the Euclidean norm of a shortest nonzero vector of \(\mathcal {L}\). In our case, assuming that the products \([\mathfrak {l}_i]^{e_i}\) with \(-m \le e_i \le m\) span the whole class group, one of these falls on \(\mathbf {1}\) and we have: \(\lambda _1(\mathcal {L}) \le 2m\sqrt{u}\).

4.2 Solving the Approximate CVP with a Reduced Basis

In this section, we suppose that a product \(\prod _i [\mathfrak {l}_i]^{t_i}\) for some large \(t_i\) is given (possibly as large as the cardinality of the class group, hence \(O(\sqrt{p})\)). In order to evaluate the action of \(\prod _i [\mathfrak {l}_i]^{t_i}\), we would like to reduce \(\bar{t} = (t_1, \ldots , t_u)\) to a vector \(\bar{e} = (e_1, \ldots , e_u)\) with small norm, such that \(\prod _i [\mathfrak {l}_i]^{e_i} = \prod _i [\mathfrak {l}_i]^{t_i}\). In other words, we want to solve the approximate closest vector problem (CVP) in \(\mathcal {L}\): given the target \(\bar{t}\), we search for the closest vector \(\bar{v}\) in \(\mathcal {L}\) and set \(\bar{e} = \bar{t} - \bar{v}\).

Babai’s Algorithm. The computation of a short basis B of \(\mathcal {L}\) has to be done only once, but the approximate CVP needs to be solved on the fly and for a target \(\bar{t}\) in superposition. As in [7], we use a simple polynomial-time algorithm, relying on the quality of the basis B: Babai’s nearest-plane algorithm [1]. We detail it in the full version of the paper [11]. Given the target vector \(\bar{t}\), B and its Gram-Schmidt orthogonalization \(B^\star \), this algorithm outputs in polynomial time a vector \(\bar{v}\) in the lattice \(\mathcal {L}\) such that \(\left| |\bar{v} - \bar{t}|\right| _2 \le \frac{1}{2} \sqrt{\sum _{i=1}^u \left| |b_i^\star |\right| _2^2 }\). This bound holds simultaneously for every target vector \(\bar{t}\) and corresponding output \(\bar{v}\) (as \(\bar{t}\) will actually be a superposition over all targets, this is important for us).
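Babai's nearest-plane algorithm itself is short. A plain Python sketch over a toy 2-dimensional lattice (illustrative only; the real computation runs on the BKZ-reduced CSIDH relation lattice):

```python
def gram_schmidt(B):
    """Gram-Schmidt orthogonalization of the rows of B (no normalization)."""
    Bs = []
    for b in B:
        v = list(b)
        for bs in Bs:
            mu = sum(x * y for x, y in zip(b, bs)) / sum(y * y for y in bs)
            v = [x - mu * y for x, y in zip(v, bs)]
        Bs.append(v)
    return Bs

def babai_nearest_plane(B, t):
    """Return a lattice vector v close to target t, using basis rows B."""
    Bs = gram_schmidt(B)
    b = list(t)
    v = [0.0] * len(t)
    for i in reversed(range(len(B))):
        # Project the residual on the i-th Gram-Schmidt vector and round.
        c = round(sum(x * y for x, y in zip(b, Bs[i])) / sum(y * y for y in Bs[i]))
        b = [x - c * y for x, y in zip(b, B[i])]
        v = [x + c * y for x, y in zip(v, B[i])]
    return v

B = [[2.0, 0.0], [1.0, 2.0]]            # basis rows b_0, b_1
v = babai_nearest_plane(B, [3.2, 2.9])
print(v)                                 # [3.0, 2.0]
```

For this toy basis, \(\frac{1}{2} \sqrt{\sum _i \left| |b_i^\star |\right| _2^2} = \sqrt{2}\), and the residual \((0.2, 0.9)\) of the example is indeed within that bound.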

Effect on the \(L_1\) Norm. Our primary concern is the number of isogenies that we compute, so we measure the quality of our approximation with the \(L_1\) norm of the obtained \(\bar{e} = \bar{t} - \bar{v}\). The bound on the \(L_1\) norm is: \(\left| |\bar{t} - \bar{v}|\right| _1 \le \sqrt{u} \left| |\bar{t} - \bar{v}|\right| _2 = \frac{\sqrt{u}}{2} \sqrt{\sum _{i=1}^u \left| |b_i^\star |\right| _2^2} \). Naturally, if we manage to solve the exact CVP, and always obtain the closest vector to \(\bar{t}\), any evaluation of \([x] \cdot E_A\) costs exactly the same as an evaluation of \(\prod _i [\mathfrak {l}_i]^{e_i} \cdot E_A\) with the bounds on the exponents \(e_i\) specified by the CSIDH parameters; hence the class group action collapses to the CSIDH group action.

Our Simulations. We performed simulations by modeling \(\mathcal {C}\ell \mathcal {(O)}\) as a cyclic group of random cardinality \(q \simeq \sqrt{p}\). Then we take u elements at random in this group, of the form \(g^{a_i}\) for some generator g, and compute two-by-two relations between them, as: \((g^{a_i})^{a_{i+1}} \cdot (g^{a_{i+1}})^{-a_i} = \mathbf {1}\). With such a basis, the computer algebra system Sage [46] performs BKZ reduction with block size 50 in a handful of minutes, even in dimension 200. We computed the \(L_1\) bound \(\frac{\sqrt{u}}{2} \sqrt{\sum _{i=1}^u \left| |b_i^\star |\right| _2^2}\) for many lattices generated as above, reduced with BKZ-50. On average, for CSIDH-512, -1024 and -1792 (of dimensions 74, 132 and 209 respectively), we obtain 1300, 4000 and 10000. The standard deviation of the values found does not exceed 10%. Notice that the bound is a property of the lattice, so we can take the average here, even though we will apply Babai's algorithm to a superposition of inputs.
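In this cyclic model, the two-by-two relations translate directly into integer lattice vectors: the relation \((g^{a_i})^{a_{i+1}} \cdot (g^{a_{i+1}})^{-a_i} = \mathbf {1}\) gives a row with \(a_{i+1}\) at position i and \(-a_i\) at position \(i+1\). A small sketch of this modeling (toy parameters, far from cryptographic sizes; in the actual simulation these rows are completed into a generating set and fed to BKZ):

```python
import random

q = 1000003                         # toy cyclic class-group order
u = 6                               # toy dimension
random.seed(1)
a = [random.randrange(1, q) for _ in range(u)]   # exponents: l_i = g^(a_i)

# Relation rows: the exponent of g is a_{i+1} * a_i - a_i * a_{i+1} = 0.
rows = []
for i in range(u - 1):
    e = [0] * u
    e[i], e[i + 1] = a[i + 1], -a[i]
    rows.append(e)

# Every row lies in the kernel of e -> sum_j e_j a_j mod q, i.e. in the lattice.
print(all(sum(ej * aj for ej, aj in zip(e, a)) % q == 0 for e in rows))  # True
```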

Faster Evaluations of the Class Group Action. In the context of speeding up the classical group action, the authors of [5] computed the structure of the class group for CSIDH-512, the relation lattice and a small basis of it. They showed that the class group was cyclic. Given an ideal class [x], they use Babai's algorithm with a further refinement [23]: keeping a list of short vectors and adding them to the output of Babai's algorithm, trying to further reduce the \(L_1\) norm of the result.

In particular for CSIDH-512, they are able to compute vectors of \(L_1\) norm even smaller on average than the original bound of \(5 \times 74 = 370\), reaching an average of 240 with BKZ-40 reduction. This suggests that, with lattice reduction, there may actually be fewer isogenies to compute than in the original CSIDH group action. However, we need a bound guaranteed for all target vectors, since we are computing in superposition, which is why we keep the bounds above.

5 A Quantum Circuit for the Class Group Action

In this section, we first analyze the cost of a quantum circuit that evaluates the CSIDH group action on a given Montgomery curve \(E_A\) represented by \(A \in \mathbb {F}_{p}\):

$$\begin{aligned} |e_1, \ldots e_u\rangle |A\rangle |0\rangle \mapsto |e_1, \ldots e_u\rangle |A\rangle |L_{\ell _1}^{e_1} \circ \ldots \circ L_{\ell _u}^{e_u}(A)\rangle \end{aligned}$$

where \(L_{\ell _i}\) corresponds to applying \([\mathfrak {l}_i]\) to a given curve, and the \(e_i\) are possibly greater than the CSIDH original exponents. We will then move to the class group action, which computes \([x] \cdot E_A\) in superposition for any [x].

Following previous literature on the topic [4, 41], we count the number of Toffoli gates and logical qubits used, as both are considered the most decisive factors for implementations. Our goal is to give an upper bound on resources for CSIDH-512 and an estimate for any CSIDH parameters, given a prime p of n bits and the sequence of small primes \(\ell _i\) such that \(p = 4 \cdot \prod _i \ell _i - 1\).

It was shown in [27] that the group action could be computed in polynomial quantum space. A non-asymptotic study of the gate cost was done in [4]. However, the authors of [4] were concerned with optimizing a classical circuit for CSIDH, without reversibility in mind. This is why the appendix of [4] mentions a bewildering “537503414” logical qubits [4, Appendix C.6] (approx. \(2^{29}\)). In this section, we will show that the CSIDH-512 group action can be squeezed into 40 000 logical qubits.

We adopt a bottom-up approach. We first introduce some significant tools and components, then show how to find, on an input curve \(E_A\), a point that generates a subgroup of order \(\ell \). We give a circuit for computing an isogeny, a sequence of isogenies, and combine this with lattice reduction to compute the class group action.

5.1 Main Tools

Bennett’s Conversion. One of the most versatile tools for converting an irreversible computation into a reversible one is Bennett’s time-space tradeoff [3]. Precise evaluations were done in [30, 34].

Assume that we want to compute, on an input x of n bits, a sequence \(f_{t-1} \circ \ldots \circ f_0(x)\), where each \(f_i\) can be computed out of place with a quantum circuit using \(T_f\) Toffoli gates and \(m_f\) ancilla qubits: \(|x\rangle |0\rangle \mapsto |x\rangle |f_i(x)\rangle \). We could naturally compute the whole sequence using tn ancilla qubits, but this rapidly becomes enormous. Bennett remarks that we can separate the sequence \(f_{t-1} \circ \ldots \circ f_0 = G \circ F\), with F and G functions using \(m_F\) and \(m_G\) ancillas respectively, and compute:

$$\begin{aligned} |x\rangle |0\rangle |0\rangle \mapsto |x\rangle |F(x)\rangle |0\rangle \mapsto |x\rangle |F(x)\rangle |G(F(x))\rangle \mapsto |x\rangle |0\rangle |G(F(x))\rangle . \end{aligned}$$

If \(T_F\) and \(T_G\) are the respective Toffoli counts of the circuits for F and G, the total is \(2T_F + T_G\) (F is computed and uncomputed), and the number of ancillas used is \(\max (m_F, m_G) + n\). Afterwards, we cut F and G recursively. Bennett obtains that for any \(\epsilon > 0\), an irreversible circuit using S space and running in time T can be converted into a reversible circuit running in time \(T^{1+ \epsilon }\) and using \(O(S \log T)\) space.

Adding One More Step. It often happens for us that the final result of the \(f_i\)-sequence is actually not needed: we only need to modify the value of another one-bit register depending on \(f_{t-1} \circ \ldots \circ f_0(x)\) (for example, flipping the phase). This means that at the highest level of the conversion, all functions are actually uncomputed. This can also mean that we do not compute \(f_{t-1} \circ \ldots \circ f_0(x)\), but \(f \circ f_{t-1} \circ \ldots \circ f_0(x)\) for some new f. Hence the cost is the same as if we added one more step before the conversion, and the overhead is often negligible.

Number of Steps Given a Memory Bound. We want to be as precise as possible, so we follow [30]. In general, we are free to cut the t operations in any way, and finding the best recursive cut, given a certain ancilla budget, is an optimization problem. Let B(t, s) be the least number of computation steps, for a total Toffoli cost \(B(t, s) T_f\), given \(s n + m\) available ancilla qubits, to obtain reversibly \(f_{t-1} \circ \ldots \circ f_0(x)\) from input x. We have:

Theorem 1

(Adaptation of [30], Theorem 2.1). B(t, s) satisfies the recursion:

$$\begin{aligned} B(t,s) = \left\{ \begin{matrix} 1 \text { for } t = 1 \text { and } s \ge 0 \\ \infty \text { for } t \ge 2 \text { and } s = 0 \\ \min _{1 \le k < t} B(k,s) + B(k,s-1) + B(t-k, s-1) \text { for } t \ge 2 \text { and } s \ge 1 \\ \end{matrix} \right. \end{aligned}$$

In all the cost formulas that we write below, we include a trade-off parameter s in the memory used and a factor B(t, s) in the time.
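The recursion of Theorem 1 is easy to evaluate by memoization; a small sketch:

```python
from functools import lru_cache
import math

@lru_cache(maxsize=None)
def B(t, s):
    """Least number of f-computations to get f_{t-1} o ... o f_0(x) reversibly."""
    if t == 1:
        return 1
    if s == 0:
        return math.inf
    # Cut after k steps: compute the prefix, recurse on the suffix with one
    # fewer free register, then uncompute the prefix.
    return min(B(k, s) + B(k, s - 1) + B(t - k, s - 1) for k in range(1, t))

print(B(2, 1))   # 3
print(B(3, 2))   # 5
print(B(4, 3))   # 7  (with enough space, 2t - 1 computations suffice)
print(B(8, 3))   # 25
```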

Basic Arithmetic Modulo p. The Toffoli cost of the group action oracle is almost totally consumed by arithmetic operations modulo p (a prime of n bits), and in the following, we count time in multiples of these basic operations. We do not distinguish between multiplication and squaring, as we use a single circuit for both, and denote by \(T_M\) the Toffoli gate count of a multiplication in \(\mathbb {F}_{p}\), using \(Q_M\) ancilla qubits. We also denote by \(T_I\) the Toffoli count of an inversion and \(Q_I\) its ancilla count. As n will remain the same parameter throughout this section, we deliberately omit it in these notations, although \(T_M, T_I, Q_I, Q_M\) are functions of n. Note that [4] considers that the inversion modulo p costs an n-bit exponentiation, far more than with the circuit of [41].

Lemma 1

([41], Table 1). There is a quantum circuit for (out of place) inversion modulo a prime p of n bits: \( |x\rangle |0\rangle \mapsto |x\rangle |x^{-1} \mod p\rangle \) that uses \(T_I = 32n^2 \log _2 n\) Toffoli gates and \(Q_I = 5n + 2\lceil {\log _2 n}\rceil + 7 \) qubits.

This circuit is out of place: the input registers are left unchanged, and the result is written on an n-bit output register. Circuits for in-place modular addition and doubling are also given in [41] and their Toffoli counts remain in \(O\left( n \log _2 n \right) \), hence negligible with respect to the multiplications.

We use the best modular multipliers given in [40] with 3n qubits and \(4n^2\) Toffoli gates (dismissing terms of lower order). Note that, although the paper is focused on in-place multiplication by a classically known Y (i.e. computing \(|x\rangle \mapsto |x Y\rangle \)), the same resource estimations apply to the out-of-place multiplication of two quantum registers: \(|x\rangle |y\rangle |0\rangle \mapsto |x\rangle |y\rangle |xy\rangle \) (see [40, Section 2.5]). Implementing a controlled multiplication (an additional register chooses to apply it or not) is not much more difficult than a multiplication.

In-place Multiplication. The in-place multiplication: \( |x\rangle |y\rangle \mapsto |x\rangle |x \cdot y\rangle \) is not reversible if x is not invertible, and in this case, we can simply rewrite \(|y\rangle \) in the output register. We reuse the modular inversion circuit of [41] to compute \(|x^{-1}\rangle \). Then we compute \(|x \cdot y\rangle \) and erase the \(|y\rangle \) register by computing \(|x \cdot y \cdot x^{-1}\rangle \).

Lemma 2

(In-place multiplication). There is a circuit that on input \(|x\rangle |y\rangle \) returns \(|x\rangle |x \cdot y\rangle \) if x is invertible and \(|x\rangle |y\rangle \) otherwise. It uses \(T_M' = 2T_M + 2T_I\) Toffoli gates and \(Q_M' = Q_I + n\) ancillas.

Modular Exponentiation. Given a t-bit exponent m, we write \(m = \sum _{i=0}^{t-1} m_i 2^i\). We give a circuit that maps \(|m\rangle |x\rangle |0\rangle \) to \(|m\rangle |x\rangle |x^m\rangle \). Contrary to the modular exponentiation in Shor’s algorithm, in our case, both x and m are quantum, which means that we cannot classically precompute powers of x (see e.g. [41]).

We use a simple square-and-multiply approach with Bennett's time-space tradeoff. We perform t steps, each requiring a squaring and a controlled multiplication by x: on input \(|y\rangle |0\rangle |0\rangle \), we compute \(|y\rangle |x \cdot y\rangle |0\rangle \), then \(|y\rangle |x \cdot y\rangle |(x \cdot y)^2\rangle \), and erase the second register with another multiplication. Hence a single step uses \(3T_M\) Toffolis and \(Q_M + n\) ancillas.
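A classical simulation of this schedule (function name hypothetical) confirms both the result and the count of three multiplications per exponent bit:

```python
def reversible_style_pow(x, m, t, p):
    """Square-and-multiply for x^m mod p on a t-bit exponent m.

    Counts one squaring, one controlled multiplication (charged whether or
    not the bit is set, as in the circuit) and one erasing multiplication
    per step: 3 multiplications per exponent bit, i.e. 3*t in total.
    """
    mults = 0
    y = 1
    for i in reversed(range(t)):        # scan exponent bits from the top
        y = (y * y) % p                 # squaring into a fresh register
        mults += 1
        if (m >> i) & 1:                # controlled multiplication by x
            y = (y * x) % p
        mults += 1
        mults += 1                      # erasing the previous register
    return y, mults

p = 1000003
y, mults = reversible_style_pow(7, 345, 9, p)
print(y == pow(7, 345, p), mults == 3 * 9)   # True True
```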

Lemma 3

There is a quantum circuit for t-bit modular exponentiation (with quantum input x and m) using \(3 B(t,s) T_M\) Toffolis and \((s+1) n + Q_M\) ancillas, where s is a trade-off parameter.

Legendre Symbol. The Legendre symbol of x modulo p is 1 if x is a square modulo p, \(-1\) if not, 0 if x is a multiple of p. It can be computed as \(x^{(p-1)/2} \mod p\). We deduce from Lemma 3, for an n-bit p, a cost of \(3 B(n,s) T_M\) Toffolis and \((s+1) n + Q_M\) ancillas for any trade-off parameter s.
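In Python, this Euler-criterion computation reads:

```python
def legendre(x, p):
    """Legendre symbol of x modulo an odd prime p, via x^((p-1)/2) mod p."""
    s = pow(x, (p - 1) // 2, p)
    return -1 if s == p - 1 else s    # map the residue p-1 back to -1

p = 23
print(legendre(2, p))    # 1: 2 is a square mod 23 (5^2 = 25 = 2 mod 23)
print(legendre(5, p))    # -1: 5 is not a square mod 23
print(legendre(0, p))    # 0
```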

Reversible Montgomery Ladder. Most of the work in the group action oracle is spent computing the (x-coordinate of the) m-th multiple of a point P on a Montgomery elliptic curve given by its coefficient A, for a quantum input m. Following the presentation in [4, Section 3.3], made reversible and combined with Bennett’s time-space tradeoff, we prove Lemma 4 in the full version of the paper [11]. Notice that mP can be transformed back to affine coordinates with little overhead, since the inversion in \(\mathbb {F}_{p}\) costs \(T_I = O\left( n^2 \log n \right) \) Toffolis.

Lemma 4

There exists a circuit to compute, given A, on input P (a point in affine coordinates) and m (an integer of t bits), the x-coordinate of mP (in projective coordinates), using \(15 B(t,s) T_M\) Toffolis and \(Q_M + 2n + 4sn\) ancilla qubits, where s is a trade-off parameter.
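For reference, the irreversible computation that this circuit makes reversible is the standard x-only Montgomery ladder. A classical sketch over a toy prime, cross-checked against naive affine arithmetic (illustrative only; this is not the reversible circuit, and the parameters are nowhere near CSIDH sizes):

```python
def ladder(m, x, A, p):
    """x-coordinate of m*P on y^2 = x^3 + A*x^2 + x, given x = x(P), m >= 1.

    Assumes m*P is not the point at infinity.
    """
    a24 = (A + 2) * pow(4, p - 2, p) % p
    X0, Z0 = 1, 0                       # R0 = point at infinity
    X1, Z1 = x % p, 1                   # R1 = P; invariant: R1 - R0 = P
    for i in reversed(range(m.bit_length())):
        bit = (m >> i) & 1
        if bit:                         # conditional swap
            X0, Z0, X1, Z1 = X1, Z1, X0, Z0
        # Differential addition R1 <- R0 + R1 (the difference is always P).
        t0 = (X0 + Z0) * (X1 - Z1) % p
        t1 = (X0 - Z0) * (X1 + Z1) % p
        X1, Z1 = (t0 + t1) ** 2 % p, x * (t0 - t1) ** 2 % p
        # Doubling R0 <- 2*R0.
        s0, s1 = (X0 + Z0) ** 2 % p, (X0 - Z0) ** 2 % p
        X0, Z0 = s0 * s1 % p, (s0 - s1) * (s1 + a24 * (s0 - s1)) % p
        if bit:
            X0, Z0, X1, Z1 = X1, Z1, X0, Z0
    return X0 * pow(Z0, p - 2, p) % p

def affine_mul(m, P, A, p):
    """Naive affine scalar multiplication, used only to cross-check the ladder."""
    def add(P1, P2):
        if P1 is None: return P2
        if P2 is None: return P1
        (x1, y1), (x2, y2) = P1, P2
        if x1 == x2 and (y1 + y2) % p == 0:
            return None                 # P + (-P) = point at infinity
        if P1 == P2:
            lam = (3 * x1 * x1 + 2 * A * x1 + 1) * pow(2 * y1, p - 2, p) % p
        else:
            lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
        x3 = (lam * lam - A - x1 - x2) % p
        return x3, (lam * (x1 - x3) - y1) % p
    R = None
    for _ in range(m):
        R = add(R, P)
    return R

p, A = 1009, 6                          # toy parameters
P = next((x, y) for x in range(1, p) for y in range(1, p)
         if (y * y - (x ** 3 + A * x * x + x)) % p == 0)
checked = 0
for m in (2, 3, 5, 29):
    Q = affine_mul(m, P, A, p)
    if Q is not None:                   # skip m if m*P happens to be infinity
        assert ladder(m, P[0], A, p) == Q[0]
        checked += 1
print("ladder agrees with affine arithmetic on", checked, "multiples")
```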

5.2 Finding a Point of Order \(\ell \)

Given A in input, we want to compute \(B = L_\ell (A)\), the coefficient of the curve \(\ell \)-isogenous to \(E_A\). This requires finding a subgroup of order \(\ell \) of the curve \(E_A\). In CSIDH, this is done by first finding a point P on \(E_A\), then computing \(Q = ((p+1)/\ell )P\). If Q is not the point at infinity, it generates a subgroup of order \(\ell \).

Quantum Search for a Good Point. Let \(\text {test}(x)\) be a function that, on input \(x \in \mathbb {F}_{p}^*\), returns 1 if x is the x-coordinate of such a good point P, and 0 otherwise. We first build a quantum circuit that, on input A and \(x \in \mathbb {F}_{p}^*\), flips the phase: \( |A\rangle |x\rangle \mapsto (-1)^{\text {test}(x)} |A\rangle |x\rangle \). We will use this circuit as a test in a modified Grover search.

Testing if P is on the Curve. We compute \(x^3 + Ax^2 + x\) using some multiplications and squarings (a negligible amount), then the Legendre symbol of \(x^3 + Ax^2 + x\). For exactly half of \(\mathbb {F}_{p}^*\), we obtain 1, which means that x is the x-coordinate of a point on the curve. For the other half, we obtain \(-1\), and x is actually the x-coordinate of a point on its twist.

Multiplication by the Cofactor. Assume that the x-coordinate obtained above is that of a point P on the curve. We compute \(Q = ((p+1)/\ell ) P\) using our reversible Montgomery ladder. Then, another failure occurs if \(Q = \infty \). This happens with probability \(1 / \ell \). Hence, the probability of success of the sampling-and-multiplication operation is \(\frac{1}{2} \left( 1 - \frac{1}{\ell } \right) \). In the circuit that we are building right now, we don’t need the value of Q, only the information whether \(Q = \infty \) or not. Bennett’s conversions of both the Legendre symbol computation and the Montgomery ladder can take into account the fact that we merely need to flip the phase of the input vector.

Lemma 5

There exists a quantum circuit that, on input \(|A\rangle |x\rangle \), flips the phase by \((-1)^{\text {test}(x)}\), using \(15 B(n,s) T_M + 3B(n,s') T_M\) Toffolis and \(\max (Q_M + 2n + 4s n, (s'+1)n + Q_M)\) ancillas, where s and \(s'\) are trade-off parameters.

With this phase-flip oracle, we can obtain a point of order \(\ell \) with a quantum search. Instead of using Elligator as proposed in [4], we follow the “conventional” approach outlined in [4, Section 4.1], not only because it is simpler, but also because its probability of success is exactly known, which makes the search operator cheaper. More details are given in the full version of the paper [11].

Quantum Search with High Success Probability. We start by generating the uniform superposition \(\sum _{x \in \mathbb {F}_{p}^*} |x\rangle \) using a Quantum Fourier Transform (this is very efficient with respect to arithmetical operations). We use a variant of amplitude amplification for the case where the probability of success is high [15]. This variant is exact, but requires to use a phase shift whose angle depends on the success probability.

We know that the proportion of good x is exactly \(g = \frac{1}{2} \left( 1 - \frac{1}{\ell } \right) \). Normally, a Grover search iteration contains a phase flip and a diffusion transform which, altogether, realize an “inversion about average” of the amplitudes of the vectors in the basis. In [15], this iteration is modified into a controlled-phase operator which multiplies the phase of “good vectors” by \(e^{i \gamma }\) instead of \(-1\) and a “\(\beta \)-phase diffusion transform”. Then by [15, Theorem 1], if \(\frac{1}{4} \le g \le 1\) and we set \(\beta = \gamma = \arccos ( 1- 1/(2g))\), the amplitude of the “bad” subspace is reduced to zero. Such a phase shift can be efficiently approximated with the Solovay-Kitaev algorithm [19]. For a phase shift gate synthesized from Clifford+T gates, we estimate from [29] that it can be approximated up to an error of \(2^{-50}\) using around \(2^{14}\) T-gates, which is negligible compared to the cost of the exponentiation in the test function.

Detecting the Errors. If the error probability is low enough, we can assume that the end state is perfect. However, we can avoid these errors if, after computing the superposition of good points, we reapply the test function, add the result in an ancilla qubit and measure this qubit. In general, such a measurement could disrupt the computation. This is not the case here: measuring whether x is a good point for A, while A is in superposition, does not affect the register A, as the set of good points is always of the same size. With probability \(\ge 1 -2^{-50}\) we measure 1 and the state collapses to the exact superposition of good points for the given A. Otherwise we stop the procedure here. When we need to uncompute this procedure, we revert the same single-iteration quantum search and perform the same measurement, with the same success probability.

Lemma 6

There exists a quantum procedure that, on input (affine) A, finds the x-coordinate x of a “good” point on \(E_A\): \( |A\rangle |0\rangle \mapsto |A\rangle \left( \sum _x |x\rangle \right) \). It uses \(30 B(n,s) T_M + 6B(n,4s) T_M\) Toffolis and \(Q_M + 2n + 4sn\) ancillas, and its probability of failure is less than \(2^{-50}\).

Proof

This procedure runs as follows (we say “procedure” instead of “circuit”, since it contains a measurement):

  • Compute the superposition of points \(S = \sum _{x \in \mathbb {F}_{p}^*} |x\rangle \);

  • Apply the modified Grover operator: it contains the computation of S (negligible) and the computation of \(|x\rangle \mapsto \left( e^{i \gamma }\right) ^{\text {test}(x)} |x\rangle \);

  • We do not obtain a single x, but a superposition close to the superposition of suitable x;

  • Recompute the test in a single-bit ancilla register: \(|x\rangle |0\rangle \mapsto |x\rangle |\text {test}(x)\rangle \);

  • Measure the ancilla register, forcing a collapse onto the exact superposition of suitable x.

We set \(s' = 4s\) in Lemma 5. All in all, we use two Legendre symbol computations and two n-bit reversible Montgomery ladders.   \(\square \)

5.3 Computing an Isogeny

From the x-coordinate of a point Q on \(E_A\) of order \(\ell \), we can compute the coefficient B of the \(\ell \)-isogenous curve \(E_B\). The details are in the full version of the paper [11].

Lemma 7

(Isogeny from point). There is a circuit that on input \( |A\rangle |Q\rangle |0\rangle \), computes \(|A\rangle |Q\rangle |B\rangle \) using \(Q_I + (4s + 9)n\) ancilla qubits and

$$\begin{aligned} 7 B \left( \frac{\ell -1}{2} + 1 , s\right) T_M + 6B( \lceil {\log _2 \ell }\rceil , 4s) T_M + (4\ell -1)T_I + (4\ell +3)T_M \end{aligned}$$

Toffolis, where s is a tradeoff parameter.

We now put together the last subsections in order to perform an \(\ell \)-isogeny mapping: \( |A\rangle |0\rangle \mapsto |A\rangle | L_{\ell } (A)\rangle \) with overwhelming probability of success and detectable failure. We suppose that the cofactor \((p+1)/\ell \) has been classically precomputed. The isogeny computation is performed as follows:

  1. On input \(|A\rangle \), produce the superposition of good points P, that are on \(E_A\) and have order \(p+1\) (detectable failures happen here);

  2. On input \(|A\rangle |P\rangle \), compute a reversible Montgomery ladder to obtain \(Q = ((p+1)/\ell )P\);

  3. On input \(|A\rangle |Q\rangle \), obtain the coefficient \(B = L_{\ell } (A)\) of the image curve;

  4. Uncompute the Montgomery ladder for Q;

  5. Uncompute the superposition of good points (detectable failures happen here).

The ancilla cost of an out of place isogeny computation is the maximum between \(n + Q_M + 2n + 4sn\) (computing the good points and the ladder for \(Q = ((p+1)/\ell )P\)) and \(n + Q_I + (4s' + 9)n\) (computing the image curve). We set \(s = s'\) in order to use \(Q_I + (4s + 11)n\) ancillas at most. Next, we denote by \(T_\ell (s)\) the Toffoli count of this operation. It sums \(60B(n,s)T_M + 12B(n,4s)T_M\) (computing and uncomputing the good points), the cost of Lemma 7 and \(30B(n,s) T_M\) (computing and uncomputing the ladder).

Computing the inverse of an isogeny is not difficult, as noticed in [4], since \(L_{\ell }^{-1}(A) = - L_{\ell }(-A)\). Hence, at the cost of doubling the work, we are able to compute isogenies in place. On input \(|A\rangle \), we compute \(|A\rangle |L_{\ell } (A)\rangle \), then we apply \(L_{\ell }^{-1}\) to erase \(|A\rangle \). We will see that most of the computation is spent computing the 12 reversible Montgomery ladders \(P \mapsto ((p+1)/\ell )P\).
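The compute-then-uncompute pattern behind the in-place mapping can be checked on a toy commutative model: below, the class group action is replaced by addition in \(\mathbb {Z}/N\mathbb {Z}\), and negating the curve coefficient (passing to the quadratic twist) by negating the group element, which intertwines the action with its inverse exactly as \(A \mapsto -A\) does for CSIDH. All names and constants are ours, purely illustrative:

```python
N, g = 1009, 123          # toy stand-ins for the class-group order and one ideal class

def L(A):                 # out-of-place "isogeny": one step of the toy group action
    return (A + g) % N

def neg(A):               # stand-in for A -> -A (passing to the quadratic twist)
    return (-A) % N

def L_inv(A):             # inverse step, built only from L via L^{-1}(A) = -L(-A)
    return neg(L(neg(A)))

def L_in_place(A):
    B = L(A)              # compute |A>|B>, B = L(A)
    # |A> can be uncomputed because A = L^{-1}(B) = -L(-B):
    assert A == L_inv(B)
    return B              # the register now holds |B> alone
```

The doubling of the cost in Lemma 8 corresponds to the two calls (one to L, one to its inverse) in this sketch.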

Lemma 8

There exists a quantum procedure that performs an \(\ell \)-isogeny mapping in place: \( |A\rangle \mapsto |L_{\ell } (A)\rangle \) with overwhelming probability of success (failure probability at most \(2^{-50}\)) and detectable failure, using \(2 T_\ell (s)\) Toffolis and \(Q_I + (4s + 11)n\) ancillas.

5.4 Computing a Sequence of Isogenies

Using the in-place computation of \(L_{\ell _i}\), we now compute the image of an input A under a sequence of isogenies, described by \(\bar{e} = (e_1, \ldots , e_u)\):

$$\begin{aligned} |e_1, \ldots e_u\rangle |A\rangle \mapsto |e_1, \ldots e_u\rangle |L_{\ell _1}^{e_1} \circ \ldots \circ L_{\ell _u}^{e_u}(A)\rangle . \end{aligned}$$

If we need to apply the backwards isogeny rather than the forwards one (\(e_i\) is negative), we use \(L_{\ell _i}^{-1}(A) = - L_{\ell _i}(-A)\), so we just need to change the signs of the registers, in place, with negligible overhead (in computations and qubits). In general, contrary to the standard CSIDH key-exchange, we have no guarantee on \(\max _i e_i\). Instead, we only know that \(\Vert \bar{e} \Vert _1 = \sum _i |e_i| \le M\) for some bound M. We follow the idea of [4] of having a single quantum circuit for any \(\ell _i\)-isogeny computation, controlled by which isogeny we want to apply. Given an input vector \(e_1, \ldots , e_u\), we apply isogenies one by one, always decrementing the first nonzero exponent (or incrementing it, if it is negative).
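This control logic can be sketched classically (the quantum circuit runs the same loop, with each isogeny controlled on the exponent register; the additive model over \(\mathbb {Z}/N\mathbb {Z}\) and all constants are our toy stand-ins for the class group action):

```python
N = 1009                  # toy class-group order (ours)
g = [3, 7, 10]            # toy ideal classes: the action of l_i is A -> A + g[i] (mod N)

def step(i, A, forward):
    """One controlled l_i-isogeny; backwards uses L^{-1}(A) = -L(-A), i.e. subtraction here."""
    return (A + g[i]) % N if forward else (A - g[i]) % N

def apply_sequence(e, A, M):
    """Apply the action of prod_i l_i^{e_i}, always running exactly M controlled steps."""
    e = list(e)
    for _ in range(M):    # exactly M iterations, whatever the exponent vector
        i = next((j for j, v in enumerate(e) if v != 0), None)
        if i is None:
            continue      # all exponents exhausted: controlled no-op
        if e[i] > 0:
            A = step(i, A, True);  e[i] -= 1
        else:
            A = step(i, A, False); e[i] += 1
    return A
```

For instance, for \(\bar{e} = (2, -3, 1)\) with \(\Vert \bar{e} \Vert _1 = 6 \le M = 10\), the result is \(A + 2 g_1 - 3 g_2 + g_3 \bmod N\), and the remaining \(M - 6\) iterations are no-ops.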

Since the procedure for the isogeny sequence considers all cases in superposition, it always applies exactly M controlled isogenies, depending only on the promised bound M. Contrary to modular exponentiation, we do not need a time-space tradeoff for this sequence of computations, as isogenies can be computed in place (Lemma 8). Finally, if single isogenies fail with probability f, the total failure probability is at most Mf (by a union bound).

A Constant Success Probability is Enough. The failure probability of \(2^{-50}\) given in Lemma 8 is actually far smaller than required. Indeed, failures are detected, and failed oracle queries can be discarded. One should note that the quantum hidden shift algorithms that apply to the cryptanalysis of CSIDH precisely allow this, since they start by applying the oracle many independent times before combining the results. Before the combination step, we can discard all the failed queries, and the complexity is only multiplied by \(1/(1-Mf)\). Hence, compared to [4], we not only obtain a better success probability in a simpler way, using quantum search, but we also considerably reduce the required success rate. In our case, we expect \(M \lll 2^{50}\), hence a negligible failure probability and a negligible overhead.

Finally, we can transform the CSIDH group action into the class group action, using the lattice reduction technique of Sect. 4. We show in the full version of the paper [11] that, using [27] and Babai’s algorithm together, we can achieve a negligible computational and memory overhead.

Lemma 9

(Group action). Let M be the \(L_1\) bound obtained by reducing the lattice of relations. Assume that \(M \lll 2^{50}\) and \(\ell \) is the maximal small prime used. Then there exists a quantum circuit for the class group action using \(2M T_\ell (s)\) Toffolis and \(Q_I + (4s + 11)n\) ancillas, where s is an integer trade-off parameter, with negligible probability of failure.

6 Estimating the Security of CSIDH Parameters

In this section, we assess the quantum security of the original parameters proposed in [12]. We count the number of T-gates necessary to attack CSIDH and compare to the targeted security levels.

6.1 Cost of the Group Action Oracle

In CSIDH-512, the base prime p has \(n = 511\) bits, and there are \(u = 74\) small primes whose maximum is \(\ell = 587\). We will first count the number of Toffoli gates required in terms of \(T_M\) and \(T_I\), before plugging the cost of a reversible multiplication modulo p.

In Sect. 4, we have estimated that Babai’s algorithm would return a vector of \(L_1\) norm smaller than 1300. Hence, the oracle of Lemma 9 needs to apply \(M = 1300\) in-place isogenies, more than the \(74 \cdot 5 = 370\) required by the “legitimate” group action. We choose \(s = 15\) in Lemma 9. Using Lemma 1, we compute \(B(512,15) = 3553\) and \(B(512,60) = 1925\). We further have \(\lceil {\log _2 \ell }\rceil = 10\) and \(B(10,60) = 17\), \((\ell +1)/2 = 294\) and \(B(294,15) = 1809\). For a single in-place isogeny, the number of multiplications is: \(639540 = 2^{19.3}\) for the Montgomery ladders, 46200 for the Legendre symbols, 30232 for computing the isogeny from a point, and there are 4694 inversions. For 1300 isogenies, we need \(2^{29.8}\) multiplications, among which \(2^{29.6}\) for the Montgomery ladders. There are approximately 38912 ancillas. A 512-bit multiplication costs \(2^{20}\) Toffoli [40], hence the 512-bit class group action can be performed with \(2^{49.8}\) Toffoli gates, i.e. \(2^{52.6}\) T-gates.
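This arithmetic can be reproduced directly. The \(B(n,s)\) values below are those just quoted (we do not reproduce the formula of Lemma 1 itself), and the factor 2 accounts for the doubling of the in-place computation (Lemma 8):

```python
from math import log2

ell, M = 587, 1300                   # largest small prime, number of in-place isogenies
B512_15, B512_60 = 3553, 1925        # B(512, 15) and B(512, 60) from Lemma 1
B294_15, B10_60 = 1809, 17           # B((ell+1)/2, 15) and B(ceil(log2 ell), 60)

# Out-of-place costs (in multiplications), doubled for the in-place version:
ladders    = 2 * (60 + 30) * B512_15                       # good points + ladder for Q
legendre   = 2 * 12 * B512_60                              # Legendre symbol tests
from_point = 2 * (7 * B294_15 + 6 * B10_60 + 4 * ell + 3)  # Lemma 7, T_M terms
inversions = 2 * (4 * ell - 1)                             # Lemma 7, T_I terms

per_isogeny = ladders + legendre + from_point   # multiplications per in-place isogeny
total = M * per_isogeny                         # multiplications for the group action
```

With a 512-bit multiplication at \(2^{20}\) Toffoli gates, `total` indeed yields the \(2^{49.8}\) Toffoli count stated above, dominated by the \(2^{29.6}\) multiplications of the Montgomery ladders.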

Time Complexity for CSIDH-1024 and CSIDH-1792. Since the time is dominated by the Montgomery ladders, and \(Q_I \simeq 5n\), we simplify the Toffoli cost of an isogeny into \(180B(n,s) T_M\) and the ancilla cost into \((4s + 16)n\). We compute B(n, s) for various values of s and propose the trade-offs of Table 3.

Table 3. Quantum time and qubits for the class group action for the original CSIDH parameters (computed with the simplified formula). We put in bold the trade-offs selected for the next section.

6.2 Attacking CSIDH

The parameters in [12] are aimed at three security levels defined by the NIST call [37]: NIST 1 should be as computationally hard as recovering the secret key of AES-128 (with quantum or classical resources), NIST 3 as hard as key-recovery of AES-192, and NIST 5 as hard as key-recovery of AES-256. The NIST call referred to the quantum estimates of [25], but they have been improved in [33]. We plug our class group action oracle into the three quantum hidden shift algorithms of Sects. 3.2, 3.3 and 3.4, and compute the resulting complexities (note that, in terms of quantum time, we compare only the T-gate counts). The results are summarized in Table 4.

The first generic hidden-shift algorithm that we presented (Sect. 3.2) uses a large amount of quantum memory (resp. \(2^{31}\), \(2^{43}\) and \(2^{56}\) qubits), as it needs to store all of its labeled qubits. Besides, as quantum queries are very costly in the case of CSIDH, it is advantageous to reduce their count, even at the price of increasing the classical complexity.

With the variant of Sect. 3.3, we see that the quantum query complexity decreases dramatically. If N is the cardinality of the class group (roughly \(\sqrt{p}\)), we solve \(8(\log _2 N)\) classical subset-sum instances on \(\log _2 N\) bits (one for each label produced before the final QFT, with a success probability of \(\frac{1}{8}\) in total), each of which costs \(2^{0.291 \log _2 N}\).Footnote 3 We make a total of \(8 (\log _2 N)^2\) quantum oracle queries. The quantum memory used depends only on the quantum oracle implementation.
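For concreteness, these counts can be tabulated for the three parameter sets, taking \(\log _2 N \approx n/2\) since \(N \approx \sqrt{p}\) (rounding 511/2 up to 256 is our simplification):

```python
from math import log2

# log2(N) ~ n/2 since N ~ sqrt(p); values in log2 scale, rounded to one decimal.
results = {}
for n, log2N in [(511, 256), (1024, 512), (1792, 896)]:
    queries_exp = round(log2(8 * log2N ** 2), 1)   # log2 of the quantum oracle queries
    subset_sum_exp = round(0.291 * log2N, 1)       # log2 cost of one subset-sum instance
    results[n] = (queries_exp, subset_sum_exp)
```

For CSIDH-512 this gives \(2^{19}\) oracle queries, matching the query count discussed in Sect. 6.3.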

Going further, we can trade between classical and quantum cost with the algorithm of Sect. 3.4. We use 4-list merging, equal quantum query and classical memory costs (excluding polynomial factors). Hence we consider lists of size \(2^{\sqrt{2\log _2(N)/3}}\) everywhere and \(\sqrt{\log _2(N)/6}\) steps, obtaining the costs of Table 4 with respectively \(2^{18}\), \(2^{25}\) and \(2^{31}\) classical memory.

Table 4. Attack trade-offs, in \(\log _2\) scale, rounded to the first decimal. “<” in the quantum memory complexity means that the memory comes mainly from the oracle. We put in bold the most significant trade-offs that we obtained for each variant.

6.3 Going Further

All the parameter sizes proposed in [12] fall below their targeted security levels. In Table 4, we see that the best strategy varies with the size of the parameter p. With the small instance CSIDH-512, it is better to reduce the number of quantum queries as much as possible, even if it means increasing the classical time complexity. With CSIDH-1792, the variant of Sect. 3.3 with a polynomial number of quantum queries cannot be applied anymore, as its classical complexity becomes too high. However, the trade-off that we propose with Kuperberg’s second algorithm (Sect. 3.4) makes it possible to attack CSIDH-1024 and CSIDH-1792 with a significant quantum advantage. In order to meet the NIST security levels, the bit-size of the parameter p needs to be increased.

For CSIDH-512, it seems unlikely to us that the query count of \(2^{19}\) can be significantly decreased; however, there is room for improvement in the quantum oracle. Currently, our oracle performs 1300 in-place isogeny computations, each of which requires 12 Montgomery ladders with 512 steps. With more precise estimations, and by improving our current use of Babai’s algorithm, one might reduce the number of isogenies down to \(\sim \)240 [5]. But this would require implementing the algorithm of [23] as a quantum circuit, and requires further investigation. We currently use 40 000 logical qubits; this could be reduced with more aggressive optimizations (for example, using dirty ancillas that don’t need to start in the state \(|0\rangle \)). We also notice that, in general, quantum multiplication circuits are optimized to use few ancilla qubits, with Shor’s algorithm in mind. In the case of CSIDH, the prime p is smaller than an RSA modulus, but the number of ancillas can be higher, and different trade-offs might be used.

7 Conclusion

We performed the first non-asymptotic quantum security assessment of CSIDH, a recent and promising key-exchange primitive based on supersingular elliptic curve isogenies. We presented the main variants of quantum commutative hidden shift algorithms, which are used as a building block in attacking CSIDH. There are many tradeoffs in quantum hidden shift algorithms. This makes the security analysis of CSIDH all the more challenging, and we tried to be as exhaustive as possible regarding the current literature.

We gave tradeoffs, estimates and experimental simulations of their complexities. Next, we gave a quantum procedure for the class group action oracle in CSIDH, completing and extending the previous literature. Consequently, we were able to propose the first non-asymptotic cost estimates of attacking CSIDH.

Comparing these to the targeted security levels, as defined in the ongoing NIST call, we showed that the parameters proposed in [12] do not meet these levels. We used different trade-offs between classical and quantum computations depending on the parameters targeted. In particular, the CSIDH-512 proposal is at least \(1\,000\) times easier to break quantumly than AES-128, using a variant polynomial in quantum queries and exponential in classical computations.

Safe Instances. The minimal size for which the attacks presented here are out of reach depends heavily both on the way we estimate the costs (as they are subexponential) and on the interpretation of the NIST metrics. In particular, does NIST 1 allow for a classical part with Time = Memory = \(2^{128}\), or only Time \(\times \) Memory = \(2^{128}\)? Moreover, the oracle cost depends vastly on the number of qubits used inside.

We can propose two sets of parameters for security level NIST 1: one aggressive, and one conservative. If we consider that NIST 1 allows for a classical time-memory product of \(2^{128}\) and \(2^{20}\) quantum queries, and we neglect the polynomial factors, then the minimal size would be \(p \sim 2260\) bits, which corresponds to multiplying the parameter size by 4. Our best attack would use Kuperberg’s second algorithm and 2-list merging, at a cost of \(2^{69}\) classical time, \(2^{59}\) classical memory, \(2^{20}\) quantum queries and \(2^{18}\) qubits.

For a more conservative estimate, we can consider that classical time can reach \(2^{128}\) and classical memory \(2^{64}\), that the quantum oracle for CSIDH can be reduced down to \(2^{40}\) T-gates, that a quantum key-recovery on AES-128 costs \(2^{80}\) T-gates (which allows for \(2^{40}\) queries and \(2^{80}\) quantum time), and neglect polynomial factors. This would require \(p\sim 5280\) bits, that is, multiplying the parameter size by 10. Our best attack then uses 4-list merging in Kuperberg’s second algorithm, for a cost of \(2^{128}\) classical time, \(2^{64}\) classical memory, \(2^{40}\) quantum queries, and as many qubits as required by the hypothetical improved CSIDH oracle.