
1 Introduction

Fully homomorphic encryption (FHE) allows computations to be performed on encrypted data without decrypting it. This concept was long regarded as an open problem, until the breakthrough paper of Gentry in 2009 [15], which demonstrated the feasibility of computing any function on encrypted data. Since then, many constructions have appeared, introducing new mathematical and algorithmic concepts and improving efficiency.

In homomorphic encryption, messages are encrypted with a noise that grows at each homomorphic evaluation of an elementary operation. In a somewhat homomorphic encryption scheme, the number of homomorphic operations is limited, but it can be made asymptotically large using bootstrapping [15]. This technical trick, introduced by Gentry, allows the evaluation of arbitrary circuits, by essentially evaluating the decryption function on encrypted secret keys. This step remained very costly until the recent paper of Ducas and Micciancio [11], which presented a very fast bootstrapping procedure running in around 0.69 s, an important step towards practical FHE for arbitrary NAND circuits. In this paper, we further improve this bootstrapping procedure.

We first provide an intuitive formalization of LWE/RingLWE on numbers or polynomials over the real torus, obtained by combining the Scale-Invariant-LWE problem of [9], or the LWE normal form of [10], with the General-LWE problem of Brakerski-Gentry-Vaikuntanathan [5]. We call \({\mathrm {TLWE}}\) this unified representation of LWE ciphertexts, which encode polynomials over the torus. Its security relies on the hardness of either general or ideal lattice reduction, depending on the choice of dimensions. Using the same formalism, we extend the GSW/RingGSW ciphertexts to \({\mathrm {TGSW}}\), the combined analogue of Gentry-Sahai-Waters' ciphertexts from [3, 16], which can also instantiate the ring version used in the Ducas-Micciancio FHEW cryptosystem [11]. Similarly, a \({\mathrm {TGSW}}\) ciphertext encodes an integer polynomial message, and depending on the choice of dimensions, its security is also based on (worst-case) generic or ideal lattice reduction algorithms. \({\mathrm {TLWE}}\) and \({\mathrm {TGSW}}\) are basically dual to each other, and the main idea behind our efficiency result is that these two schemes can directly be combined so as to map the external product of their two messages to a \({\mathrm {TLWE}}\) sample. Since a \({\mathrm {TGSW}}\) sample is essentially a matrix whose individual rows are \({\mathrm {TLWE}}\) samples, our external product \({\mathrm {TGSW}}\times {\mathrm {TLWE}}\) is much faster than the usual internal product \({\mathrm {TGSW}}\times {\mathrm {TGSW}}\) used in previous works: this is essentially the difference between the cost of a matrix-vector product and that of a matrix-matrix product. As a result, we obtain a significant improvement (12 times faster) over the most efficient bootstrapping procedure [11]; it now runs in less than 0.052 s.

We also analyze the case of leveled encryption. Using an external product means that we lose some composability properties in the design of homomorphic circuits: it corresponds to circuits where boolean gates have different kinds of wires that cannot be freely interconnected. Still, we show that we retain the expressiveness of the whole binary-decision-diagram and automata-based logic, which was introduced in [13] with the \({\mathrm {GSW}}\)-\({\mathrm {GSW}}\) internal product, and we tighten the analysis. Indeed, while that approach was impractical (10 transitions per second in the ring case, and worse in the non-ring case), we show that the \({\mathrm {TGSW}}\)-\({\mathrm {TLWE}}\) external product makes it possible to evaluate up to 5000 transitions per second, in a leveled homomorphic manner. We also refine the mapping between automata and homomorphic gates, and reduce the number of homomorphic operations needed to test a word with a deterministic automaton. This allows compiling and evaluating constant-time algorithms (i.e. with data-independent control flow) in a leveled homomorphic manner, with only sub-linear noise overhead in the running time.

We also propose a new security analysis where the security parameter is directly expressed as a function of the entropy of the secret and of the error rate. For the parameters that we propose in our implementation, we predict 188 bits of security for both the bootstrapping key and the keyswitching key.

Roadmap. In Sect. 2, we give mathematical definitions and a quick overview of the classical version of LWE-based schemes. In Sect. 3, we generalize the LWE and GSW schemes using a torus representation of the samples. We also review the arithmetic operations over the torus and introduce our main theorem characterizing the new morphism between \({\mathrm {TLWE}}\) and \({\mathrm {TGSW}}\). As a proof of concept, we present two main applications: in Sect. 4, we explain our fast bootstrapping procedure, and in Sect. 5, we present the efficient leveled evaluation of deterministic automata, and apply it to a constant-time algorithm with logarithmic memory. Finally, we provide a practical security analysis in Sect. 6.

2 Background

Notation. In the rest of the paper we will use the following notations. The security parameter will be denoted \(\lambda \). The set \(\{0,1\}\) (without any structure) will be written \(\mathbb {B}\). The real torus \(\mathbb {R}/\mathbb {Z}\), i.e. the set of real numbers modulo 1, will be written \(\mathbb {T}\). \(\mathfrak {R}\) denotes the ring of polynomials \(\mathbb {Z}[X]/(X^N+1)\), and \(\mathbb {T}_N[X]\) denotes \(\mathbb {R}[X]/(X^N+1)\;\mod \;1\). Finally, we denote by \(\mathcal {M}_{p,q}(E)\) the set of \(p\times q\) matrices with entries in E.

This section relies on basic algebra, namely abelian groups, commutative rings and R-modules, and on some metrics over the real field \(\mathbb {R}\).

Definition 2.1

( R -module). Let \((R,+,\times )\) be a commutative ring. We say that a set M is an R-module when \((M,+)\) is an abelian group, and when there exists an external operation \(\cdot \) which is bi-distributive and homogeneous. Namely, \(\forall r,s\in R\) and \(x,y\in M\), \(1_R\cdot x=x\), \((r+s)\cdot x= r\cdot x+s\cdot x\), \(r\cdot (x+y)= r\cdot x+r\cdot y\), and \((r\times s)\cdot x=r\cdot (s\cdot x)\).

Any abelian group is by construction a \(\mathbb {Z}\)-module for the iteration (or exponentiation) of its own law. In this paper, one of the most important abelian groups we use is the real torus \(\mathbb {T}\), composed of all reals modulo 1 (\(\mathbb {R}\;\mod \;1\)). The torus is not a ring, since the real internal product is not compatible with the modulo 1 projection (an expression like \(0\times \frac{1}{2}\) is ill-defined: \(0=1\;\mod \;1\), yet \(0\times \frac{1}{2}\ne 1\times \frac{1}{2}\;\mod \;1\)). But as an additive group, it is a \(\mathbb {Z}\)-module, and the external product \(\cdot \) from \(\mathbb {Z}\times \mathbb {T}\) to \(\mathbb {T}\), as in \(0\cdot \frac{1}{2}=0\), is well defined. More importantly, we recall that for all positive integers N and k, \(\mathbb {T}_N[X]^k\) is a \(\mathfrak {R}\)-module.
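To make the \(\mathbb {Z}\)-module structure concrete, here is a minimal Python sketch (our own illustration, with floats as torus representatives; not part of the scheme):

```python
def t(x: float) -> float:
    """Canonical representative of x in T = R/Z, taken in [0, 1)."""
    return x % 1.0

# T is a Z-module: the external product n.x is well defined, because two
# representatives of x differ by an integer k, and n*(x + k) = n*x (mod 1).
assert t(3 * 0.75) == t(3 * (0.75 + 2.0))   # both equal 0.25

# The internal product of R is NOT compatible with mod 1: 0.0 and 1.0
# represent the same torus element, yet their products with 0.5 differ.
assert t(0.0 * 0.5) != t(1.0 * 0.5)         # 0.0 vs 0.5
```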

An R-module M shares many arithmetic operations and constructions with vector spaces: vectors \(M^n\) or matrices \(\mathcal {M}_{n,m}(M)\) are also R-modules, and their left dot product with a vector in \(R^n\), or left matrix product with a matrix in \(\mathcal {M}_{k,n}(R)\), are both well defined.

Gaussian Distributions. Let \(\sigma \in \mathbb {R}^{+}\) be a parameter and \(k\ge 1\) the dimension. For all \(\varvec{x}, \varvec{c}\in \mathbb {R}^k\), we note \(\rho _{\sigma ,\varvec{c}}(\varvec{x})=\exp (-\pi \left\| \varvec{x}-\varvec{c}\right\| _2^2/4\sigma ^2)\). If \(\varvec{c}\) is omitted, then it is implicitly 0. Let S be a subset of \(\mathbb {R}^k\); \(\rho _{\sigma ,\varvec{c}}(S)\) denotes \(\sum _{\varvec{x}\in S}\rho _{\sigma ,\varvec{c}}(\varvec{x})\) or \(\int _{S}\rho _{\sigma ,\varvec{c}}(\varvec{x})\,d\varvec{x}\), whichever is defined. For all closed (continuous or discrete) additive subgroups \(M\subseteq \mathbb {R}^k\), \(\rho _{\sigma ,\varvec{c}}(M)\) is finite, and defines a (restricted) Gaussian distribution \(\mathcal {D}_{M,\sigma ,\varvec{c}}\) of parameter \(\sigma \), standard deviation \(\sqrt{2/\pi }\sigma \) and center \(\varvec{c}\) over M, with the density function \(\mathcal {D}_{M,\sigma ,\varvec{c}}(\varvec{x})=\rho _{\sigma ,\varvec{c}}(\varvec{x})/\rho _{\sigma ,\varvec{c}}(M)\). Let L be a discrete subgroup of M; then the modular Gaussian distribution over M/L exists and is defined by the density \(\mathcal {D}_{M/L,\sigma ,\varvec{c}}(\varvec{x})=\sum _{\varvec{l}\in L}\mathcal {D}_{M,\sigma ,\varvec{c}}(\varvec{x}+\varvec{l})\). Furthermore, when \(\text {span}(M)= \text {span}(L)\), then M/L admits a uniform distribution of constant density \(\mathcal {U}_{M/L}\). In this case, the smoothing parameter \(\eta _\varepsilon (L)\) of L in M is defined as the smallest \(\sigma \in \mathbb {R}^+\) such that the density of \(\mathcal {D}_{M/L,\sigma }\) is everywhere within a factor \(1\pm \varepsilon \) of \(\mathcal {U}_{M/L}\). If M is omitted, it implicitly means \(\mathbb {R}^k\).

Subgaussian Distributions. A distribution X over \(\mathbb {R}\) is \(\sigma \)-subgaussian iff it satisfies the Laplace-transform bound: \(\forall t\in \mathbb {R}, \mathbb {E}(\exp (tX))\le \exp (\sigma ^2t^2/2)\). By Markov's inequality, this implies that the tails of X are bounded by those of a Gaussian of standard deviation \(\sigma \): \(\forall x>0, \mathbb {P}(|X|\ge x)\le 2\exp (-x^2/2\sigma ^2)\). As an example, the Gaussian distribution of standard deviation \(\sigma \) (i.e. parameter \(\sqrt{\pi /2}\sigma \)), the equi-distribution on \(\{-\sigma ,\sigma \}\), and the uniform distribution over \([-\sqrt{3}\sigma ,\sqrt{3}\sigma ]\), which all have standard deviation \(\sigma \), are \(\sigma \)-subgaussian. If X and \(X'\) are two independent \(\sigma \)- and \(\sigma '\)-subgaussian variables, then for all \(\alpha ,\beta \in \mathbb {R}\), \(\alpha X+\beta X'\) is \(\sqrt{\alpha ^2\sigma ^2+\beta ^2\sigma '^2}\)-subgaussian.

Distance and Norms. We use the standard \(\left\| \cdot \right\| _p\) and \(\left\| \cdot \right\| _\infty \) norms for scalars and vectors over the real field or over the integers. By extension, the norm \(\left\| P(X)\right\| _p\) of a real or integer polynomial \(P\in \mathbb {R}[X]\) is the norm of its coefficient vector. If the polynomial is modulo \(X^N+1\), we take the norm of its unique representative of degree \(\le N-1\).

By abuse of notation, we write \(\left\| \varvec{x}\right\| _p= \min _{\varvec{u}\in \varvec{x}+\mathbb {Z}^k}(\left\| \varvec{u}\right\| _p)\) for all \(\varvec{x}\in \mathbb {T}^k\). It is the p-norm of the representative of \(\varvec{x}\) with all coefficients in \(]-\frac{1}{2},\frac{1}{2}]\). Although it satisfies the separation and triangular inequalities, this notation is not a norm, because it lacks homogeneity, and \(\mathbb {T}^k\) is not a vector space either. Still, we have \(\forall m\in \mathbb {Z}, \left\| m\cdot \varvec{x}\right\| _p\le |m| \left\| \varvec{x}\right\| _p\). By extension, we define \(\left\| a\right\| _p\) for a polynomial \(a\in \mathbb {T}_N[X]\) as the p-norm of its unique representative in \(\mathbb {R}[X]\) of degree \(\le N-1\) with coefficients in \(]-\frac{1}{2},\frac{1}{2}]\).

Definition 2.2

(Infinity norm over \(\mathcal {M}_{p,q}(\mathbb {T}_N[X])\) ). Let \(A \in \mathcal {M}_{p,q}(\mathbb {T}_N[X])\). We define the infinity norm of A as

$$ \left\| A\right\| _\infty = \max _{\overset{\scriptscriptstyle i\in [\![1,p]\!]}{\scriptscriptstyle j\in [\![1,q]\!]}} \left\| a_{i,j}\right\| _\infty . $$

Concentrated Distribution on the Torus, Expectation and Variance. A distribution \(\mathcal {X}\) on the torus is concentrated iff its support is included in a ball of radius \(\frac{1}{4}\) of \(\mathbb {T}\), except for a negligible probability. In this case, we define the variance \(\textsf {Var}(\mathcal {X})\) of \(\mathcal {X}\) as \(\textsf {Var}(\mathcal {X}) = \min _{\bar{x}\in \mathbb {T}} \sum p(x) |x-\bar{x}|^2\), and the expectation \(\mathbb {E}(\mathcal {X})\) as the position \(\bar{x}\in \mathbb {T}\) which minimizes this expression. By extension, we say that a distribution \(\mathcal {X}^\prime \) over \(\mathbb {T}^n\) or \(\mathbb {T}_N[X]^k\) is concentrated iff each coefficient has an independent concentrated distribution on the torus. Then the expectation \(\mathbb {E}(\mathcal {X}^\prime )\) is the vector of expectations of each coefficient, and \(\textsf {Var}(\mathcal {X}^\prime )\) denotes the maximum of the coefficients' variances.

These expectation and variance over \(\mathbb {T}\) follow the same linearity rules as their classical equivalents over the reals.

Fact 2.3

Let \(\mathcal {X}_1,\mathcal {X}_2\) be two independent concentrated distributions on either \(\mathbb {T}, \mathbb {T}^n\) or \(\mathbb {T}_N[X]^k\), and let \(e_1,e_2\in \mathbb {Z}\) be such that \(\mathcal {X}= e_1\cdot \mathcal {X}_1+e_2\cdot \mathcal {X}_2\) remains concentrated. Then \(\mathbb {E}(\mathcal {X})=e_1\cdot \mathbb {E}(\mathcal {X}_1)+e_2\cdot \mathbb {E}(\mathcal {X}_2)\) and \(\textsf {Var}(\mathcal {X})\le e_1^2\cdot \textsf {Var}(\mathcal {X}_1)+e_2^2\cdot \textsf {Var}(\mathcal {X}_2)\).
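As a quick numerical illustration of Fact 2.3 (our own check, assuming numpy; for independent inputs the variance bound is in fact an equality):

```python
import numpy as np

rng = np.random.default_rng(0)

def center(x):
    """Centered representative of a torus element, in [-1/2, 1/2)."""
    return x - np.floor(x + 0.5)

# two independent concentrated distributions on T
x1 = (0.10 + rng.normal(0, 0.01, 200_000)) % 1.0
x2 = (0.90 + rng.normal(0, 0.02, 200_000)) % 1.0
e1, e2 = 3, 2
x = (e1 * x1 + e2 * x2) % 1.0        # still concentrated, around 0.1

mu = (e1 * 0.10 + e2 * 0.90) % 1.0   # linearity of E: expectation is 0.1
err = center(x - mu)
print(np.mean(err))                  # ~ 0
print(np.var(err))                   # ~ e1^2*1e-4 + e2^2*4e-4 = 2.5e-3
```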

Also, subgaussian distributions with small enough parameters are necessarily concentrated:

Fact 2.4

Every distribution \(\mathcal {X}\) on either \(\mathbb {T}, \mathbb {T}^n\) or \(\mathbb {T}_N[X]^k\) whose coefficients are all \(\sigma \)-subgaussian with \(\sigma \le 1/\sqrt{32\log (2)(\lambda +1)}\) is a concentrated distribution: a fraction \(1-2^{-\lambda }\) of its mass lies in the interval \([-\frac{1}{4},\frac{1}{4}]\).
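The constant in Fact 2.4 is exactly what the subgaussian tail bound requires to leave at most a \(2^{-\lambda }\) fraction of the mass outside \([-\frac{1}{4},\frac{1}{4}]\):

$$ \mathbb {P}\Big (|X|\ge \tfrac{1}{4}\Big )\le 2\exp \Big (-\frac{(1/4)^2}{2\sigma ^2}\Big )\le 2^{-\lambda } \Longleftrightarrow \frac{1}{32\sigma ^2}\ge (\lambda +1)\log (2) \Longleftrightarrow \sigma \le \frac{1}{\sqrt{32\log (2)(\lambda +1)}}. $$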

2.1 Learning with Errors Problem

The Learning With Errors (\(\mathsf {LWE}\)) problem was introduced by Regev in 2005 [21]. Its ring variant, called \({\mathrm {RingLWE}}\), was introduced by Lyubashevsky, Peikert and Regev in 2010 [19]. Both variants are nowadays extensively used in the construction of lattice-based homomorphic encryption schemes. In the original definition [21], a \(\mathsf {LWE}\) sample has its right member on the torus and is defined using continuous Gaussian distributions. Here, we work entirely over the real torus, employing the same formalism as the Scale-Invariant \(\mathsf {LWE}\) (\(\mathsf {SILWE}\)) scheme of [9], or the \(\mathsf {LWE}\) scale-invariant normal form of [10]. Without loss of generality, we refer to it as \(\mathsf {LWE}\).

Definition 2.5

((Homogeneous) LWE). Let \(n \ge 1\) be an integer, \(\alpha \in \mathbb {R}^+\) a noise parameter and \(\varvec{s}\) a uniformly distributed secret in some bounded set \(\mathcal {S}\subseteq \mathbb {Z}^n\). Denote by \(\mathcal {D}^\mathsf {LWE}_{\varvec{s}, \alpha }\) the distribution over \(\mathbb {T}^n \times \mathbb {T}\) obtained by sampling couples \((\varvec{a},b)\), where the left member \(\varvec{a} \in \mathbb {T}^n\) is chosen uniformly at random and the right member is \(b=\varvec{a}\cdot \varvec{s} + e\). The error e is sampled from a Gaussian distribution with parameter \(\alpha \).

  • Search problem: given access to polynomially many \(\mathsf {LWE}\) samples, find \(\varvec{s}\in \mathcal {S}\).

  • Decision problem: distinguish between \(\mathsf {LWE}\) samples and uniformly random samples from \(\mathbb {T}^n \times \mathbb {T}\).

The \(\mathsf {LWE}\) search and decision problems are reducible to each other, and their average case is asymptotically as hard as worst-case lattice problems. In practice, both problems are also intractable, and their hardness increases with the entropy of the key set \(\mathcal {S}\) (i.e. n if keys are binary) and with the noise parameter \(\alpha \in ]0,\eta _\varepsilon (\mathbb {Z})[\).

Regev’s encryption scheme [21] is the following. Given a discrete message space \(\mathcal {M}\subseteq \mathbb {T}\), for instance \(\{0,\frac{1}{2}\}\), a message \(\mu \in \mathcal {M}\) is encrypted by adding the trivial \(\mathsf {LWE}\) sample \((\varvec{0}, \mu )\) of \(\mu \) to a homogeneous \(\mathsf {LWE}\) sample \((\varvec{a},b)\in \mathbb {T}^{n+1}\) with respect to a secret key \(\varvec{s} \in \mathbb {B}^n\) and a noise parameter \(\alpha \in \mathbb {R}^+\). The semantic security of the scheme is equivalent to the \(\mathsf {LWE}\) decision problem. The decryption of a sample \(\varvec{c}=(\varvec{a},b)\) consists in computing the quantity \(\varphi _{\varvec{s}}(\varvec{a},b)=b-\varvec{s}\cdot \varvec{a}\), which we call the phase of \(\varvec{c}\), and rounding it to the nearest element of \(\mathcal {M}\). Decryption is correct with overwhelming probability \(1-2^{-p}\) provided that the parameter \(\alpha \) is \(O(R/\sqrt{p})\), where R is the packing radius of \(\mathcal {M}\).
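As a toy sketch of this scheme (our own illustration, with floats as torus elements and illustrative parameters; the phase and rounding are exactly as defined above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 500, 1e-5           # toy noise; see Sect. 6 for concrete parameters
M = [0.0, 0.5]                 # message space {0, 1/2} in T
s = rng.integers(0, 2, n)      # binary secret key

def center(x):
    return (x + 0.5) % 1.0 - 0.5          # representative in [-1/2, 1/2)

def encrypt(mu):
    a = rng.random(n)                     # uniform mask in T^n
    e = rng.normal(0, alpha)              # Gaussian noise (std ~ alpha)
    return a, (a @ s + mu + e) % 1.0      # trivial sample (0, mu) + homogeneous

def phase(c):
    a, b = c
    return (b - a @ s) % 1.0              # phi_s(a, b) = b - s.a

def decrypt(c):
    # round the phase to the nearest element of M for the torus distance
    return min(M, key=lambda m: abs(center(phase(c) - m)))

assert decrypt(encrypt(0.5)) == 0.5 and decrypt(encrypt(0.0)) == 0.0
```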

3 Generalization

In this section we extend this presentation to rings, following the generalization of [5], and also to the \({\mathrm {GSW}}\) scheme [16].

3.1 TLWE

We first define \({\mathrm {TLWE}}\) samples, together with the search and decision problems. In the following, ciphertexts are viewed as normal samples.

Definition 3.1

(TLWE samples). Let \(k\ge 1\) be an integer, N a power of 2, and \(\alpha \ge 0\) a noise parameter. A \({\mathrm {TLWE}}\) secret key \(\varvec{s}\in \mathbb {B}_N[X]^k\) is a vector of k polynomials of \(\mathfrak {R}=\mathbb {Z}[X]/(X^N+1)\) with binary coefficients. For security purposes, we assume that secret keys are uniformly chosen, and that they actually contain \(n\approx Nk\) bits of entropy. The message space of \({\mathrm {TLWE}}\) samples is \(\mathbb {T}_N[X]\). A fresh \({\mathrm {TLWE}}\) sample of a message \(\mu \in \mathbb {T}_N[X]\) with noise parameter \(\alpha \) under the key \(\varvec{s}\) is an element \((\varvec{a},b)\in \mathbb {T}_N[X]^k\times \mathbb {T}_N[X]\) where \(b \in \mathbb {T}_N[X]\) has a Gaussian distribution of parameter \(\alpha \) around \(\mu +\varvec{s}\cdot \varvec{a}\). The sample is random iff its left member \(\varvec{a}\) (also called the mask) is uniformly random in \(\mathbb {T}_N[X]^k\) (or in a sufficiently dense submodule), trivial if \(\varvec{a}\) is fixed to \(\varvec{0}\), noiseless if \(\alpha =0\), and homogeneous iff its message \(\mu \) is 0.

  • Search problem: given access to polynomially many fresh random homogeneous \({\mathrm {TLWE}}\) samples, find their key \(\varvec{s} \in \mathbb {B}_N[X]^k\).

  • Decision problem: distinguish fresh random homogeneous \({\mathrm {TLWE}}\) samples from uniformly random samples of \(\mathbb {T}_N[X]^{k+1}\).

This definition is the analogue on the torus of the General-LWE problem of [5]. It allows us to consider both LWE and RingLWE as a single problem. Choosing N large and \(k=1\) corresponds to the classical binary-key RingLWE (over cyclotomic rings, and up to a scaling factor q). When \(N=1\) and k is large, \(\mathfrak {R}\) and \(\mathbb {T}_N[X]\) collapse to \(\mathbb {Z}\) and \(\mathbb {T}\) respectively, and \({\mathrm {TLWE}}\) is simply binary-key LWE (up to the same scaling factor q). Other choices of N and k give a continuum between these two extremes, with a security that ranges from worst-case ideal lattice problems to worst-case regular lattice problems.

Thanks to the underlying \(\mathfrak {R}\)-module structure, we can sum TLWE samples, or form integer or polynomial linear combinations of samples with coefficients in \(\mathfrak {R}\). However, each of these combinations increases the noise inside the samples, so they are limited to small coefficients.

We additionally define a function called the phase of a \({\mathrm {TLWE}}\) sample, that will be used many times. The phase computation is the first step of the classical decryption algorithm, and uses the secret key.

Definition 3.2

(Phase). Let \(\varvec{c}=(\varvec{a},b)\in \mathbb {T}_N[X]^{k}\times \mathbb {T}_N[X]\) and \(\varvec{s}\in \mathbb {B}_N[X]^k\); we define the phase of the sample as \(\varphi _{\varvec{s}}(\varvec{c})=b-\varvec{s}\cdot \varvec{a}\).

The phase is linear over \(\mathbb {T}_N[X]^{k+1}\) and is \((kN+1)\)-lipschitzian for the \(\ell _{\infty }\) distance: \(\forall \varvec{x},\varvec{y}\in \mathbb {T}_N[X]^{k+1}, \left\| \varphi _{\varvec{s}}(\varvec{x})-\varphi _{\varvec{s}}(\varvec{y})\right\| _\infty \le (kN+1)\left\| \varvec{x}-\varvec{y}\right\| _\infty \).

Note that a TLWE sample contains noise, that its semantics is a function of its phase only, and that the phase has the nice property of being lipschitzian. Together, these properties have many interesting implications. In particular, we can always work with approximations, since two samples at a short distance in \(\mathbb {T}_N[X]^{k+1}\) share the same properties: they encode the same message, and they can in general be swapped. This fact explains why we can work with, and describe our algorithms on, the infinite torus.

Given a finite message space \(\mathcal {M}\subseteq \mathbb {T}_N[X]\), the (classical) decryption algorithm computes the phase \(\varphi _{\varvec{s}}(\varvec{c})\) of the sample, and returns the closest \(\mu \in \mathcal {M}\). It is easy to see that if \(\varvec{c}\) is a fresh TLWE sample of \(\mu \in \mathcal {M}\) with Gaussian noise parameter \(\alpha \), the decryption of \(\varvec{c}\) over \(\mathcal {M}\) is equal to \(\mu \) as soon as \(\alpha \) is \(\varTheta (\sqrt{\lambda })\) times smaller than the packing radius of \(\mathcal {M}\). However, decryption is harder to define for non-fresh samples. In this case, the correctness of the decryption procedure involves a recurrence formula relating the decryption of the sum to the sum of the decryptions of the inputs, conditioned by the noise parameters. In addition, the message spaces of the input samples can lie in different subgroups of \(\mathbb {T}\). To lift these limitations of the decryption function, we will instead use a mathematical definition of message and error, by reasoning directly on the following \(\varOmega \)-probability space.

Definition 3.3

(The \(\varOmega \) -probability space). Since samples are either independent (random, noiseless, or trivial) fresh samples \(\varvec{c}\leftarrow {\mathrm {TLWE}}_{\varvec{s},\alpha }(\mu )\), or linear combinations \(\tilde{\varvec{c}}=\sum _{i=1}^p e_i \cdot \varvec{c_i}\) of other samples, the probability space \(\varOmega \) is the product of the probability spaces of each individual fresh sample \(\varvec{c}\), with the TLWE distributions defined in Definition 3.1, and of the probability spaces of all the coefficients \((e_1,\dots ,e_p)\in \mathfrak {R}^p\) or \(\mathbb {Z}^p\) that are obtained with a randomized algorithm.

In other words, instead of viewing a TLWE sample as a fixed value which is the result of one particular event in \(\varOmega \), we will consider all the possible values at once, and make statistics on them.

We now define functions on \({\mathrm {TLWE}}\) samples: message, error, noise variance, and noise norm. These functions are well defined mathematically, and can be used in the analysis of various algorithms. However, they cannot be directly computed or approximated in practice.

Definition 3.4

Let \(\varvec{c}\) be a random variable \(\in \mathbb {T}_N[X]^{k+1}\), which we interpret as a \({\mathrm {TLWE}}\) sample; all probabilities are taken over the \(\varOmega \)-space. We say that \(\varvec{c}\) is a valid TLWE sample iff there exists a key \(\varvec{s}\in \mathbb {B}_N[X]^k\) such that the distribution of the phase \(\varphi _{\varvec{s}}(\varvec{c})\) is concentrated. If \(\varvec{c}\) is trivial, all keys \(\varvec{s}\) are equivalent; otherwise the mask of \(\varvec{c}\) is uniformly random, so \(\varvec{s}\) is unique. We then define:

  • the message of \(\varvec{c}\), denoted \(\textsf {msg}(\varvec{c})\), is the expectation of \(\varphi _{\varvec{s}}(\varvec{c})\);

  • the error, denoted \(\textsf {Err}(\varvec{c})\), is equal to \(\varphi _{\varvec{s}}(\varvec{c})-\textsf {msg}(\varvec{c})\);

  • \(\textsf {Var}(\textsf {Err}(\varvec{c}))\) denotes the variance of \(\textsf {Err}(\varvec{c})\), which is by definition also equal to the variance of \(\varphi _{\varvec{s}}(\varvec{c})\);

  • finally, \(\left\| \textsf {Err}(\varvec{c})\right\| _\infty \) denotes the maximum amplitude of \(\textsf {Err}(\varvec{c})\) (possibly with overwhelming probability).

Unlike the classical decryption algorithm, the message function can be viewed as an ideal black box decryption function, which works with infinite precision even if the message space is continuous. Provided that the noise amplitude remains smaller than \(\frac{1}{4}\), the message function is perfectly linear. Using these intuitive and intrinsic functions will considerably ease the analysis of all algorithms in this paper. In particular, we have:

Fact 3.5

Given p valid and independent \({\mathrm {TLWE}}\) samples \(\varvec{c_1}, \ldots , \varvec{c_p}\) under the same key \(\varvec{s}\), and p integer polynomials \(e_1, \ldots , e_p\in \mathfrak {R}\), if the linear combination \(\varvec{c}=\sum _{i=1}^{p} e_i\cdot \varvec{c_i}\) is a valid \({\mathrm {TLWE}}\) sample, it satisfies \(\textsf {msg}(\varvec{c})=\sum _{i=1}^{p} e_i\cdot \textsf {msg}(\varvec{c_i})\), with variance \(\textsf {Var}(\textsf {Err}(\varvec{c})) \le \sum _{i=1}^{p} \Vert e_i\Vert _2^2 \cdot \textsf {Var}(\textsf {Err}(\varvec{c_i}))\) and noise amplitude \(\left\| \textsf {Err}(\varvec{c})\right\| _\infty \le \sum _{i=1}^{p} \left\| e_i\right\| _1 \cdot \left\| \textsf {Err}(\varvec{c_i})\right\| _\infty \). If the last bound is \(<\frac{1}{4}\), then \(\varvec{c}\) is necessarily a valid TLWE sample (under the same key \(\varvec{s}\)).

In order to characterize the average-case behaviour of our homomorphic operations, we rely on the independence heuristic below. This heuristic is only used for practical average-case bounds; our worst-case theorems and lemmas based on the infinity norm do not use it at all.

Assumption 3.6

(Independence Heuristic). All the coefficients of the error of \({\mathrm {TLWE}}\) or \({\mathrm {TGSW}}\) samples that occur in all the linear combinations we consider are independent and concentrated. More precisely, they are \(\sigma \)-subgaussian where \(\sigma \) is the square-root of their variance.

This assumption allows us to bound the variance of the noise instead of its norm, and to provide realistic average-case bounds which often correspond to the square root of the worst-case ones. The error can easily be proved subgaussian, since each coefficient is always obtained by convolving Gaussians or zero-centered bounded uniform distributions. But the independence assumption between all the coefficients remains heuristic: dependencies between coefficients may affect the variance of their combinations in both directions. Independence of the coefficients could be obtained by adding enough entropy in all our decomposition algorithms and by increasing some parameters accordingly, but as noticed in [11], this work-around seems to be more of a proof artefact, and is experimentally not needed. Since average-case corollaries should reflect practical results, we leave the independence of subgaussian samples as a heuristic assumption.

3.2 TGSW

In this section we present a generalized scale-invariant version of the FHE scheme \({\mathrm {GSW}}\) [16], which we call \({\mathrm {TGSW}}\). \({\mathrm {GSW}}\) was proposed by Gentry, Sahai and Waters in 2013 [16] and improved in [3]; its security is based on the \(\mathsf {LWE}\) problem. The scheme relies on a gadget decomposition function, which we also extend to polynomials; most importantly, the novelty is that our function is an approximate decomposition, up to some precision parameter. This improves running time and memory requirements, at the price of a small amount of additional noise.

Definition 3.7

(Approximate Gadget Decomposition). Let \(\varvec{h}\in \mathcal {M}_{(k+1)\ell ,k+1}(\mathbb {T}_N[X])\) be as in (1). We say that \(Dec_{\varvec{h},\beta ,\epsilon }(\varvec{v})\) is a decomposition algorithm on the gadget \(\varvec{h}\) with quality \(\beta \) and precision \(\epsilon \) if and only if, for any \({\mathrm {TLWE}}\) sample \(\varvec{v}\in \mathbb {T}_N[X]^{k+1}\), it efficiently and publicly outputs a small vector \(\varvec{u}\in \mathfrak {R}^{(k+1)\ell }\) such that \(\left\| \varvec{u}\right\| _\infty \le \beta \) and \(\left\| \varvec{u}\cdot \varvec{h}-\varvec{v}\right\| _\infty \le \epsilon \). Furthermore, the expectation of \(\varvec{u}\cdot \varvec{h}-\varvec{v}\) must be 0 when \(\varvec{v}\) is uniformly distributed in \(\mathbb {T}_N[X]^{k+1}\).

Definition 3.7 is generic, but in the rest of the paper, we will only use this fixed gadget:

$$ \varvec{h} = \left[ \begin{array}{ccc} 1/B_g &{} \cdots &{} 0 \\ \vdots &{} \ddots &{} \vdots \\ 1/B_g^{\ell } &{} \cdots &{} 0 \\ &{} \vdots &{} \\ 0 &{} \cdots &{} 1/B_g \\ \vdots &{} \ddots &{} \vdots \\ 0 &{} \cdots &{} 1/B_g^{\ell } \end{array} \right] \in \mathcal {M}_{(k+1)\ell ,k+1}(\mathbb {T}_N[X]) $$

(1)

The matrix \(\varvec{h}\) consists of a diagonal of column blocks, each containing the superincreasing sequence of constant polynomials \((1/B_g,\dots ,1/B_g^{\ell })\) in \(\mathbb {T}\). Algorithm 1 gives an efficient decomposition of \({\mathrm {TLWE}}\) samples on \(\varvec{h}\), and the following lemma proves its correctness. In theory, decomposition algorithms should be randomized to guarantee that the distributions of all error coefficients remain independent. In practice, we already rely on Heuristic 3.6, and we just need the expectation of the small errors induced by the approximations to remain zero, so that the message is not changed.

Lemma 3.8

Let \(\ell \in \mathbb {N}\) and \(B_g\in \mathbb {N}\). Then for \(\beta =B_g/2\) and \(\epsilon =1/2B_g^\ell \), Algorithm 1 is a valid \(Dec_{\varvec{h},\beta ,\epsilon }\).

(Algorithm 1: \(Dec_{\varvec{h},\beta ,\epsilon }\), approximate gadget decomposition of a \({\mathrm {TLWE}}\) sample; figure not reproduced.)
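Since the algorithm figure is not reproduced, here is a Python sketch of the decomposition (our own reconstruction, shown on a single torus coefficient; for a \({\mathrm {TLWE}}\) sample it is applied to every coefficient of \(a_1,\dots ,a_{k+1}\), and randomized tie-breaking is omitted):

```python
def gadget_decompose(a, Bg=1024, ell=3):
    """Digits e_1..e_ell with |e_j| <= Bg/2 and
    |sum_j e_j / Bg^j - a| <= 1/(2*Bg^ell), for a torus coefficient a."""
    abar = round(a * Bg**ell)        # nearest multiple of 1/Bg^ell, as an integer
    digits = []
    for _ in range(ell):
        abar, r = divmod(abar, Bg)   # r in [0, Bg)
        if r >= Bg // 2:             # recenter the digit into [-Bg/2, Bg/2)
            r -= Bg
            abar += 1
        digits.append(r)
    digits.reverse()                 # e_1 (weight 1/Bg) first
    return digits                    # a leftover carry is an integer: 0 on T

e = gadget_decompose(0.123456)
approx = sum(d / 1024 ** (j + 1) for j, d in enumerate(e))
assert abs(approx - 0.123456) <= 1 / (2 * 1024 ** 3)
```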

Proof

Let \(\varvec{v} = (a,b) = (a_1, \ldots , a_{k}, b=a_{k+1}) \in \mathbb {T}_N[X]^{k+1}\) be a \({\mathrm {TLWE}}\) sample given as input to Algorithm 1, and let \(\varvec{u} = [e_{1,1},\dots ,e_{k+1,\ell }]\in \mathfrak {R}^{(k+1)\ell }\) be the corresponding output. By construction, \(\left\| \varvec{u}\right\| _\infty \le B_g/2 = \beta \).

Let \(\varvec{\epsilon _\mathbf{dec }}=\varvec{u}\cdot \varvec{h}-\varvec{v}\). For all \(i\in [\![1,k+1 ]\!]\), we have by construction \((\varvec{u}\cdot \varvec{h})_i=\sum _{j=1}^{\ell } e_{i,j}/B_g^{j}=\bar{a}_{i}\). Since \(\bar{a}_{i}\) is defined, coefficient-wise, as the nearest multiple of \(\frac{1}{B_g^\ell }\) of \(a_i\) on the torus, we have \(\left\| \bar{a}_{i}-a_{i}\right\| _\infty \le 1/2B_g^\ell =\epsilon \). \(\varvec{\epsilon _\mathbf{dec }}\) therefore has a concentrated distribution when \(\varvec{v}\) is uniform. We now verify that it is zero-centered. Call f the function from \(\mathbb {T}\) to \(\mathbb {T}\) which rounds an element x to its closest multiple of \(\frac{1}{B_g^\ell }\), and g the symmetry defined by \(g(x)=2f(x)-x\) on the torus. For every coefficient a of \(\varvec{v}\), the corresponding coefficient of \(\varvec{\epsilon _\mathbf{dec }}\) is \(f(a)-a\); when a is uniformly distributed, so is g(a), and since \(f(g(a))=f(a)\), we get \(\mathbb {E}(f(a)-a)=\mathbb {E}(f(g(a))-g(a))=\mathbb {E}(a-f(a))=-\mathbb {E}(f(a)-a)\). Thus, the expectation of \(\varvec{\epsilon _\mathbf{dec }}\) is 0.    \(\square \)

We are now ready to define \({\mathrm {TGSW}}\) samples, and to extend the notions of phase of valid sample, message and error of the samples.

Definition 3.9

(TGSW samples). Let \(\ell \) and \(k\ge 1\) be two integers, \(\alpha \ge 0\) a noise parameter and \(\varvec{h}\) the gadget defined in Eq. (1). Let \(\varvec{s}\in \mathbb {B}_N[X]^k\) be a \({\mathrm {RingLWE}}\) key. We say that \(\varvec{C}\in \mathcal {M}_{(k+1)\ell ,k+1}(\mathbb {T}_N[X])\) is a fresh \({\mathrm {TGSW}}\) sample of \(\mu \in \mathfrak {R}/\varvec{h}^\perp \) with noise parameter \(\alpha \) iff \(\varvec{C}=\varvec{Z}+\mu \cdot \varvec{h}\), where each row of \(\varvec{Z}\in \mathcal {M}_{(k+1)\ell ,k+1}(\mathbb {T}_N[X])\) is a homogeneous TLWE sample (of 0) with Gaussian noise parameter \(\alpha \). Reciprocally, we say that an element \(\varvec{C}\in \mathcal {M}_{(k+1)\ell ,k+1}(\mathbb {T}_N[X])\) is a valid TGSW sample iff there exists a unique polynomial \(\mu \in \mathfrak {R}/\varvec{h}^\perp \) and a unique key \(\varvec{s}\) such that each row of \(\varvec{C}-\mu \cdot \varvec{h}\) is a valid TLWE sample of 0 for the key \(\varvec{s}\). We call the polynomial \(\mu \) the message of \(\varvec{C}\), and we denote it by \(\textsf {msg}(\varvec{C})\).

Definition 3.10

(Phase, Error). Let \(A\in \mathcal {M}_{(k+1)\ell ,k+1}(\mathbb {T}_N[X])\) be a \({\mathrm {TGSW}}\) sample for a secret key \(\varvec{s}\in \mathbb {B}_N[X]^k\) and a noise parameter \(\alpha \ge 0\).

We define the phase of A, denoted \(\varphi _{\varvec{s}}(A)\in (\mathbb {T}_N[X])^{(k+1)\ell }\), as the list of the \((k+1)\ell \) \({\mathrm {TLWE}}\) phases of each line of A. In the same way, we define the error of A, denoted \(\textsf {Err}(A)\), as the list of the \((k+1)\ell \) \({\mathrm {TLWE}}\) errors of each line of A.

Since \({\mathrm {TGSW}}\) samples are essentially vectors of \({\mathrm {TLWE}}\) samples, they are naturally compatible with linear operations. And both phase and message functions remain linear.

Fact 3.11

Given p valid \({\mathrm {TGSW}}\) samples \(C_1, \ldots , C_p\) of messages \(\mu _1, \ldots , \mu _p\) under the same key and with independent error coefficients, and given p integer polynomials \(e_1, \ldots , e_p\), the linear combination \(C=\sum _{i=1}^{p} e_i \cdot C_i\) is a \({\mathrm {TGSW}}\) sample of \(\mu = \sum _{i=1}^{p} e_i \cdot \mu _i\), with variance \(\textsf {Var}(\textsf {Err}(C)) \le \sum _{i=1}^{p} \Vert e_i\Vert _2^2 \cdot \textsf {Var}(\textsf {Err}(C_i))\) and noise infinity norm \(\left\| \textsf {Err}(C)\right\| _\infty \le \sum _{i=1}^{p} \left\| e_i\right\| _1 \cdot \left\| \textsf {Err}(C_i)\right\| _\infty \).

Also, the phase remains \((1+kN)\)-lipschitzian for the infinity norm.

Fact 3.12

For all \(A\in \mathcal {M}_{p,k+1}(\mathbb {T}_N[X])\), \(\left\| \varphi _{\varvec{s}}(A)\right\| _\infty \le (Nk+1)\left\| A\right\| _\infty \).

We finally define the homomorphic product between a \({\mathrm {TGSW}}\) and a \({\mathrm {TLWE}}\) sample, whose resulting message is simply the product of the two messages of the initial samples. Since the left member encodes an integer polynomial and the right one a torus polynomial, this operator performs a homomorphic evaluation of their external product. Theorem 3.14 (resp. Corollary 3.15) analyzes the worst-case (resp. average-case) noise propagation of this product. Then, Corollary 3.16 relates this new morphism to the classical internal product between \({\mathrm {TGSW}}\) samples.

Definition 3.13

(External product). We define the product \(\boxdot \) as

$$ \begin{aligned} \boxdot :{\mathrm {TGSW}}\times {\mathrm {TLWE}}&\longrightarrow {\mathrm {TLWE}}\\ (A,\varvec{b})&\longmapsto A\boxdot \varvec{b} = Dec_{\varvec{h},\beta ,\epsilon }(\varvec{b})\cdot A. \end{aligned} $$

The formula is almost identical to the classical product defined in the original GSW scheme in [16], except that only one vector needs to be decomposed. For this reason, we get almost the same noise propagation formula, with an additional term that comes from the approximations in the decomposition.
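The following toy sketch (ours) instantiates the \(\boxdot \) product with \(k=1\) and \(N=1\), i.e. over plain torus numbers, reusing gadget_decompose from the sketch after Lemma 3.8; all names and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
Bg, ell = 1024, 3
rows = 2 * ell                        # (k+1)*ell rows, with k = 1 and N = 1

h = np.zeros((rows, 2))               # gadget of Eq. (1) for k = 1, N = 1
for j in range(ell):
    h[j, 0] = h[ell + j, 1] = 1.0 / Bg ** (j + 1)

s = 1                                 # toy TLWE secret key

def center(x):
    return (x + 0.5) % 1.0 - 0.5

def tlwe(mu, alpha=2.0 ** -25):       # fresh TLWE sample of mu in T
    a = rng.random()
    return np.array([a, (a * s + mu + rng.normal(0, alpha)) % 1.0])

def tgsw(mu, alpha=2.0 ** -25):       # fresh TGSW sample of mu in Z
    Z = np.array([tlwe(0.0, alpha) for _ in range(rows)])  # rows encrypt 0
    return (Z + mu * h) % 1.0

def external_product(A, b):           # Definition 3.13: Dec_h(b) . A
    u = np.array(gadget_decompose(float(center(b[0])))
                 + gadget_decompose(float(center(b[1]))))
    return (u @ A) % 1.0

A, b = tgsw(3), tlwe(0.125)           # mu_A = 3, mu_b = 1/8
c = external_product(A, b)
print(center(c[1] - c[0] * s))        # phase ~ mu_A * mu_b = 0.375
```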

Theorem 3.14

(Worst-case External Product). Let A be a valid \({\mathrm {TGSW}}\) sample of message \(\mu _A\) and let \(\varvec{b}\) be a valid \({\mathrm {TLWE}}\) sample of message \(\mu _{\varvec{b}}\). Then \(A \boxdot \varvec{b}\) is a \({\mathrm {TLWE}}\) sample of message \(\mu _A \cdot \mu _{\varvec{b}}\) and \(\left\| \textsf {Err}(A\boxdot \varvec{b})\right\| _\infty \le (k+1)\ell N\beta \left\| \textsf {Err}(A)\right\| _\infty + \left\| \mu _A\right\| _1(1+kN)\epsilon + \left\| \mu _A\right\| _1\left\| \textsf {Err}(\varvec{b})\right\| _\infty \) (worst case), where \(\beta \) and \(\epsilon \) are the parameters used in the decomposition \(Dec_{\varvec{h},\beta ,\epsilon }(\varvec{b})\). If \(\left\| \textsf {Err}(A\boxdot \varvec{b})\right\| _\infty \le 1/4\) we are guaranteed that \(A \boxdot \varvec{b}\) is a valid \({\mathrm {TLWE}}\) sample.

Proof

As \(A={\mathrm {TGSW}}(\mu _A)\), then by definition it is equal to \(A = Z_A + \mu _A \cdot \varvec{h}\), where \(Z_A\) is a \({\mathrm {TGSW}}\) encryption of 0 and \(\varvec{h}\) is the gadget matrix. In the same way, as \(\varvec{b}={\mathrm {TLWE}}(\mu _{\varvec{b}})\), then by definition it is equal to \(\varvec{b} = \varvec{z_{\varvec{b}}} + (\varvec{0},\mu _{\varvec{b}})\), where \(\varvec{z_{\varvec{b}}}\) is a \({\mathrm {TLWE}}\) encryption of 0. Let

$$ {\left\{ \begin{array}{ll} \left\| \textsf {Err}(A)\right\| _\infty = \left\| \varphi _{\varvec{s}}(Z_A)\right\| _\infty = \eta _A \\ \left\| \textsf {Err}(\varvec{b})\right\| _\infty = \left\| \varphi _{\varvec{s}}(\varvec{z_{\varvec{b}}})\right\| _\infty = \eta _{\varvec{b}}. \end{array}\right. } $$

Let \(\varvec{u} = Dec_{\varvec{h},\beta ,\epsilon }(\varvec{b}) \in \mathfrak {R}^{(k+1)\ell }\). By definition \(A \boxdot \varvec{b}\) is equal to

$$ \begin{aligned} A \boxdot \varvec{b}&= \varvec{u} \cdot A \\&= \varvec{u} \cdot Z_A + \mu _A \cdot (\varvec{u} \cdot \varvec{h}). \end{aligned} $$

From Definition 3.7, we have that \(\varvec{u}\cdot \varvec{h} = \varvec{b} + \varvec{\epsilon _{dec}}\), where \(\left\| \varvec{\epsilon _{dec}}\right\| _\infty = \left\| \varvec{u}\cdot \varvec{h}-\varvec{b}\right\| _\infty \le \epsilon \). So

$$ \begin{aligned} A \boxdot \varvec{b}&= \varvec{u} \cdot Z_A + \mu _A \cdot (\varvec{b} + \varvec{\epsilon _{dec}}) \\&= \varvec{u} \cdot Z_A + \mu _A \cdot \varvec{\epsilon _{dec}} + \mu _A \cdot \varvec{z_{\varvec{b}}} + (\varvec{0},\mu _A\cdot \mu _{\varvec{b}}). \end{aligned} $$

Then the phase (linear function) of \(A \boxdot \varvec{b}\) is

$$ \varphi _{\varvec{s}}(A \boxdot \varvec{b}) = \varvec{u} \cdot \textsf {Err}(A) + \mu _A \cdot \varphi _{\varvec{s}}(\varvec{\epsilon _{dec}}) + \mu _A \cdot \textsf {Err}(\varvec{b}) + \mu _A\mu _{\varvec{b}}. $$

Taking the expectation, we get that \(\textsf {msg}(A \boxdot \varvec{b}) = 0+0+0+\mu _A\mu _{\varvec{b}}\), and so \(\textsf {Err}(A \boxdot \varvec{b}) = \varphi _{\varvec{s}}(A \boxdot \varvec{b}) - \mu _A\mu _{\varvec{b}}\). Then thanks to Fact 3.12, we have

$$ \begin{aligned} \left\| \textsf {Err}(A\boxdot \varvec{b})\right\| _\infty&\le \left\| \varvec{u} \cdot \textsf {Err}(A)\right\| _\infty + \left\| \mu _A \cdot \varphi (\varvec{\epsilon _{dec}})\right\| _\infty + \left\| \mu _A \cdot \textsf {Err}(\varvec{b})\right\| _\infty \\&\le (k+1)\ell N\beta \eta _A + \left\| \mu _A\right\| _1 (1+kN)\left\| \varvec{\epsilon _{dec}}\right\| _\infty + \left\| \mu _A\right\| _1 \eta _{\varvec{b}}. \end{aligned} $$

The result follows.    \(\square \)

We similarly obtain the more realistic average-case noise propagation, based on the independence heuristic, by bounding the Gaussian variance instead of the amplitude.

Corollary 3.15

(Average-case External Product). Under the same conditions of Theorem 3.14 and by assuming the Heuristic 3.6, we have that \(\textsf {Var}(\textsf {Err}(A\boxdot \varvec{b})) \le (k+1)\ell N\beta ^2\textsf {Var}(\textsf {Err}(A)) + (1+kN)\left\| \mu _A\right\| _2^2 \epsilon ^2 + \left\| \mu _A\right\| _2^2 \textsf {Var}(\textsf {Err}(\varvec{b}))\).

Proof

Let \(\vartheta _A=\textsf {Var}(\textsf {Err}(A)) = \textsf {Var}(\varphi _{\varvec{s}}(Z_A))\) and \(\vartheta _{\varvec{b}}=\textsf {Var}(\textsf {Err}(\varvec{b})) = \textsf {Var}(\varphi _{\varvec{s}}(\varvec{z_{\varvec{b}}}))\). By using the same notations as in the proof of Theorem 3.14 we have that the error of \(A \boxdot \varvec{b}\) is \( \textsf {Err}(A \boxdot \varvec{b}) = \varvec{u} \cdot \textsf {Err}(A) + \mu _A \cdot \varphi _{\varvec{s}}(\varvec{\epsilon _{dec}}) + \mu _A \cdot \textsf {Err}(\varvec{b}) \) and thanks to Assumption 3.6 and Fact 3.12, we have:

$$ \begin{aligned} \textsf {Var}(\textsf {Err}(A\boxdot \varvec{b}))&\le \textsf {Var}(\varvec{u} \cdot \textsf {Err}(A)) + \textsf {Var}(\mu _A \cdot \varphi (\varvec{\epsilon _{dec}})) + \textsf {Var}(\mu _A \cdot \textsf {Err}(\varvec{b})) \\&\le (k+1)\ell N\beta ^2\vartheta _A + (1+kN)\left\| \mu _A\right\| _2^2 \epsilon ^2 + \left\| \mu _A\right\| _2^2 \vartheta _{\varvec{b}}. \end{aligned} $$

   \(\square \)

The last corollary describes the classical internal product between two \({\mathrm {TGSW}}\) samples, already presented in [3, 11, 13, 16], with adapted notations. As mentioned before, it is much slower to evaluate, because it consists in \((k+1)\ell \) independent computations of the \(\boxdot \) product, as we now illustrate.

Corollary 3.16

(Internal Product). Let the product

$$ \begin{aligned} \boxtimes :{\mathrm {TGSW}}\times {\mathrm {TGSW}}&\longrightarrow {\mathrm {TGSW}}\\ (A,B)&\longmapsto A\boxtimes B = \left[ \begin{array}{c} A \boxdot \varvec{b_1} \\ \vdots \\ A \boxdot \varvec{b_{(k+1)\ell }} \end{array} \right] = \left[ \begin{array}{c} Dec_{\varvec{h},\beta ,\epsilon }(\varvec{b_1})\cdot A \\ \vdots \\ Dec_{\varvec{h},\beta ,\epsilon }(\varvec{b_{(k+1)\ell }})\cdot A \end{array} \right] , \end{aligned} $$

with A and B two valid \({\mathrm {TGSW}}\) samples of messages \(\mu _A\) and \(\mu _B\) respectively, and \(\varvec{b_i}\) the i-th line of B. Then \(A \boxtimes B\) is a \({\mathrm {TGSW}}\) sample of message \(\mu _A \cdot \mu _B\), and \(\left\| \textsf {Err}(A\boxtimes B)\right\| _\infty \le (k+1)\ell N\beta \left\| \textsf {Err}(A)\right\| _\infty + \left\| \mu _A\right\| _1(1+kN)\epsilon + \left\| \mu _A\right\| _1\left\| \textsf {Err}(B)\right\| _\infty \) (worst case). If \(\left\| \textsf {Err}(A\boxtimes B)\right\| _\infty \le 1/4\), we are guaranteed that \(A \boxtimes B\) is a valid \({\mathrm {TGSW}}\) sample.

Furthermore, by assuming Heuristic 3.6, we have \(\textsf {Var}(\textsf {Err}(A\boxtimes B)) \le (k+1)\ell N\beta ^2\textsf {Var}(\textsf {Err}(A)) + (1+kN)\left\| \mu _A\right\| _2^2\epsilon ^2 + \left\| \mu _A\right\| _2^2\textsf {Var}(\textsf {Err}(B))\) (average case).

Proof

Let A and B be two \({\mathrm {TGSW}}\) samples, and \(\mu _A\) and \(\mu _B\) their messages. By definition, the i-th row of B encodes \(\mu _B\cdot \varvec{h}_i\), where \(\varvec{h}_i\) denotes the i-th row of the gadget \(\varvec{h}\), so the i-th row of \(A\boxtimes B\) encodes \(\mu _A\cdot \mu _B\cdot \varvec{h}_i\). This proves that \(A\boxtimes B\) encodes \(\mu _A\mu _B\). Since the internal product \(A \boxtimes B\) consists in \((k+1)\ell \) independent runs of the external product \(A\boxdot \varvec{b_i}\), the noise propagation formulas directly follow from Theorem 3.14 and Corollary 3.15.    \(\square \)

In the next section, we show that all internal products in the bootstrapping procedure can be replaced with the external one. Consequently, we expect a speed-up of a factor at least \((k+1)\ell \).

4 Application: Single Gate Bootstrapping in Less Than 0.1 Seconds

In this section, we show how to use Theorem 3.14 to speed up the bootstrapping presented in [11]. With additional optimizations, we drastically reduce the size of the bootstrapping key, and also reduce the noise overhead a bit. To bootstrap a LWE sample \((\varvec{a},b)\in \mathbb {T}^{n+1}\), rescaled as \((\bar{\varvec{a}},\bar{b})\mod 2N\), using relevant encryptions of its secret key \(\varvec{s}\in \mathbb {B}^n\), the overall idea is the following. We start from a fixed polynomial \(\text {testv}\in \mathbb {T}_N[X]\), which is our phase detector: its i-th coefficient is set to the value that the bootstrapping should return if \(\varphi _{\varvec{s}}(\varvec{a},b)=i/2N\). \(\text {testv}\) is first encoded in a trivial \({\mathrm {TLWE}}\) sample. Then, we iteratively rotate its coefficients, using external multiplications with \({\mathrm {TGSW}}\) encryptions of the hidden monomials \(X^{-s_i\bar{a}_i}\). By doing so, the original \(\text {testv}\) gets rotated by the (hidden) phase of \((\varvec{a},b)\), and in the end, we simply extract the constant term as a \(\mathsf {LWE}\) sample.

4.1 TLWE to LWE Extraction

As in previous works, extracting a LWE sample from a TLWE sample simply means rewriting the polynomials as their lists of coefficients, and discarding the \(N-1\) last coefficients of b. This yields a LWE encryption of the constant term of the initial polynomial message.

Definition 4.1

(TLWE Extraction). Let \((\varvec{a^{\prime \prime }},b^{\prime \prime })\) be a \({\mathrm {TLWE}}_{\varvec{s^{\prime \prime }}}(\mu )\) sample with key \(\varvec{s^{\prime \prime }}\in \mathfrak {R}^k\). We call extracted key the integer vector \(\varvec{s'}=\left( \mathsf {coefs}(s_1^{\prime \prime }(X)),\dots ,\mathsf {coefs}(s_k^{\prime \prime }(X))\right) \in \mathbb {Z}^{kN}\), and extracted sample, denoted \(\mathsf {SampleExtract}(\varvec{a^{\prime \prime }},b^{\prime \prime })\), the \(\mathsf {LWE}\) sample \((\varvec{a'},b')\in \mathbb {T}^{kN+1}\) where \(\varvec{a'}=\left( \mathsf {coefs}(a_1^{\prime \prime }(1/X)),\dots ,\mathsf {coefs}(a_k^{\prime \prime }(1/X))\right) \) and \(b'=b^{\prime \prime }_0\) is the constant term of \(b^{\prime \prime }\). Then \(\varphi _{\varvec{s'}}(\varvec{a'},b')\) (resp. \(\textsf {msg}(\varvec{a'},b')\)) is equal to the constant term of \(\varphi _{\varvec{s^{\prime \prime }}}(\varvec{a^{\prime \prime }},b^{\prime \prime })\) (resp. of \(\textsf {msg}(\varvec{a^{\prime \prime }},b^{\prime \prime })\)). And \(\left\| \textsf {Err}(\varvec{a'},b')\right\| _\infty \le \left\| \textsf {Err}(\varvec{a^{\prime \prime }},b^{\prime \prime })\right\| _\infty \) and \(\textsf {Var}(\textsf {Err}(\varvec{a'},b'))\le \textsf {Var}(\textsf {Err}(\varvec{a^{\prime \prime }},b^{\prime \prime }))\).
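A toy numpy check of this extraction with \(k=1\) and a noiseless sample (our own illustration; negacyclic_mul is a naive helper for the product modulo \(X^N+1\)):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 16
s = rng.integers(0, 2, N)                 # coefficients of s''(X), k = 1

def negacyclic_mul(f, g):                 # schoolbook product mod X^N + 1
    res = np.zeros(N)
    for i in range(N):
        for j in range(N):
            res[(i + j) % N] += f[i] * g[j] * (1 if i + j < N else -1)
    return res

def sample_extract(a_poly, b_poly):
    """coefs(a''(1/X)): a'_0 = a_0 and a'_j = -a_{N-j} for j >= 1."""
    a_prime = np.concatenate(([a_poly[0]], -a_poly[1:][::-1])) % 1.0
    return a_prime, b_poly[0]

a = rng.random(N)
b = (negacyclic_mul(a, s) + 0.25) % 1.0   # noiseless TLWE; msg has constant 0.25
ap, bp = sample_extract(a, b)
assert np.isclose((bp - ap @ s + 0.5) % 1.0 - 0.5, 0.25)
```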

4.2 LWE to LWE Key-Switching Procedure

Given a \(\mathsf {LWE}_{\varvec{s'}}\) sample of a message \(\mu \in \mathbb {T}\), the key switching procedure initially proposed in [5, 7] outputs a \(\mathsf {LWE}_{\varvec{s}}\) sample of the same \(\mu \) without increasing the noise too much. Contrary to previous exact keyswitch procedures, here we tolerate approximations.

Definition 4.2

Let \(\varvec{s}^\prime \in \{0,1\}^{n^\prime }\) and \(\varvec{s}\in \{0,1\}^{n}\) be two keys, \(\gamma \in \mathbb {R}\) a noise parameter and \(t\in \mathbb {N}\) a precision parameter. We call key switching secret \(\mathsf {KS}_{\varvec{s'}\rightarrow \varvec{s},\gamma ,t}\) a sequence of fresh \(\mathsf {LWE}\) samples \(\mathsf {KS}_{i,j}\in \mathsf {LWE}_{\varvec{s},\gamma }(s_i'\cdot 2^{-j})\) for \(i\in [\![1,n']\!]\) and \(j\in [\![1,t]\!]\).

Lemma 4.3

(Key switching). Given \((\varvec{a'},b')\in \mathsf {LWE}_{\varvec{s}'}(\mu )\), where \(\varvec{s}'\in \{0,1\}^{n'}\), with noise \(\eta '=\left\| \textsf {Err}(\varvec{a'},b')\right\| _\infty \), and a keyswitching key \(\mathsf {KS}_{\varvec{s'}\rightarrow \varvec{s},\gamma ,t}\), where \(\varvec{s}\in \{0,1\}^n\), the key switching procedure outputs a \(\mathsf {LWE}\) sample \((\varvec{a},b)\in \mathsf {LWE}_{\varvec{s}}(\mu )\) where \(\left\| \textsf {Err}(\varvec{a},b)\right\| _\infty \le \eta '+n't\gamma +n'2^{-(t+1)}\).

Proof

We have

$$\begin{aligned} \varphi _{\varvec{s}}(\varvec{a},b)&=\varphi _{\varvec{s}}(\varvec{0},b')-\sum _{i=1}^{n'} \sum _{j=1}^t a_{i,j}\varphi _{\varvec{s}}(\mathsf {KS}_{i,j})\\[-0.7em]&= b'-\sum _{i=1}^{n'}\sum _{j=1}^t a_{i,j}\Big (2^{-j}s_i' +\textsf {Err}(\mathsf {KS}_{i,j})\Big )\\[-0.7em]&= b'-\sum _{i=1}^{n'} \bar{a}_i's_i'- \sum _{i=1}^{n'}\sum _{j=1}^t a_{i,j} \textsf {Err}(\mathsf {KS}_{i,j})\\[-0.7em]&= b'-\sum _{i=1}^{n'} a_i's_i'-\sum _{i=1}^{n'}\sum _{j=1}^t a_{i,j}\textsf {Err}(\mathsf {KS}_{i,j})+ \sum _{i=1}^{n'} (a_i'-\bar{a}_i') s_i'\\[-0.7em]&= \varphi _{\varvec{s}'}(\varvec{a}',b')-\sum _{i=1}^{n'}\sum _{j=1}^t a_{i,j}\textsf {Err}(\mathsf {KS}_{i,j})+ \sum _{i=1}^{n'} (a_i'-\bar{a}_i') s_i'. \end{aligned}$$

The expectation of the left side of the equality is equal to \(\textsf {msg}(\varvec{a},b)\). On the right side, each \(a_{i,j}\) is uniformly distributed in \(\{0,1\}\) and \((a_i'-\bar{a}_i')\) is a zero-centered variable, so the expectation of the last two sums is 0. Thus, \(\textsf {msg}(\varvec{a},b)=\textsf {msg}(\varvec{a'},b')\). We obtain \(\left\| \varphi _{\varvec{s}}(\varvec{a},b)-\textsf {msg}(\varvec{a},b)\right\| _\infty \le {\eta '}+n'\cdot t \cdot \gamma +n'2^{-(t+1)}\).    \(\square \)

(Algorithm 2: \(\mathsf {KeySwitch}\); figure not reproduced.)
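Since Algorithm 2 is not reproduced, here is a runnable toy sketch of the key switching procedure (our own illustration; dimensions are deliberately small, and the weight \(2^{-(j+1)}\) below is the paper's \(2^{-j}\) with j starting from 1):

```python
import numpy as np

rng = np.random.default_rng(4)
n_in, n_out, t, gamma = 256, 128, 15, 1e-9       # toy dimensions
s_in = rng.integers(0, 2, n_in)
s_out = rng.integers(0, 2, n_out)

def lwe_enc(mu, s, alpha):
    a = rng.random(len(s))
    return a, (a @ s + mu + rng.normal(0, alpha)) % 1.0

# KS[i][j] encrypts s_in[i] * 2^-(j+1) under s_out (Definition 4.2)
KS = [[lwe_enc(s_in[i] * 2.0 ** -(j + 1), s_out, gamma) for j in range(t)]
      for i in range(n_in)]

def key_switch(ap, bp):
    a, b = np.zeros(n_out), bp                   # start from (0, b')
    for i in range(n_in):
        abar = round(float(ap[i]) * 2 ** t)      # t-bit approximation of a'_i
        for j in range(t):
            if (abar >> (t - 1 - j)) & 1:        # digit of weight 2^-(j+1)
                aks, bks = KS[i][j]
                a, b = (a - aks) % 1.0, (b - bks) % 1.0
    return a, b

a, b = key_switch(*lwe_enc(0.25, s_in, 1e-7))
print((b - a @ s_out + 0.5) % 1.0 - 0.5)         # ~ mu = 0.25, up to small noise
```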

Corollary 4.4

Let t be an integer parameter. Under Assumption 3.6, given \((\varvec{a'},b')\in \mathsf {LWE}_{\varvec{s'}}(\mu )\) with noise variance \(\vartheta '=\textsf {Var}(\textsf {Err}(\varvec{a'},b'))\) and a key switching key \(\mathsf {KS}_{\varvec{s'}\rightarrow \varvec{s},\gamma ,t}\), the key switching procedure outputs a \(\mathsf {LWE}\) sample \((\varvec{a},b)\in \mathsf {LWE}_{\varvec{s}}(\mu )\) where \(\textsf {Var}(\textsf {Err}(\varvec{a},b))\le \vartheta '+n't\cdot \textsf {Var}(\textsf {Err}(\mathsf {KS}_{i,j}))+n'2^{-2(t+1)}\).

4.3 Bootstrapping Procedure

Given a \(\mathsf {LWE}\) sample \((\varvec{a},b)\in \mathsf {LWE}_{\varvec{s}}(\mu )\), the bootstrapping procedure constructs an encryption of \(\mu \) under the same key \(\varvec{s}\) but with a fixed amount of noise. As in [11], we use \({\mathrm {TLWE}}\) as an intermediate encryption scheme to perform a homomorphic evaluation of the phase, but here we use the external product of Theorem 3.14 with \({\mathrm {TGSW}}\) encryptions of the key \(\varvec{s}\).

Definition 4.5

Let \(\varvec{s}\in \mathbb {B}^n\), \(\varvec{s^{\prime \prime }}\in \mathbb {B}_N[X]^k\) and \(\alpha \) a noise parameter. We define the bootstrapping key \(\text {BK}_{\varvec{s}\rightarrow \varvec{s^{\prime \prime }},\alpha }\) as the sequence of n \({\mathrm {TGSW}}\) samples \(\text {BK}_i\in {\mathrm {TGSW}}_{\varvec{s^{\prime \prime }},\alpha }(s_i)\) for \(i\in [\![1,n]\!]\).

(Algorithm 3: bootstrapping procedure; figure not reproduced.)
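Since the algorithm figure is not reproduced, the following plaintext-only Python sketch (ours) shows the mechanics of the accumulator loop: encryption is stripped away, so each external product with a \({\mathrm {TGSW}}\) sample derived from \(\text {BK}_i\) becomes a plain multiplication by \(X^{-\bar{a}_i s_i}\):

```python
import numpy as np

N, n = 1024, 500
rng = np.random.default_rng(3)

def times_x_power(f, p):
    """Multiply f (coefficient vector mod X^N + 1) by X^p; X has order 2N."""
    p %= 2 * N
    if p >= N:                       # X^N = -1
        p, f = p - N, -f
    out = np.roll(f, p)
    out[:p] = -out[:p]
    return out

s = rng.integers(0, 2, n)                        # LWE secret key
a = rng.integers(0, 2 * N, n)                    # rescaled mask, in Z/2NZ
b = (int(a @ s) + N // 4) % (2 * N)              # rescaled phase is N/4 here

mu1 = 0.125
# test vector: the constant term of X^p * testv is mu1 when p lies in the
# half-circle around 0 (mod 2N), and -mu1 otherwise
testv = np.where(np.arange(N) < N // 2, mu1, -mu1)

acc = times_x_power(testv, b)                    # ACC <- X^b . testv
for i in range(n):                               # homomorphically, this is the
    if s[i]:                                     # external product with a TGSW
        acc = times_x_power(acc, -int(a[i]))     # sample of X^{-a_i s_i}
# acc = X^{phase} . testv: its constant term reveals the sign of the phase
assert acc[0] == mu1                             # N/4 is in the upper half-circle
```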

We first provide a comparison between the bootstrapping of Algorithm 3 and the proposal of [11, Algorithms 1 and 2].

  • Like [11], we rescale the computation of the phase of the input \(\mathsf {LWE}\) sample so that it is modulo 2N (line 2), and we map all the corresponding operations to the multiplicative cyclic group \(\{ 1,X,\dots ,X^{2N-1} \}\). Since our \(\mathsf {LWE}\) samples are described over the real torus, the rescaling is done explicitly in line 2. This rescaling may induce a cumulated rounding error of amplitude at most \(\delta \approx \sqrt{n}/4N\) in the average case and \(\delta \le (n+1)/4N\) in the worst case. In the best case, this amplitude decreases to zero (\(\delta =0\)) if, in the actual representation of \(\mathsf {LWE}\) samples, all the coefficients are restricted to multiples of \(\frac{1}{2N}\), which is the analogue of [11]'s setting.

  • As in [11], messages are encoded as roots of unity in \(\mathfrak {R}\). Our accumulator is a \({\mathrm {TLWE}}\) sample instead of a \({\mathrm {TGSW}}\) sample in [11], and accumulator operations use the external product from Theorem 3.14 instead of the slower classical internal product. The test vector \((1\text {+}X\text {+}\dots \text {+}X^{N-1})\) is embedded in the accumulator from the very start, when the accumulator is still noiseless, while in [11] it is added at the very end. This removes a factor \(\sqrt{N}\) from the final noise overhead.

  • All the \({\mathrm {TGSW}}\) ciphertexts of \(X^{-\bar{a}_i s_i}\) required to update the internal value of the accumulator are computed dynamically, as a very small polynomial combination of \(\text {BK}_i\), in the for loop (line 5). This completely removes the need to decompose each \(\bar{a}_i\) on an additional base \(B_r\) and to precompute all the possibilities in the bootstrapping key. In other words, this makes our bootstrapping key 46 times smaller than in [11], for the exact same noise overhead. Besides, due to this technique, two accumulator operations were performed per iteration in [11], instead of one in our case. This gives us an additional 2\(\times \) speed-up.

Theorem 4.6

(Bootstrapping Theorem). Let \(\varvec{h}\in \mathcal {M}_{(k+1)\ell ,k+1}(\mathbb {T}_N[X])\) be the gadget defined in Eq. (1), and let \(Dec_{\varvec{h},\beta ,\epsilon }\) be the associated vector gadget decomposition function.

Let \(\varvec{s}\in \mathbb {B}^n\), \(\varvec{s^{\prime \prime }}\in \mathbb {B}_N[X]^k\) and \(\alpha ,\gamma \) be noise amplitudes. Let \(\text {BK}=\text {BK}_{\varvec{s}\rightarrow \varvec{s^{\prime \prime }},\alpha }\) be a bootstrapping key, let \(\varvec{s'}\in \mathbb {Z}^{kN}\) be the key extracted from \(\varvec{s^{\prime \prime }}\) as in Definition 4.1, and let \(\mathsf {KS}=\mathsf {KS}_{\varvec{s'}\rightarrow \varvec{s},\gamma ,t}\) be a keyswitching secret.

Given \((\varvec{a},b)\in \mathsf {LWE}_{\varvec{s}}(\mu )\) for \(\mu \in \mathbb {T}\), and two fixed messages \(\mu _0,\mu _1\), Algorithm 3 outputs a sample in \(\mathsf {LWE}_{\varvec{s}}(\mu ')\) s.t. \(\mu '=\mu _0\) if \(|\varphi _{\varvec{s}}(\varvec{a},b)|<1/4-\delta \) and \(\mu '=\mu _1\) if \(|\varphi _{\varvec{s}}(\varvec{a},b)|> 1/4+\delta \), where \(\delta \) is the cumulated rounding error, equal to \(\frac{n+1}{4N}\) in the worst case and to \(\delta =0\) if all the coefficients of \((\varvec{a},b)\) are multiples of \(\frac{1}{2N}\). Let \(\varvec{v}\) be the output of Algorithm 3. Then \(\left\| \textsf {Err}(\varvec{v})\right\| _\infty \le 2n(k+1)\ell N\beta \left\| \textsf {Err}(\text {BK})\right\| _\infty +n(1+kN)\epsilon +kNt\gamma +kN2^{-(t+1)}\).

Proof

Line 1: the division by two over the torus gives two possible values for \((\bar{\mu },\bar{\mu }')\). In both cases, \(\bar{\mu }+\bar{\mu }'=\mu _0\) and \(\bar{\mu }-\bar{\mu }'=\mu _1\).

Line 2: let \(\bar{\varphi } \mathop {=}\limits ^{\tiny {def}}\bar{b} -\sum _{i=1}^n {\bar{a}}_i s_i \mod ~2N\). We have

$$\begin{aligned} \Big |\varphi -\frac{\bar{\varphi }}{2N}\Big |= \Big |b-\frac{\lfloor 2Nb\rceil }{2N}+\sum _{i=1}^n\Big (a_i-\frac{\lfloor 2Na_i \rceil }{2N}\Big )s_i\Big | \le \frac{1}{4N}+\sum _{i=1}^n \frac{1}{4N}\le \frac{n+1}{4N}.~ \end{aligned}$$
(2)

And if the coefficients of \((\varvec{a},b)\) are in \(\frac{1}{2N}\mathbb {Z}/\mathbb {Z}\), then \(\varphi =\frac{\bar{\varphi }}{2N}\). In all cases, \(|\varphi -\frac{\bar{\varphi }}{2N}|\le \delta \).

At line 3, the test vector is defined such that for all \(p\in [\![0,2N[\![\), the constant term of \(X^{p}\cdot \text {testv}\) is \(\bar{\mu }'\) if \(p\in ]\!]-\frac{N}{2},\frac{N}{2}[\![\) (mod 2N), and \(-\bar{\mu }'\) otherwise.

In the for loop (lines 5 to 6), we prove the following invariant: at the beginning of iteration \(i+1\in [1,n+1]\) (i.e. at the end of iteration i), \(\textsf {msg}({ACC}_i)=X^{\bar{b}-\sum _{j=1}^{i}\bar{a}_js_j}\cdot \text {testv}\) and \(\left\| \textsf {Err}({ACC}_i)\right\| _\infty \le \sum _{j=1}^i \Big ( 2(k+1)\ell N \beta \left\| \textsf {Err}(\text {BK}_j)\right\| _\infty +(1+kN)\epsilon \Big )\).

At the beginning of iteration \(i=1\), the accumulator contains the trivial ciphertext \({ACC}_0=(\varvec{0},X^{\bar{b}}\cdot \text {testv})\), so \(\left\| \textsf {Err}({ACC}_0)\right\| _\infty =0\).

During iteration i, \(A_i=\varvec{h}+(X^{-\bar{a}_i}-1)\cdot \text {BK}_i\) is a TGSW sample of message \(X^{-\bar{a}_i s_i}\) (this can be seen by replacing \(s_i\) with its two possible values 0 and 1), and of noise \(\left\| \textsf {Err}(A_i)\right\| _\infty \le 2 \left\| \textsf {Err}(\text {BK}_i)\right\| _\infty \); this inequality holds by Fact 3.11. Then we have \({ACC}_i = A_i\boxdot {ACC}_{i-1}\), so by Theorem 3.14, \(\textsf {msg}({ACC}_i)=X^{-\bar{a}_i s_i}\cdot \textsf {msg}({ACC}_{i-1})\),

and from the norm inequality of Theorem 3.14,

$$\begin{aligned} \left\| \textsf {Err}({ACC}_i)\right\| _\infty&\le (k+1)\ell N\beta \left\| \textsf {Err}(A_i)\right\| _\infty + \left\| \textsf {msg}(A_i)\right\| _1(1+kN)\epsilon \\&\quad +\left\| \textsf {msg}(A_i)\right\| _1 \left\| \textsf {Err}({ACC}_{i-1})\right\| _\infty \\&\le 2(k+1)\ell N\beta \left\| \textsf {Err}(\text {BK}_i)\right\| _\infty + (1+kN)\epsilon + \left\| \textsf {Err}({ACC}_{i-1})\right\| _\infty . \end{aligned}$$

This proves the invariant by induction on i.

After \(\textsf {SampleExtract}\) (line 7), the message of \(\varvec{u}\) is equal to the constant term of the message of \({ACC}_n\), i.e. of \(X^{\bar{\varphi }}\cdot \text {testv}\) where \(\bar{\varphi }=\bar{b}-\sum _{i=1}^n \bar{a}_i s_i\). If \(\bar{\varphi }\in ]\!]-\frac{N}{2},\frac{N}{2}[\![\), this constant term is equal to \(\bar{\mu }'\), and to \(-\bar{\mu }'\) otherwise.

In other words, if \(|\varphi _{\varvec{s}}(\varvec{a},b)|< 1/4-\delta \), then \(\varphi _{\varvec{s}}(\varvec{a},b) < 1/4-\delta \) and \(\varphi _{\varvec{s}}(\varvec{a},b) \ge -1/4+ \delta \), and thus, using Eq. (2), we obtain that \(\bar{\varphi }\in ]\!]-\frac{N}{2},\frac{N}{2}[\![\) and that the message of \(\varvec{u}\) is equal to \(\bar{\mu }'\). And if \(|\varphi _{\varvec{s}}(\varvec{a},b)|> 1/4+\delta \), then \(\varphi _{\varvec{s}}(\varvec{a},b)>1/4+\delta \) or \(\varphi _{\varvec{s}}(\varvec{a},b)<-1/4-\delta \), and using Eq. (2), we obtain that the message of \(\varvec{u}\) is equal to \(-\bar{\mu }'\).

Since \(\textsf {SampleExtract}\) does not add any extra noise, \(\left\| \textsf {Err}(\varvec{u})\right\| _\infty \le \left\| \textsf {Err}({ACC}_n)\right\| _\infty \). Since the KeySwitch procedure preserves the message, the message of \(\varvec{v}=\mathsf {KeySwitch}_{\mathsf {KS}}(\varvec{u})\) is equal to the message of \(\varvec{u}\), and by Lemma 4.3 (with \(n'=kN\)), \(\left\| \textsf {Err}(\varvec{v})\right\| _\infty \le \left\| \textsf {Err}(\varvec{u})\right\| _\infty +kNt\gamma +kN2^{-(t+1)}\).    \(\square \)

Corollary 4.7

Let \(V_{\text {BK}}=\textsf {Var}(\textsf {Err}(\text {BK}_i))=2/\pi \cdot \alpha ^2\) and \(V_{\mathsf {KS}}=\textsf {Var}(\textsf {Err}(\mathsf {KS}_{i,j}))=2/\pi \cdot \gamma ^2\). Under the same conditions as Theorem 4.6, and assuming Assumption 3.6, the variance of the output \(\varvec{v}\) of Algorithm 3 satisfies \(\textsf {Var}(\textsf {Err}(\varvec{v}))\le 2n(k+1)\ell N\beta ^2V_{\text {BK}}+n(1+kN)\epsilon ^2+kNtV_{\mathsf {KS}}+kN2^{-2(t+1)}\).

Proof

The proof is the same as the proof of the bound on \(\left\| \textsf {Err}(\varvec{v})\right\| _\infty \), replacing all \(\left\| \cdot \right\| _\infty \) inequalities by the corresponding \(\textsf {Var}(\cdot )\) inequalities.    \(\square \)

4.4 Application to Circuits

In [11], the homomorphic evaluation of a NAND gate between \(\mathsf {LWE}\) samples is achieved with 2 additions (one with a noiseless trivial sample) and a bootstrapping. Let \(\text {BK}=\text {BK}_{\varvec{s}\rightarrow \varvec{s}^{\prime \prime },\alpha }\) be a bootstrapping key and \(\mathsf {KS}=\mathsf {KS}_{\varvec{s'}\rightarrow \varvec{s},\gamma ,t}\) a keyswitching secret defined as in Theorem 4.6, such that \(2n(k+1)\ell \beta N\alpha +kNt\gamma +n(1+kN)\epsilon +kN2^{-(t+1)}<\frac{1}{16}\). We denote by \(\textsf {Bootstrap}\left( \varvec{c}\right) \) the output of the bootstrapping procedure described in Algorithm 3, applied to \(\varvec{c}\) with \(\mu _0=0\) and \(\mu _1=\frac{1}{4}\). Let us consider two \(\mathsf {LWE}\) samples \(\varvec{c_1}\) and \(\varvec{c_2}\), with message space \(\{0,\frac{1}{4}\}\) and \(\left\| \textsf {Err}(\varvec{c_1})\right\| _\infty , \left\| \textsf {Err}(\varvec{c_2})\right\| _\infty \le \frac{1}{16}\). The result is obtained by computing \(\varvec{\tilde{c}} = (\varvec{0},\frac{5}{8})\text {-}\varvec{c_1}\text {-}\varvec{c_2}\), plus a bootstrapping. Indeed, the possible values for the message of \(\varvec{\tilde{c}}\) are \(\frac{5}{8}\) or \(\frac{3}{8}\) if either \(\varvec{c_1}\) or \(\varvec{c_2}\) encodes 0, and \(\frac{1}{8}\) if both encode \(\frac{1}{4}\). Since the noise amplitude \(\left\| \textsf {Err}(\varvec{\tilde{c}})\right\| _\infty \) is \(<\frac{1}{8}\), we have \(|\varphi _{\varvec{s}}(\varvec{\tilde{c}})|>\frac{1}{4}\) iff \(\text {NAND}(\textsf {msg}(\varvec{c_1}),\textsf {msg}(\varvec{c_2}))=1\). This explains why it suffices to bootstrap \(\varvec{\tilde{c}}\) with parameters \((\mu _1,\mu _0)=(\frac{1}{4},0)\) to get the answer. Using a similar approach, it is possible to directly evaluate all the basic gates with a single bootstrapping (a compact tabulation is sketched after the list below):

  • \(\mathrm {HomNOT}(\varvec{c}) = (\varvec{0},\frac{1}{4})\text {-} \varvec{c}\) (no bootstrapping is needed);

  • \(\mathrm {HomAND}(\varvec{c_1},\varvec{c_2}) = \textsf {Bootstrap}\left( (\varvec{0},-\frac{1}{8})\text {+}\varvec{c_1}\text {+}\varvec{c_2} \right) \);

  • \(\mathrm {HomNAND}(\varvec{c_1},\varvec{c_2}) = \textsf {Bootstrap}\left( (\varvec{0},\frac{5}{8})\text {-}\varvec{c_1}\text {-}\varvec{c_2} \right) \);

  • \(\mathrm {HomOR}(\varvec{c_1},\varvec{c_2}) = \textsf {Bootstrap}\left( (\varvec{0},\frac{1}{8})\text {+}\varvec{c_1}\text {+}\varvec{c_2} \right) \);

  • \(\mathrm {HomXOR}(\varvec{c_1},\varvec{c_2}) = \textsf {Bootstrap}\left( 2\cdot (\varvec{c_1}\text {-}\varvec{c_2}) \right) \).

The \(\mathrm {HomXOR}(\varvec{c_1},\varvec{c_2})\) gate can also be obtained as \(\textsf {Bootstrap}\left( 2\cdot (\varvec{c_1}\text {+}\varvec{c_2}) \right) \).
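To make these gate equations concrete, here is a minimal self-contained sketch, assuming the 32-bit torus representation described in Sect. 4.5 below. All names (LweSample, trivial, bootstrap, torusFraction) are our own illustrations, not the released API, and bootstrap is only a stub standing in for Algorithm 3 with \((\mu _0,\mu _1)=(0,\frac{1}{4})\).

```cpp
// Illustrative sketch only -- not the released TFHE API. LWE samples live in
// dimension n over the discretized torus Torus32 = Z/2^32 Z.
#include <cstdint>
#include <vector>

using Torus32 = uint32_t;  // torus element rescaled by 2^32; +/- wrap mod 1

// Encode the fraction num/den as a torus element.
Torus32 torusFraction(int64_t num, int64_t den) {
    return static_cast<Torus32>((num * (int64_t(1) << 32)) / den);
}

struct LweSample {
    std::vector<Torus32> a;               // mask
    Torus32 b = 0;                        // body
    explicit LweSample(size_t n = 500) : a(n, 0) {}
};

// (0, mu): noiseless trivial sample.
LweSample trivial(Torus32 mu, size_t n = 500) { LweSample s(n); s.b = mu; return s; }

LweSample operator+(LweSample x, const LweSample& y) {
    for (size_t i = 0; i < x.a.size(); ++i) x.a[i] += y.a[i];
    x.b += y.b;                           // native wrap-around = mod 1
    return x;
}
LweSample operator-(LweSample x, const LweSample& y) {
    for (size_t i = 0; i < x.a.size(); ++i) x.a[i] -= y.a[i];
    x.b -= y.b;
    return x;
}

// Stub only: stands in for the bootstrapping of Algorithm 3.
LweSample bootstrap(const LweSample& c) { return c; }

// The bootstrapped gates listed above (message space {0, 1/4}):
LweSample homNot (const LweSample& c)                       { return trivial(torusFraction(1, 4)) - c; }
LweSample homAnd (const LweSample& c1, const LweSample& c2) { return bootstrap(trivial(torusFraction(-1, 8)) + c1 + c2); }
LweSample homNand(const LweSample& c1, const LweSample& c2) { return bootstrap(trivial(torusFraction(5, 8)) - c1 - c2); }
LweSample homOr  (const LweSample& c1, const LweSample& c2) { return bootstrap(trivial(torusFraction(1, 8)) + c1 + c2); }
LweSample homXor (const LweSample& c1, const LweSample& c2) { return bootstrap((c1 - c2) + (c1 - c2)); }  // 2*(c1 - c2)
```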

4.5 Parameters, Implementation and Timings

In this section, we review our implementation parameters and provide a comparison with previous works.

Samples. From a theoretical point of view, our scale-invariant scheme is defined over the real torus \(\mathbb {T}\), where all operations are modulo 1. In practice, since we can work with approximations, we chose to rescale the elements of \(\mathbb {T}\) by a factor \(2^{32}\) and to map them to 32-bit integers. We thus take advantage of the native and automatic mod \(2^{32}\) operations, including for the external multiplication with integers. Except for some FFT operations, this seems more stable and efficient than working with floating-point numbers and reducing modulo 1 regularly. Polynomials mod \(X^N+1\) are represented either as the classical list of their N coefficients, or in the Lagrange half-complex representation, which consists of the complex (\(2\times 64\) bits) evaluations of the polynomial at the roots of unity \(\exp (i(2j+1)\pi /N)\) for \(j\in [\![0,\frac{N}{2}[\![\). Indeed, the \(\frac{N}{2}\) other evaluations are the conjugates of the first ones and do not need to be stored. The conversion between the two representations is done via the Fast Fourier Transform (using the FFTW library [12], also used by [11]). Note that the direct FFT transform is \(\sqrt{2N}\)-Lipschitz, so the Lagrange half-complex representation tolerates approximations, and 53 bits of precision is more than enough, provided that the real representatives remain small. However, the modulo 1 reduction of the coefficients of torus polynomials cannot be applied in the Lagrange representation: we need to perform regular conversions to and from the classical representation. Luckily, this does not add any overhead, since these conversions are needed anyway at each iteration of the bootstrapping, in order to decompose the accumulator in base \(\varvec{h}\).
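As an illustration of the two representations, here is a naive, self-contained sketch of the conversions; all names are ours, the transforms are quadratic-time where the implementation uses FFTW, and Torus32 values are read through their small signed representative, as required for the approximation to stay accurate.

```cpp
// Naive sketch of the coefficient <-> Lagrange half-complex conversions.
#include <cmath>
#include <complex>
#include <cstdint>
#include <vector>

using Torus32 = uint32_t;               // torus rescaled by 2^32
using cd = std::complex<double>;
const double PI = 3.14159265358979323846;

// Coefficients -> Lagrange half-complex: evaluations at exp(i(2j+1)pi/N)
// for j in [0, N/2); the other N/2 evaluations are their conjugates.
std::vector<cd> toLagrange(const std::vector<Torus32>& p) {
    const size_t N = p.size();
    std::vector<cd> ev(N / 2);
    for (size_t j = 0; j < N / 2; ++j) {
        const cd root = std::polar(1.0, (2.0 * j + 1.0) * PI / N);
        cd acc = 0.0, w = 1.0;
        for (size_t k = 0; k < N; ++k) {
            acc += static_cast<double>(static_cast<int32_t>(p[k])) * w;  // signed representative
            w *= root;
        }
        ev[j] = acc;
    }
    return ev;
}

// Lagrange half-complex -> coefficients; the final cast wraps mod 2^32,
// which is exactly the mod-1 reduction on the rescaled torus.
std::vector<Torus32> fromLagrange(const std::vector<cd>& ev) {
    const size_t N = 2 * ev.size();
    std::vector<Torus32> p(N);
    for (size_t k = 0; k < N; ++k) {
        double acc = 0.0;
        for (size_t j = 0; j < N / 2; ++j) {
            const double theta = (2.0 * j + 1.0) * PI / N;
            acc += 2.0 * std::real(ev[j] * std::polar(1.0, -theta * k));  // conjugate pairs
        }
        p[k] = static_cast<Torus32>(std::llround(acc / N));
    }
    return p;
}
```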

Parameters. We take the same or even stronger security parameters as [11], but we adapt them to our notations. We used \(n = 500\), \(N = 1024\), \(k = 1\).

  • \(\mathsf {LWE}\) samples: \(32 \cdot (n+1)\) bits \(\approx \) 2 KBytes.

The masks of all \(\mathsf {LWE}\) samples (initial and KeySwitch) are clamped to multiples of \(\frac{1}{2048}\). Therefore, the phase computation in the bootstrapping is exact (\(\delta =0\)).

  • \({\mathrm {TLWE}}\) samples: \((k+1) \cdot N \cdot 32\) bits \(\approx \) 8 KBytes.

  • \({\mathrm {TGSW}}\) samples: \((k+1) \cdot \ell \) \({\mathrm {TLWE}}\) samples \(\approx \) 48 KBytes.

    To define \(\varvec{h}\) and \(\text {Dec}_{\varvec{h},\beta ,\epsilon }\), we used \(\ell = 3\), \(B_g = 1024\), so \(\beta =512\) and \(\epsilon =2^{-31}\).

  • Bootstrapping Key: n \({\mathrm {TGSW}}\) samples \(\approx \) 23.4 MBytes.

We used \(\alpha = 9.0 \cdot 10^{-9}\). Since we have a lower noise overhead, our parameter is higher than the corresponding parameter \(\approx 3.25\cdot 10^{-10}\) of [11] (i.e. ours is more secure); in return, our \({\mathrm {TLWE}}\) key is binary. See Sect. 6 for more details on the security analysis.

  • Key Switching Key: \(k \cdot N \cdot t\) \(\mathsf {LWE}\) samples \(\approx \) 29.2 MBytes.

We used \(\gamma = 3.05 \cdot 10^{-5}\) and \(t = 15\) (the decomposition in the key switching has a precision of \(2^{-16}\)).

Correctness: The final error variance after bootstrapping is \(9.24\cdot 10^{-5}\), by Corollary 4.7. It corresponds to a standard deviation of \(\sigma =0.00961\). In [11], the final standard deviation is larger: 0.01076. In other words, the noise amplitude after our bootstrapping is \(<\frac{1}{16}\) with very high probability \(\mathsf {erf}(1/(16\sqrt{2}\sigma ))\ge 1-2^{-33.56}\) (comparable to the probability \(\ge 1-2^{-32}\) in [11]).

Note that the size of the key switching key can be reduced by a factor \(n+1=501\) if all the masks are the output of a pseudo-random function: we may for instance just publish the seed. The same technique can be applied to the bootstrapping key, whose size is however only reduced by a factor \(k+1=2\).
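A minimal sketch of this compression trick, under the assumption that a PRG regenerates the masks on demand (std::mt19937_64 stands in here for an actual cryptographic PRF; the names are ours):

```cpp
// Sketch: only the body b of each key-switching sample is stored;
// the mask is re-expanded from a published seed.
#include <cstdint>
#include <random>
#include <vector>

using Torus32 = uint32_t;

struct CompressedLweSample { Torus32 b; };  // the mask (n torus elements) is implicit

std::vector<Torus32> expandMask(uint64_t seed, uint64_t sampleIndex, size_t n) {
    std::mt19937_64 prg(seed ^ (0x9e3779b97f4a7c15ULL * (sampleIndex + 1)));
    std::vector<Torus32> a(n);
    for (auto& ai : a) ai = static_cast<Torus32>(prg());
    return a;
}
```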

Implementation Tools and Source Code. The source code of our implementation is available on github https://github.com/tfhe/tfhe. We implemented the FHE scheme in C/C++, and ran the bootstrapping algorithm on a single 64-bit core (i7-4930MX) at 3.00 GHz, which seems to correspond to the machine used in [11]. We implemented one version with the classical representation of polynomials, and one with the Lagrange half-complex representation. The following table compares the number of multiplications or FFTs required to complete one external product and the full bootstrapping.

 

                        #(Classical products)   #(FFT + Lagrange repr.)
External product                  12                        8
Bootstrapping                   6000                     4006
Bootstrapping in [11]         (72000)                   48000

In practice, we obtained a running time of 52 ms per bootstrapping using the Lagrange half-complex representation, which is consistent with the 12x speed-up predicted by the table. Profiling the execution shows that the FFTs and complex multiplications still take more than 90% of the total time. Other operations like the key switching have a negligible running time compared to the main loop of the bootstrapping.
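The count of 8 FFTs per external product can be read off the structure of the operation: with \(k=1\) and \(\ell =3\), the \((k+1)\ell =6\) lines of the decomposition of the \({\mathrm {TLWE}}\) sample are each transformed once, all multiply-accumulations happen pointwise in the Lagrange domain, and one inverse transform per output component (\(k+1=2\)) brings the result back. The following counting sketch makes this explicit; Fft/Ifft are placeholder stubs for the FFTW-based transforms, not the implementation's API.

```cpp
// Counting sketch for one external product (k = 1, ell = 3):
// 6 forward FFTs + 2 inverse FFTs = 8 transforms in total.
#include <complex>
#include <cstdint>
#include <vector>

using cd = std::complex<double>;
using Poly = std::vector<int32_t>;     // N coefficients mod X^N+1
using LagPoly = std::vector<cd>;       // N/2 Lagrange half-complex values

// Placeholder stubs standing in for the FFTW-based transforms.
LagPoly Fft(const Poly& p)     { return LagPoly(p.size() / 2); }
Poly    Ifft(const LagPoly& v) { return Poly(2 * v.size()); }

// C: TGSW sample kept in Lagrange form, (k+1)*ell rows x (k+1) columns.
// decomposedD: Dec_h(d), the (k+1)*ell small integer polynomials.
std::vector<Poly> externalProduct(const std::vector<std::vector<LagPoly>>& C,
                                  const std::vector<Poly>& decomposedD,
                                  size_t N) {
    const size_t rows = decomposedD.size();        // (k+1)*ell = 6
    const size_t cols = C[0].size();               // (k+1)      = 2
    std::vector<LagPoly> dHat(rows);
    for (size_t i = 0; i < rows; ++i)
        dHat[i] = Fft(decomposedD[i]);             // 6 forward FFTs
    std::vector<Poly> out(cols);
    for (size_t c = 0; c < cols; ++c) {
        LagPoly acc(N / 2, cd(0.0));
        for (size_t i = 0; i < rows; ++i)
            for (size_t j = 0; j < N / 2; ++j)
                acc[j] += dHat[i][j] * C[i][c][j]; // cheap pointwise mul-add
        out[c] = Ifft(acc);                        // + 2 inverse FFTs
    }
    return out;                                    // total: 6 + 2 = 8
}
```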

5 Leveled Homomorphic Encryption

In the previous section, we showed how to accelerate the bootstrapping computation in FHE. In this section, we focus on improving leveled homomorphic encryption schemes, and we present an efficient way to evaluate any deterministic automaton homomorphically.

5.1 Boolean Circuits Interpretation

In order to express our external product in a circuit, we consider two kinds of wires: control wires, which encode either a small integer or a small integer polynomial and are represented by a \({\mathrm {TGSW}}\) sample; and data wires, which encode a sample either in \(\mathbb {T}\) or in \(\mathbb {T}_N[X]\) and are represented by a \({\mathrm {TLWE}}\) sample. The gates we present contain three kinds of slots: control inputs, data inputs and data outputs. In the following section, the rule to build valid circuits is that all control wires are freshly generated by the user, and the data input ports of our gates can either be freshly generated or connected to the data output of another gate.

We now give an interpretation of our leveled scheme that simulates boolean circuits only. In this case, the message space of the input \({\mathrm {TLWE}}\) samples is restricted to \(\{0,\frac{1}{2}\}\), and the message space of the control wires to \(\{0,1\}\).

  • The constant source \(\mathtt {Cst}(\mu )\) for \(\mu \in \{0,\frac{1}{2}\}\) is defined with a single data output equal to \((\varvec{0},\mu )\).

  • The negation gate \(\mathtt {Not}(\varvec{d})\) takes a single data input \(\varvec{d}\) and outputs \((\varvec{0},\frac{1}{2})-\varvec{d}\).

  • The controlled And gate \(\mathtt {CAnd}(C,\varvec{d})\) takes one control input C and one data input \(\varvec{d}\), and outputs \(C\boxdot \varvec{d}\).

  • The controlled Mux gate \(\mathtt {CMux}(C,\varvec{d_1},\varvec{d_0})\) takes one control input C and two data inputs \(\varvec{d_1},\varvec{d_0}\) and returns \(C \boxdot (\varvec{d_1}-\varvec{d_0}) + \varvec{d_0}\).

Unlike classical circuits, these gates have to be composed with each other according to the type of their inputs/outputs. In our applications, the \({\mathrm {TGSW}}\) encryptions are always fresh ciphertexts.
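A minimal sketch of these gates over hypothetical types (our own illustration, not the library's API); the external product is left as a stub standing in for Theorem 3.14's \(\boxdot \), and the message \(\frac{1}{2}\) is the torus element \(2^{31}\) in the 32-bit representation of Sect. 4.5:

```cpp
// Illustrative types for the leveled gates of Sect. 5.1.
#include <cstdint>
#include <vector>

using Torus32 = uint32_t;
using TorusPoly = std::vector<Torus32>;       // N coefficients mod X^N+1

struct TLweSample {                           // k = 1: one mask poly + one body poly
    TorusPoly a, b;
    explicit TLweSample(size_t N = 1024) : a(N, 0), b(N, 0) {}
};
struct TGswSample { /* (k+1)*ell TLWE rows, omitted in this sketch */ };

TLweSample operator+(TLweSample x, const TLweSample& y) {
    for (size_t i = 0; i < x.a.size(); ++i) { x.a[i] += y.a[i]; x.b[i] += y.b[i]; }
    return x;
}
TLweSample operator-(TLweSample x, const TLweSample& y) {
    for (size_t i = 0; i < x.a.size(); ++i) { x.a[i] -= y.a[i]; x.b[i] -= y.b[i]; }
    return x;
}

// Stub only: the real C ⊡ d decomposes d in base h and multiplies by C's rows.
TLweSample externalProduct(const TGswSample& /*C*/, const TLweSample& d) { return d; }

const Torus32 HALF = 0x80000000u;             // the torus element 1/2

TLweSample Cst(Torus32 mu, size_t N = 1024) { TLweSample s(N); s.b[0] = mu; return s; }
TLweSample Not(const TLweSample& d) { return Cst(HALF, d.a.size()) - d; }
TLweSample CAnd(const TGswSample& C, const TLweSample& d) { return externalProduct(C, d); }
TLweSample CMux(const TGswSample& C, const TLweSample& d1, const TLweSample& d0) {
    return externalProduct(C, d1 - d0) + d0;  // msg(C) ? msg(d1) : msg(d0)
}
```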


Theorem 5.1

(Correctness). Let \(\mu \in \{0,\frac{1}{2}\}\), \(\varvec{d},\varvec{d_1},\varvec{d_0}\in {\mathrm {TLWE}}_{\varvec{s}}(\{0,\frac{1}{2}\})\) and \(C\in {\mathrm {TGSW}}_{\varvec{s}}(\{0,1\})\). Then \(\textsf {msg}(\mathtt {Cst}(\mu ))=\mu \), \(\textsf {msg}(\mathtt {Not}(\varvec{d}))=\frac{1}{2}-\textsf {msg}(\varvec{d})\), \(\textsf {msg}(\mathtt {CAnd}(C,\varvec{d}))=\textsf {msg}(C)\cdot \textsf {msg}(\varvec{d})\), and \(\textsf {msg}(\mathtt {CMux}(C,\varvec{d_1},\varvec{d_0}))=\textsf {msg}(C)\,?\,\textsf {msg}(\varvec{d_1})\,:\,\textsf {msg}(\varvec{d_0})\).

Theorem 5.2

(Worst-case noise). In the conditions of Theorem 5.1, we have

  • \(\left\| \textsf {Err}(\mathtt {Cst}(\mu ))\right\| _\infty =0\) and \(\left\| \textsf {Err}(\mathtt {Not}(\varvec{d}))\right\| _\infty =\left\| \textsf {Err}(\varvec{d})\right\| _\infty \);

  • \(\left\| \textsf {Err}(\mathtt {CAnd}(C,\varvec{d}))\right\| _\infty \le \left\| \textsf {Err}(\varvec{d})\right\| _\infty +\eta (C)\);

  • \(\left\| \textsf {Err}(\mathtt {CMux}(C,\varvec{d_1},\varvec{d_0}))\right\| _\infty \le \max (\left\| \textsf {Err}(\varvec{d_0})\right\| _\infty ,\left\| \textsf {Err}(\varvec{d_1})\right\| _\infty )+\eta (C)\),

    where \(\eta (C)=(k+1)\ell N\beta \left\| \textsf {Err}(C)\right\| _\infty +(kN+1)\epsilon \).

Proof

The noise is indeed null for the constant gates, and negated by the Not gate, which preserves the norm. The noise bound for the CAnd gate is exactly the one from Theorem 3.14; however, we need to explain why there is a max in the CMux formula instead of the sum we would obtain by blindly applying Theorem 3.14. Let \(\varvec{d}=\varvec{d_1}-\varvec{d_0}\), and recall that in the proof of Theorem 3.14, the expression of \(C\boxdot \varvec{d}\) is \(\textsf {Dec}_{\varvec{h},\beta ,\epsilon }(\varvec{d})\cdot \varvec{z_C}+\mu _C\cdot \varvec{\epsilon _{\text {dec}}}+\mu _C\cdot \varvec{d}\), where \(C=\varvec{z_C}+\mu _C\cdot \varvec{h}\) and \(\varvec{d}=\varvec{z_d}+\mu _d\), \(\varvec{z_C}\) and \(\varvec{z_d}\) are respectively \({\mathrm {TGSW}}\) and \({\mathrm {TLWE}}\) samples of 0, and \(\left\| \varvec{\epsilon _{\text {dec}}}\right\| _\infty \le \epsilon \). Thus, \(\mathtt {CMux}(C,\varvec{d_1},\varvec{d_0})\) is the sum of four terms:

  • \(\textsf {Dec}_{\varvec{h},\beta ,\epsilon }(\varvec{d})\cdot \varvec{z_C}\) of norm \(\le (k+1)\ell N\beta \eta _C\), where \(\eta _C=\left\| \textsf {Err}(C)\right\| _\infty \);

  • \(\mu _C \varvec{\epsilon _{\text {dec}}}\) of norm \(\le (kN+1)\epsilon \);

  • \(z_{d_0} + \mu _C(z_{d_1}-z_{d_0})\), which is either \(z_{d_1}\) or \(z_{d_0}\), depending on the value of \(\mu _C\);

  • \(\mu _{d_0}+\mu _C\cdot (\mu _{d_1}-\mu _{d_0})\), which is the output message \(\mu _C\text {?}\mu _{d_1}\text {:}\mu _{d_0}\), and is not part of the noise.

Thus, summing the norms of the first three terms concludes the proof.    \(\square \)

Corollary 5.3

(Average noise of boolean gates). In the conditions of Theorem 5.1, and in the conditions of Assumption 3.6, we have:

  • \(\textsf {Var}(\textsf {Err}(\mathtt {Cst}(\mu )))=0\);

  • \(\textsf {Var}(\textsf {Err}(\mathtt {Not}(\varvec{d})))=\textsf {Var}(\textsf {Err}(\varvec{d}))\);

  • \(\textsf {Var}(\textsf {Err}(\mathtt {CAnd}(C,\varvec{d})))\le \textsf {Var}(\textsf {Err}(\varvec{d}))+\vartheta (C)\);

  • \(\textsf {Var}(\textsf {Err}(\mathtt {CMux}(C,\varvec{d_1},\varvec{d_0})))\le \max (\textsf {Var}(\textsf {Err}(\varvec{d_0})),\textsf {Var}(\textsf {Err}(\varvec{d_1})))+\vartheta (C)\),

    where \(\vartheta (C)=(k+1)\ell N\beta ^2\,\textsf {Var}(\textsf {Err}(C))+(kN+1)\epsilon ^2\).

Proof

Same as the proof of Theorem 5.2, replacing all norm inequalities by variance inequalities.    \(\square \)

We now obtain theorems analogous to those of [13], with slightly less noise on the Mux gate, but with the additional restriction that CAnd and CMux have a control wire, which must necessarily be a fresh \({\mathrm {TGSW}}\) ciphertext.

The next step is to understand the meaning of this additional restriction in terms of expressiveness of the resulting homomorphic circuits.

It is clear that we cannot build an arbitrary boolean circuit and just apply the noise recurrence formulas of Theorem 5.2 or Corollary 5.3 to get the output noise level: indeed, it is not allowed to connect a data wire to a control input.

In the following section, we show that we can still obtain the two most important circuits of [13]: the deterministic automata circuits, which can evaluate any regular language with noise propagation sublinear in the word length, and the lookup table, which evaluates arbitrary functions with sublinear noise propagation.

5.2 Deterministic Automata

It is folklore that every deterministic program which reads its input bit-by-bit in a pre-determined order, uses less than B bits of memory, and produces a boolean answer is equivalent to a deterministic automaton with at most \(2^B\) states (independently of its time complexity). This is in particular the case for every boolean function of p variables, which can trivially be executed with \(p-1\) bits of internal memory by reading and storing its input bit-by-bit before returning the final answer. It is of particular interest for most arithmetic functions, like addition, multiplication, or CRT operations, whose naive evaluation only requires \(O(\log (p))\) bits of internal memory.

Let \(\mathcal {A}=(Q,i,T_0,T_1,F)\) be a deterministic automaton over the alphabet \(\{0,1\}\), where Q is the set of states, \(i\in Q\) denotes the initial state, \(T_0,T_1\) are the two (deterministic) transition functions from Q to Q, and \(F\subset Q\) is the set of final states. Such an automaton evaluates (regular) boolean functions on words, where the image of \((w_1,\dots ,w_p)\in \mathbb {B}^p\) is equal to 1 iff \(T_{w_p}(T_{w_{p-1}}(\dots (T_{w_{1}}(i))))\in F\), and 0 otherwise.
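For concreteness, here is the plaintext evaluation just described, with states numbered \(0,\dots ,\#Q-1\); the Automaton struct is a hypothetical helper type of our own, reused in the homomorphic sketch later in this section.

```cpp
// Plaintext evaluation of A = (Q, i, T0, T1, F) on a word w.
#include <vector>

struct Automaton {
    int initial;                  // i
    std::vector<int> T0, T1;      // the two transition functions Q -> Q
    std::vector<bool> accepting;  // characteristic vector of F
};

bool evaluate(const Automaton& A, const std::vector<bool>& w) {
    int q = A.initial;
    for (bool bit : w) q = bit ? A.T1[q] : A.T0[q];  // q <- T_{w_j}(q)
    return A.accepting[q];        // 1 iff the final state lies in F
}
```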

Following the construction of [13], we show that we can efficiently evaluate any deterministic automaton homomorphically, using only constant and CMux gates. The noise propagation remains linear in the length of the word w, but compared to [13, Theorem 7.11], we reduce the number of evaluated CMux gates by a factor |w| for a specific class of acyclic automata that are linked to fixed-time algorithms.

Theorem 5.4

(Evaluating Deterministic Automata). Let \(\mathcal {A}=(Q,i,T_0,T_1,F)\) be a deterministic automaton. Given p valid \({\mathrm {TGSW}}\) samples \(C_1,\dots ,C_p\) encrypting the bits of a word \(\varvec{w}\in \mathbb {B}^p\), with noise amplitude \(\left\| \textsf {Err}(C_j)\right\| _\infty \le \eta \) and variance \(\textsf {Var}(\textsf {Err}(C_j))\le \vartheta \), by evaluating at most \(p\#Q\) CMux gates one can produce a \({\mathrm {TLWE}}\) sample \(\varvec{d}\) which encrypts \(\frac{1}{2}\) iff \(\mathcal {A}\) accepts \(\varvec{w}\), and 0 otherwise, such that \(\left\| \textsf {Err}(\varvec{d})\right\| _\infty \le p\cdot ((k+1)\ell N\beta \eta +(kN+1)\epsilon )\). Assuming Heuristic 3.6, \(\textsf {Var}(\textsf {Err}(\varvec{d}))\le p\cdot ((k+1)\ell N\beta ^2\vartheta +(kN+1)\epsilon ^2)\). Furthermore, the number of evaluated \(\mathtt {CMux}\) gates can be decreased to \(\le \#Q\) if \(\mathcal {A}\) satisfies either one of the following conditions:

  (i) for all \(q\in Q\) (except KO states), all the words that connect i to q have the same length;

  (ii) \(\mathcal {A}\) only accepts words of the same length.

Proof

We initialize \(\#Q\) noiseless ciphertexts \(\varvec{d}_{q,p}\) for \(q \in Q\), with \(\varvec{d}_{q,p}=(\varvec{0},\frac{1}{2})=\mathtt {Cst}(\frac{1}{2})\) if \(q\in F\) and \(\varvec{d}_{q,p}=(\varvec{0},0)=\mathtt {Cst}(0)\) otherwise. Then, for each letter of \(\varvec{w}\), we map the transitions as follows, for all \(q\in Q\) and \(j\in [\![1,p ]\!]\): \(\varvec{d}_{q,j-1}=\mathtt {CMux}(C_j,\varvec{d_{T_1(q),j}},\varvec{d_{T_0(q),j}})\). We finally output \(\varvec{d_{i,0}}\).

Indeed, with this construction, we have

$$ \textsf {msg}(\varvec{d_{i,0}})= \textsf {msg}(\varvec{d_{T_{w_1}(i),1}})= \ldots =\textsf {msg}(\varvec{d_{T_{w_p}(T_{w_{p-1}}\ldots (T_{w_1}(i))\ldots ),p}}), $$

which encrypts \(\frac{1}{2}\) iff \(T_{w_p}(T_{w_{p-1}}\ldots (T_{w_1}(i))\ldots )\in F\), i.e. iff \(w_1\ldots w_p\) is accepted by \(\mathcal {A}\). This proves correctness.

For the complexity, each \(\varvec{d}_{q,j}\), for \(q\in Q\) and \(j\in [\![0,p-1 ]\!]\), is computed with a single \(\mathtt {CMux}\). By applying the noise propagation inequalities of Theorem 5.2 and Corollary 5.3, it follows by an immediate induction on j, from p down to 0, that for all \(j\in [\![0,p ]\!]\), \(\left\| \textsf {Err}(\varvec{d_{q,j}})\right\| _\infty \le (p-j)\cdot ((k+1)\ell N\beta \eta + (kN+1)\epsilon )\) and \(\textsf {Var}(\textsf {Err}(\varvec{d_{q,j}}))\le (p-j)\cdot ((k+1)\ell N\beta ^2\vartheta + (kN+1)\epsilon ^2)\).

Note that it suffices to evaluate only the \(\varvec{d_{q,j}}\) such that q is accessible by at least one word of length j. Thus, if \(\mathcal {A}\) satisfies the additional condition (i), then for each \(q\in Q\) we only need to evaluate \(\varvec{d_{q,j}}\) for at most one position j, so we evaluate at most \(\#Q\) CMux gates in total.

Finally, if \(\mathcal {A}\) satisfies (ii), we first compute the minimal deterministic automaton of the same language (removing the KO state if present); by an immediate proof by contradiction, this minimal automaton satisfies (i), and it has at most \(\#Q\) states.     \(\square \)
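A sketch of this backward induction, reusing the hypothetical TLweSample/TGswSample types, the CMux and Cst gates and the HALF constant from the sketch in Sect. 5.1, and the Automaton struct from above:

```cpp
// Homomorphic evaluation of A on an encrypted word: C[j] is the TGSW
// encryption of the bit w_{j+1}. Illustrative only.
#include <utility>
#include <vector>

TLweSample evalAutomatonHom(const Automaton& A, const std::vector<TGswSample>& C) {
    const size_t p = C.size();
    const int nQ = static_cast<int>(A.T0.size());
    // d[q] holds d_{q,p}: noiseless Cst(1/2) for accepting states, Cst(0) otherwise.
    std::vector<TLweSample> d(nQ);
    for (int q = 0; q < nQ; ++q) d[q] = Cst(A.accepting[q] ? HALF : 0);
    // Peel the letters from the last one down to the first:
    // d_{q,j-1} = CMux(C_j, d_{T1(q),j}, d_{T0(q),j}).
    for (size_t j = p; j >= 1; --j) {
        std::vector<TLweSample> next(nQ);
        for (int q = 0; q < nQ; ++q)
            next[q] = CMux(C[j - 1], d[A.T1[q]], d[A.T0[q]]);
        d = std::move(next);
    }
    return d[A.initial];  // encrypts 1/2 iff A accepts the word, 0 otherwise
}
```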

For the sake of completeness, since every boolean function of p variables can be evaluated by an automaton (accepting only words of length p), we obtain the evaluation of arbitrary boolean functions as an immediate corollary, which is the leveled variant of [13, Corollary 7.9].

Lemma 5.5

(Arbitrary Functions). Let f be any boolean function with p inputs, and \(\varvec{c_1},\dots ,\varvec{c_p}\) be p \({\mathrm {TGSW}}_{\varvec{s}}(\{0,1\})\) ciphertexts of \(x_1,\dots ,x_p\in \{0,1\}\), with noise amplitude \(\le \eta \). Then the CMux-based Reduced Binary Decision Diagram of f computes a \({\mathrm {TLWE}}_{\varvec{s}}\) ciphertext \(\varvec{d}\) of \(\frac{1}{2}f(x_1, \dots ,x_p)\) with noise \(\left\| \textsf {Err}(\varvec{d})\right\| _\infty \le p\cdot ((k+1)\ell N\beta \eta +(kN+1)\epsilon )\), by evaluating \(\mathcal {N}(f)\le 2^p\) CMux gates, where \(\mathcal {N}(f)\) is the number of distinct partial functions \((x_l,\dots ,x_p)\rightarrow f(x_1,\dots ,x_p)\) for \(l\in [\![1,p+1]\!]\) and \((x_1,\dots ,x_{l-1})\in \mathbb {B}^{l-1}\).

Proof

(sketch). A trivial automaton which evaluates f consists of its full binary decision tree, with the initial state \(i=q_{0,0}\) as the root; each state \(q_{l,j}\) at depth \(l\in [\![0,p-1]\!]\), for \(j\in [\![0,2^l-1]\!]\), is connected via \(T_0(q_{l,j})=q_{l+1,2j}\) and \(T_1(q_{l,j})=q_{l+1,2j+1}\), and at depth p, \(q_{p,j}\in F\) iff \(f(x_1,\dots ,x_p)=1\) where \(j=\sum _{l=1}^{p} x_l2^{p-l}\). The minimal version of this automaton has at most \(\mathcal {N}(f)\) states; the rest follows from Theorem 5.4.    \(\square \)
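To see how much the reduction buys for a given f, one can compute \(\mathcal {N}(f)\) directly from the truth table. The following helper is our own illustration, under the convention that entry \(j=\sum _{l=1}^{p} x_l2^{p-l}\) of the table stores \(f(x_1,\dots ,x_p)\); it counts the distinct partial functions level by level.

```cpp
// Counting the distinct partial functions of f, as defined in Lemma 5.5,
// from its truth table of size 2^p.
#include <cstddef>
#include <set>
#include <vector>

size_t countDistinctPartialFunctions(const std::vector<bool>& truthTable, size_t p) {
    size_t total = 0;
    for (size_t fixed = 0; fixed <= p; ++fixed) {        // fixed = l-1 frozen variables
        const size_t width = truthTable.size() >> fixed; // each sub-table has 2^(p-fixed) entries
        std::set<std::vector<bool>> distinct;
        for (size_t j = 0; j < (size_t(1) << fixed); ++j)
            distinct.insert(std::vector<bool>(truthTable.begin() + j * width,
                                              truthTable.begin() + (j + 1) * width));
        total += distinct.size();                        // distinct partial functions at this level
    }
    return total;
}
```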

Application: Compilation of Leveled Homomorphic Circuits. We now give an example of how to map a problem to an automaton in order to perform a leveled homomorphic evaluation. We illustrate this concept on the computation of the p-th bit of an integer product \(a\times b\), where a and b are given in base 2. We do not claim that the automata approach is the fastest way to solve this problem; arithmetic circuits based on bit decomposition/recomposition are likely to be faster. The goal is rather to show the generality and simplicity of the process: all we need is a fixed-time algorithm that solves the problem using the least possible memory. Among all algorithms that compute a product, the most naive ones are in general the best. Here, we choose the elementary-school multiplication algorithm, which computes the product bit-by-bit, starting from the LSB, and keeps track of the current carry. The pseudocode of this algorithm is recalled in Algorithm 4, and a sketch is given below.

The pseudocode is almost given as a deterministic automaton, since each step reads a single input bit and uses it to update its internal state (x, y), which can be stored in only \(M=\log _2(4p)\) bits of memory. More precisely, the states Q of the corresponding automaton \(\mathcal {A}\) are all the (j, (x, y)) where \(j\in [\![0,j_{\max }]\!]\) is the step number (i.e. the number of reads since the beginning) and \((x,y)\in \mathbb {B}\times [\![0,2p[\![\) ranges over the 4p possible values of the internal memory. The initial state is (0, (0, 0)), the total number of reads \(j_{\max }\) is \(\le p^2\), and the final states are all the \((j_{\max },(x,y))\) where y is odd. This automaton satisfies condition (i), since a state (j, (x, y)) can only be reached after reading j inputs; so by Theorem 5.4, the output can be homomorphically computed by evaluating at most \(\#Q\le 4p^3\) CMux gates, with some O(p) noise overhead. The number of CMux gates can be further decreased by a factor 8 by minimizing the automaton. Using the same parameters as the bootstrapping key, for \(p=32\), evaluating one CMux gate takes about 0.0002 s, so the whole program (16384 CMux) would be homomorphically evaluated in about 3.2 s.

We mapped a problem from its high-level description to an algorithm using very few bits of memory. Since low-memory programs are in general more naive, they should be easier to find than circuits with the low multiplicative depth required by other schemes such as BGV or FHE over the integers. Once a suitable program is found, as in the previous example, compiling it to a netlist of CMux gates is straightforward by Theorem 5.4.

Algorithm 4: the elementary-school multiplication algorithm (computing the p-th bit of \(a\times b\)).
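As a concrete illustration of such a fixed-time, low-memory program, here is a hedged sketch in the spirit of Algorithm 4 (our own code, not the paper's exact pseudocode, valid for \(p<32\)); the internal state reduces to a small counter y, whose parity and upper bits play the roles of x and the carry.

```cpp
// The p-th bit of a*b via schoolbook multiplication: operand bits are read
// in a fixed, data-independent order, and only O(log p) bits of state are kept.
#include <cstdint>

int productBit(uint32_t a, uint32_t b, int p) {   // 0-indexed bit p of a*b
    uint32_t y = 0;                               // column sum + carry, stays O(p)
    for (int k = 0; k <= p; ++k) {                // column k of the product
        for (int i = 0; i <= k; ++i)
            y += ((a >> i) & 1u) & ((b >> (k - i)) & 1u);  // one partial-product bit per read
        if (k == p) return static_cast<int>(y & 1u);       // the bit of this column
        y >>= 1;                                  // keep only the carry for column k+1
    }
    return 0;                                     // unreachable
}
```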

6 Practical Security Parameters

For an asymptotic security analysis, since the phase is Lipschitz, \({\mathrm {TLWE}}\) samples can be equivalently mapped to their closest binLWE (or bin-RingLWE) samples, which in turn can be reduced to standard LWE/RingLWE with full secret using the modulus-dimension reduction [6] or group-switching techniques [13]. It can then be reduced to worst-case BDD instances. It is also easy to write a direct and tighter search-to-decision reduction for \({\mathrm {TLWE}}\), or a direct worst-case to average-case reduction from \({\mathrm {TLWE}}\) to Gap-SVP or BDD.

In this section, we rather focus on the practical hardness of LWE, and we finally express the security parameter \(\lambda \) directly as a function of the entropy n of the secret and the error rate \(\alpha \).

Our analysis is based on the work described in [2]. This paper studies many attacks against LWE, ranging from a direct BDD approach with standard lattice reduction or sieving, to a variant of BKW [4] and resolution via meet-in-the-middle attacks. Unfortunately, they found out that there is no single best attack. According to their results [2, Sect. 8, Tables 7 and 8], for the range of dimensions and noise used in FHE, the SIS-distinguisher attack is often the best candidate (it is related to the Lindner-Peikert model [17], and was also used in the parameter estimation of [11]). However, since q is not a parameter in our definition of \({\mathrm {TLWE}}\), we need to adapt their results. This section relies on the following heuristics concerning the experimental behaviour of lattice reduction algorithms; they have been extensively verified and used in practice.

  1. The fastest lattice reduction algorithms in practice are blockwise lattice algorithms (like BKZ-2.0 [8], D-BKZ [20], or slide reduction with large blocksize [14, 20]).

  2. Practical blockwise lattice reduction algorithms have an intrinsic quality \(\delta >1\) (which depends on the blocksize): given an m-dimensional real basis B of a lattice of volume V, they compute short vectors of norm \(\delta ^m V^{1/m}\).

  3. The running time of BKZ-2.0 (expressed in bit operations) as a function of the quality parameter is \(\log _2(t_{\text {BKZ}})(\delta ) = \frac{0.009}{\log _2(\delta )^2}-27\) (according to the extrapolation by Albrecht et al. [1] of the Liu-Nguyen datasets [18]).

  4. The coordinates of vectors produced by lattice reduction algorithms are balanced. Namely, if the algorithm produces vectors of norm \(\left\| v\right\| _2\), each coefficient has a marginal Gaussian distribution of standard deviation \(\left\| v\right\| _2/\sqrt{n}\). Provided that the geometry of the lattice is not too skewed in particular directions, this fact can sometimes be proved, especially if the reduction algorithm samples vectors with a Gaussian distribution over the input lattice. This simple fact is at the heart of many attacks based on Coppersmith techniques with lattices.

  5. For mid-range dimensions and polynomially small noise, the SIS-distinguisher attack, i.e. lattice reduction combined with the search-to-decision reduction, is the best attack against LWE (this point is less clear according to the analysis of [1]; at least, this attack model tends to over-estimate the power of the attacker, so it should produce more conservative parameters).

  6. Except for small polynomial speedups in the dimension, we do not know better algorithms to find short vectors in random anti-circulant lattices than the generic ones. This folklore assumption still seems up to date at the time of writing.

If one finds a small integer combination that cancels the masks of homogeneous LWE samples, one may use it to distinguish them from uniformly chosen random samples. If this distinguisher has small advantage \(\varepsilon \), we repeat it about \(1/\varepsilon ^2\) times. Then, thanks to the search-to-decision reduction (which is particularly tight with our \({\mathrm {TLWE}}\) formulation), each successful answer of the distinguisher reveals one bit of the secret key. To handle the continuous torus, and since q is not a parameter of \({\mathrm {TLWE}}\) either, we show how to extend the analysis of [2] to our scheme.

Let \((\varvec{a_1},b_1),\dots ,(\varvec{a_m},b_m)\) be either m LWE samples of parameter \(\alpha \), or m uniformly random samples of \(\mathbb {T}^{n+1}\); we need to find a small combination \(v_1,\dots ,v_m\) of the samples such that \(\sum v_i \varvec{a_i}\) is small. This condition differs from most previous models, which work over a discrete group and require an exact solution. By allowing approximations, we may find solutions for much smaller m than the usual bound \(n\log q\); even \(m<n\) can be valid. Now, consider the \((m\text {+}n)\)-dimensional lattice generated by the rows of the following basis \(B\in \mathcal {M}_{n+m,n+m}(\mathbb {R})\):

$$ B=\begin{pmatrix} I_n &{} 0 \\ A &{} I_m \end{pmatrix}, \quad \text {where the rows of } A\in \mathcal {M}_{m,n}(\mathbb {R}) \text { are the masks } \varvec{a_1},\dots ,\varvec{a_m}. $$

Our target is to find a short vector \(\varvec{w}=[x_1,\dots ,x_n,v_1,\dots ,v_m]\) in the lattice of B, whose first n coordinates \((x_1,\dots ,x_n)=\sum _{i=1}^m v_i \varvec{a_i}\mod 1\) are shorter than the last m coordinates \((v_1,\dots ,v_m)\). To take this skewness into account, we choose a real parameter \(q>1\) (to be optimized later) and apply to the lattice the unitary transformation \(f_q\), which multiplies the first n coordinates by q and the last m coordinates by \(1/q^{n/m}\). Although the resulting matrix looks like a classical LWE instance, the variable q is a real parameter and does not need to be an integer. It then suffices to find a short vector with balanced coordinates in the transformed lattice, defined by this basis:

$$ f_q(B)=\begin{pmatrix} q\,I_n &{} 0 \\ q\,A &{} q^{-n/m} I_m \end{pmatrix}. $$

The direct approach is to apply the fastest algorithm (BKZ-2.0 or slide reduction) directly to \(f_q(B)\), which outputs a vector \(f_q(\varvec{w})\) of norm \(\delta ^{n+m}\) (the transformed lattice has volume 1), whose coordinates have standard deviation \(\delta ^{n+m}/\sqrt{n+m}\), where \(\delta \in ]1,1.1]\) is the quality of the reduction.

Once we have such a vector \(\varvec{w}\), all we need is to analyse the term \(\sum _{i=1}^m v_i b_i = \sum _{i=1}^m v_i (\varvec{a_i}\cdot \varvec{s} + e_i) = \varvec{s}\cdot \sum _{i=1}^m v_i \varvec{a_i} + \sum _{i=1}^m v_i e_i = \varvec{s}\cdot \varvec{x} + \varvec{v}\cdot \varvec{e}\).

It has a Gaussian distribution of square parameter \(\sigma ^2 = \frac{\delta ^{2(m+n)} \pi }{2q^2} \cdot \frac{n S^2}{m+n} + \frac{q^{2n/m} \delta ^{2(m+n)} \alpha ^2 m}{m+n} = \delta ^{2(m+n)}\left( \frac{\pi S^2}{2q^2} \cdot \frac{n}{m+n} + q^{2n/m} \alpha ^2 \frac{m}{m+n} \right) \), where \(S = \frac{\left\| \varvec{s}\right\| }{\sqrt{n}} \approx \frac{1}{\sqrt{2}}\) for a binary secret. By definition of the smoothing parameter, it may be distinguished from the uniform distribution with advantage \(\varepsilon \) as long as \(\sigma ^2\le \eta _{\varepsilon }^{2}(\mathbb {Z})\). To summarize, the security parameter of LWE is (bounded by) the solution of the following system of equations:

$$\begin{aligned} \lambda (n,\alpha ) = \log _2(t_{\text {attack}}) =\min _{0<\varepsilon <1} \log _2\left( \frac{n}{\varepsilon ^2}t_{\text {BKZ}}(n,\alpha ,\varepsilon )\right) \end{aligned}$$
(3)
$$\begin{aligned} \log _2(t_{\text {BKZ}})(n,\alpha ,\varepsilon ) = \frac{0.009}{\log _2(\delta )^2}-27 \end{aligned}$$
(4)
$$\begin{aligned} \ln (\delta )(n,\alpha ,\varepsilon ) = \max _{\overset{m>1}{\scriptscriptstyle q>1}} \frac{1}{2(m\text {+}n)} \!\left( \ln (\eta _\varepsilon ^2(\mathbb {Z}))-\ln \left( \frac{\pi S^2}{2q^2} \frac{n}{m\text {+}n} + q^{\frac{2n}{m}} \alpha ^2 \frac{m}{m\text {+}n}\right) \!\!\right) \end{aligned}$$
(5)
$$\begin{aligned} \eta _\varepsilon (\mathbb {Z}) \approx \sqrt{\frac{1}{\pi }\ln (\frac{1}{\varepsilon })}. \end{aligned}$$
(6)

Here, Eq. (3) means that we need to run the distinguisher about \(\frac{1}{\varepsilon ^2}\) times per unknown key bit (by the Chernoff bound), and we need to optimize the advantage \(\varepsilon \) accordingly. Equation (4) is the heuristic prediction of the running time of lattice reduction. In Eq. (5), q and m need to be chosen so as to maximize the targeted approximation factor of the lattice reduction step.

Differentiating Eq. (5) with respect to q, we find that its maximum is reached at

$$ q_{\text {best}}=\left( \frac{\pi S^2}{2\alpha ^2}\right) ^{\frac{m}{2(m+n)}}. $$

Replacing q by this value and setting \(t=\frac{n}{m+n}\), Eq. (5) becomes:

$$ \ln (\delta )(n,\alpha ,\varepsilon ) = \max _{t>0}\frac{1}{2n} \left( t^2 \ell _2 + t(1-t) \ell _1 \right) \text { where } {\left\{ \begin{array}{ll} \ell _1=\ln \left( \frac{\eta _\varepsilon ^2(\mathbb {Z})}{\alpha ^2} \right) \\ \ell _2=\ln \left( \frac{2 \eta _\varepsilon ^2(\mathbb {Z})}{\pi S^2} \right) . \end{array}\right. } $$

Then, differentiating this new expression with respect to t, the maximum of \(\delta \) is reached at \(t_{\text {best}} = \frac{\ell _1}{2(\ell _1 - \ell _2)}\) (since \(\ell _1 > \ell _2\)), which gives the best choices of m, q and \(\delta \). Finally, we optimize \(\varepsilon \) numerically in Eq. (3).
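This numeric optimization is easy to reproduce. The following sketch is our own illustration (not the code used to produce Fig. 1): it scans \(\varepsilon \), applies the closed-form optimum for t, and uses the constants of Eqs. (3)-(6) with \(S^2\approx \frac{1}{2}\) for binary keys.

```cpp
// Numeric sketch of the estimator defined by Eqs. (3)-(6).
#include <algorithm>
#include <cmath>
#include <cstdio>

double securityLevel(double n, double alpha) {
    const double PI = 3.14159265358979323846;
    const double S2 = 0.5;                        // S^2 ~ 1/2 for binary keys
    double best = 1e9;
    for (double log2eps = -1.0; log2eps >= -64.0; log2eps -= 0.1) {
        const double eps  = std::exp2(log2eps);
        const double eta2 = std::log(1.0 / eps) / PI;             // Eq. (6), squared
        const double l1   = std::log(eta2 / (alpha * alpha));
        const double l2   = std::log(2.0 * eta2 / (PI * S2));
        double t = l1 / (2.0 * (l1 - l2));                        // t_best
        t = std::min(1.0, std::max(1e-9, t));                     // keep t = n/(m+n) in (0,1]
        const double lnDelta   = (t * t * l2 + t * (1.0 - t) * l1) / (2.0 * n);  // Eq. (5)
        const double log2Delta = lnDelta / std::log(2.0);
        const double log2tBKZ  = 0.009 / (log2Delta * log2Delta) - 27.0;         // Eq. (4)
        best = std::min(best, std::log2(n) - 2.0 * log2eps + log2tBKZ);          // Eq. (3)
    }
    return std::min(best, n / 2.0);  // cap by the 2^{n/2} collision attack
}

int main() {
    std::printf("lambda(n = 500, alpha = 3.05e-5) ~ %.0f bits\n",
                securityLevel(500, 3.05e-5));
}
```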

All previous results are summarized in Fig. 1, which displays the security parameter \(\lambda \) as a function of \(n,\log _2(\alpha )\).

Fig. 1. Security parameter \(\lambda \) as a function of n and \(\alpha \) for LWE samples. This curve shows the security levels \(\lambda \) (black level lines) as a function of n = kN (along the x-axis) and \(\log _2(1/\alpha )\) (along the y-axis) for \({\mathrm {TLWE}}\) (this also holds for bin-LWE), considering both the attack of this section and the collision attack in time \(2^{n/2}\).

In particular, the following table gives the precise values for the keyswitching key and the bootstrapping key (for our implementation and for the one of [11]).

Table: lattice-reduction quality \(\delta \), distinguishing advantage \(\varepsilon \) and estimated security level \(\lambda \) for the keyswitching and bootstrapping keys (ours and those of [11]).

The table shows that the required strength of the lattice reduction is compatible with the values announced in [11]. Our model predicts that the lattice reduction phase is harder (\(\delta = 1.0055\) in our analysis versus \(\delta = 1.0064\) in [11]), but the value of \(\varepsilon \) is bigger in our case. Overall, the security of their parameter set is evaluated by our model at 136 bits, which is larger than the \(\ge 100\) bits of security announced in [11]. The main reason is that we take into account the number of times the SIS-distinguisher needs to be run to obtain a non-negligible advantage. Since our scheme has a smaller noise propagation overhead, we were able to raise the input noise levels in order to strengthen the system: with the parameters chosen in our implementation, our model predicts 194 bits of security for the bootstrapping key and 136 bits for the keyswitching key (which remains the bottleneck).

7 Conclusion

In this paper, we presented a generalization of the \(\mathsf {LWE}\) and \({\mathrm {GSW}}\) homomorphic encryption schemes. We improved the running time of the bootstrapping procedure and reduced the size of the keys, while keeping at least the same security as previous fast implementations. This result was obtained by simplifying the multiplication morphism, which is the main operation of the scheme we described. As a proof of concept, we implemented the scheme, and we gave concrete parameters and timings. Furthermore, we extended the applicability of the external product to leveled homomorphic encryption, and we finally gave a detailed security analysis. The main remaining obstacle to the use of our scheme in real-life applications is the ciphertext expansion factor of about 400000, together with fairly limited batching capabilities.