1 Introduction

Lattice-based cryptography is a promising candidate for post-quantum cryptography. A key reason for this – especially from an applied point of view – is that it is known how to construct efficient signature and encryption/key-exchange schemes from lattice assumptions. As both primitives are needed for many applications, this is an advantage: it allows for code reuse and relying on one sort of cryptographic hardness assumption instead of two. For all other well-established candidate areas of post-quantum cryptography we only know how to construct efficient and confidence-inspiring signatures or encryption/key-exchange schemes.

In this work we take a look at lattice-based signature schemes. The most efficient lattice-based signature scheme with a security reduction from standard lattice problems today is BLISS (Bimodal Lattice Signature Scheme) [11], designed by Ducas, Durmus, Lepoint and Lyubashevsky. BLISS is one of the few post-quantum schemes of which there already exists a production-level implementation. BLISS (or rather its subsequent improvement BLISS-b [10] by Ducas) is available in the open-source IPsec Linux library strongSwan [24].

BLISS builds on Lyubashevsky’s signature scheme [16], which initiated the use of rejection sampling to make the signature distribution independent of the secret key used. In the most basic version of these schemes, a discrete Gaussian vector is added to a vector that depends on the secret key. The resulting vector follows a discrete Gaussian distribution that is shifted by a vector depending on the secret key. To avoid leaking the secret key, a rejection step is executed that ensures that the output distribution is independent of the secret key, i.e., the outputs again follow a centered discrete Gaussian distribution.

The use of discrete Gaussian vectors to blind secrets is a very common approach in lattice-based cryptography. However, it is not trivial to sample from a discrete Gaussian efficiently. Over the last few years, many works have been published that deal with efficient sampling routines for discrete Gaussians, see e.g. [8, 11, 12, 19, 21]. Despite the number of publications, none achieved constant-time sampling. At CHES 2016, Groot Bruinderink, Hülsing, Lange, and Yarom [7] demonstrated that these sampling methods enable a cache attack on BLISS which recovers the secret key after fewer than 5000 signatures. While the attack is only implemented for two samplers, the appendix of the full version surveys other efficient samplers and shows for each of them that they have similar issues.

In [7], the authors already discuss straightforward approaches for achieving constant-time implementations of discrete Gaussian samplers, such as deterministically loading entire tables into cache or fixing the number of iterations for some functions by introducing dummy rounds. While such approaches might work for encryption schemes such as [5], signatures require much wider Gaussians to achieve security. Hence, the impact on efficiency of applying these countermeasures is larger, effectively rendering their use prohibitive.

A different way to deal with such attacks is to complicate the attack. Such a heuristic approach was proposed by Saarinen [22]. However, this approach does not fix the vulnerability, as shown by Pessl [18]; it only makes the vulnerability harder to exploit. In consequence, it starts a cat-and-mouse game of attacks and fixes.

Our contribution. To stop such a cat-and-mouse game before it fully starts, this work deals with ways to systematically fix the vulnerability. We propose to take a completely different approach, replacing discrete Gaussians by a different distribution, namely the rounded Gaussian distribution. This distribution shares with discrete Gaussians the benefit that (slightly) shifted distributions are relatively close to centered ones. However, the security analysis of using rounded Gaussians in Lyubashevsky’s scheme and in BLISS is somewhat more involved than for discrete Gaussians, as the probability density function of rounded Gaussians is the integral of a continuous Gaussian. Our main theoretical contribution is a proof that it is safe to replace the discrete Gaussian distribution by a rounded Gaussian distribution in these schemes, while the resulting rejection rates are identical.

As the name suggests, sampling from a rounded Gaussian is done by sampling from a continuous Gaussian and rounding the result to an integer. The Box-Muller method is an efficient way of computing samples of continuous Gaussians starting from uniformly random numbers, and all steps leading to rounded Gaussian samples are efficiently and easily computed in constant time. We present a constant-time implementation of rounded Gaussians suitable for the BLISS-I parameter set and show that it is more than twice as fast as a sampler based on cumulative distribution tables (CDT) as implemented by the authors of BLISS. The CDT sampler uses large precomputed tables to speed up sampling. Note that the CDT sampler is exactly the one that [7] broke at CHES 2016. Using rounded Gaussians brings better speed and better security. Another benefit of rounded Gaussians is that they can use the Box-Muller sampler (see Sect. 4.1) which naturally does not require any precomputed tables, hence can work with a small code base, and furthermore is extremely easy to implement.

We conclude our work with our second theoretical contribution – a proof that using rounded Gaussians, sampled with our Box-Muller implementation, is secure. For this we provide a detailed analysis of the new sampler. We study the difference between (perfect) rounded Gaussians and implementations with finite precision p using statistical distance and Rényi divergence. We also compare the asymptotic results using these different measures. We instantiate the calculation for BLISS parameters and the precision achieved by our Box-Muller implementation to derive bounds on the allowable number of signatures per key pair.

Related work. Rounded Gaussians are not a new distribution; in fact, they were used in the initial proposals for learning with errors, but were replaced by discrete Gaussians to make proofs and protocols easier (the sum of two discrete Gaussians is a discrete Gaussian). See, e.g., Regev [20, p. 90] for an overview of distributions and [9] for an analysis of wrapped rounded Gaussians. Encryption schemes can be secure with narrow non-Gaussian distributions (NTRU/Frodo/New Hope), but signatures are much harder to protect, need much wider distributions (larger parameter \(\sigma \)), seemed to need discrete Gaussians, and so far were analyzed only for discrete Gaussians.

The work in this paper is based on Smeets’s master’s thesis [23]. After a first version of our paper was circulated, we became aware of a recent paper by Micciancio and Walter [17] which proposes a new way to perform sampling of discrete Gaussians in constant time. The target of our paper is very different: showing that rounded Gaussians can efficiently be sampled in constant time and that their use in signature schemes is safe.

Acknowledgements. The authors would like to thank Jacob Appelbaum for discussions about the implementation; Daniel J. Bernstein for help with the implementation, benchmarking, and many useful discussions; Leo Ducas for discussions about replacing discrete Gaussians in lattice-based signatures; and Marko Boon for discussions about the standard deviation of the rounded Gaussian distribution.

2 Preliminaries

Vectors, considered as column vectors, will be written in bold lower case letters; matrices will be written in upper case bold letters. For a vector \(\mathbf {a} = (a_1, a_2, \dots , a_n) \in \mathbb {R}^n\), the Euclidean norm is defined by \(\Vert \mathbf {a} \Vert = \sqrt{ \sum _{i = 1}^n a_i^2 }\). The \(\infty \)-norm is defined by \(\Vert \mathbf {a} \Vert _\infty = \max \left( |a_1|, |a_2|, \dots , |a_n| \right) \). The Hamming weight \(\mathrm{wt}(\mathbf {a})\) is the number of non-zero positions in \(\mathbf {a}\). For two vectors \(\mathbf {a} = (a_1, a_2, \dots , a_n)\) and \(\mathbf {b} = (b_1, b_2, \dots , b_n)\), both in \(\mathbb {R}^n\), denote the inner product by \(\langle \mathbf {a}, \mathbf {b} \rangle = \sum _{i = 1}^n a_i b_i\).

In this paper we will be concerned with (discrete) probability distributions. For a distribution h, we denote by \(x \xleftarrow {\$} h\) that x is sampled according to h. For a set S we denote by \(s \xleftarrow {\$}S\) that \(s\in S\) is sampled uniformly at random from S.

We now cover some background on Gaussian distributions and signature schemes. We follow Lyubashevsky [16] closely and take definitions from there with minor modifications. Many lattice-based schemes use rejection sampling to massage one distribution to fit another. The following m-dimensional version which samples once from the provided distribution and outputs with a certain probability depending on both the target distribution and the sample is copied from Lemma 4.7 of the full ePrint version of [16].

Lemma 2.1

(Rejection Sampling). Let V be an arbitrary set, and \(h: V \rightarrow \mathbb {R}\) and \(f: \mathbb {Z}^m \rightarrow \mathbb {R}\) be probability distributions. If \(g_\mathbf {v}: \mathbb {Z}^m \rightarrow \mathbb {R}\) is a family of probability distributions indexed by \(\mathbf {v} \in V\) with the property

$$ \exists M \in \mathbb {R}: \forall \mathbf {v} \in V : \mathrm{Pr}[M g_\mathbf {v}(\mathbf {z}) \ge f(\mathbf {z}); \mathbf {z} \xleftarrow {\$} f] \ge 1 - \varepsilon , $$

then the distribution of the output of the following algorithm \(\mathcal {A}\):

Algorithm \(\mathcal {A}\): 1: \(\mathbf {v} \xleftarrow {\$} h\); 2: \(\mathbf {z} \xleftarrow {\$} g_\mathbf {v}\); 3: output \((\mathbf {z}, \mathbf {v})\) with probability \(\min \left( f(\mathbf {z})/(M g_\mathbf {v}(\mathbf {z})), 1 \right) \)

is within statistical distance \(\varepsilon /M\) of the distribution of the following algorithm \(\mathcal {F}\):

Algorithm \(\mathcal {F}\): 1: \(\mathbf {v} \xleftarrow {\$} h\); 2: \(\mathbf {z} \xleftarrow {\$} f\); 3: output \((\mathbf {z}, \mathbf {v})\) with probability \(1/M\)

Moreover, the probability that \(\mathcal {A}\) outputs something is at least \((1 - \varepsilon )/M\).
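For concreteness, here is a minimal C++ sketch of algorithm \(\mathcal {A}\); the callables stand in for the lemma’s abstract distributions h, \(g_\mathbf {v}\) and f, and all concrete types and names are illustrative assumptions, not part of [16].

```cpp
#include <functional>
#include <random>

// Minimal sketch of algorithm A from Lemma 2.1. V and Z are fixed to
// simple types purely for illustration.
using V = int;
using Z = long;

bool sample_A(const std::function<V(std::mt19937_64&)>& draw_h,     // v <-$- h
              const std::function<Z(V, std::mt19937_64&)>& draw_g,  // z <-$- g_v
              const std::function<double(V, Z)>& g_density,         // g_v(z)
              const std::function<double(Z)>& f_density,            // f(z)
              double M, std::mt19937_64& rng, V& v, Z& z) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    v = draw_h(rng);
    z = draw_g(v, rng);
    // Output (z, v) with probability min(f(z) / (M * g_v(z)), 1).
    return unif(rng) < f_density(z) / (M * g_density(v, z));
}
```

Algorithm \(\mathcal {F}\) differs only in drawing \(\mathbf {z}\) from f and accepting with probability 1/M.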

2.1 Discrete Gaussian Distribution

The discrete Gaussian distribution is based on the continuous Gaussian distribution. The definition of the continuous Gaussian distribution, also called the Normal distribution, is given by:

Definition 2.1

The continuous Gaussian distribution over \(\mathbb {R}^m\) centered at some \(\mathbf {v} \in \mathbb {R}^m\) with standard deviation \(\sigma \) is defined for \(\mathbf {x} \in \mathbb {R}^m\) as the (joint) density \(\rho _{\mathbf {v},\sigma }^m (\mathbf {x}) = \left( \frac{1}{\sqrt{2 \pi \sigma ^2}}\right) ^m e^{\frac{-\Vert \mathbf {x}-\mathbf {v}\Vert ^2}{2\sigma ^2}}\).

When \(\mathbf {v} = \mathbf {0}\), we simply write \(\rho _\sigma ^m(\mathbf {x})\). The definition of the discrete Gaussian distribution is given by:

Definition 2.2

The discrete Gaussian distribution over \(\mathbb {Z}^m\) centered at some \(\mathbf {v} \in \mathbb {Z}^m\) with parameter \(\sigma \) is defined for \(\mathbf {x} \in \mathbb {Z}^m\) as \(D_{\mathbf {v}, \sigma }^m(\mathbf {x}) = \rho _{\mathbf {v},\sigma }^m (\mathbf {x})/\rho _{\sigma }^m \left( \mathbb {Z}^m \right) \), where \(\rho _{\sigma }^m \left( \mathbb {Z}^m \right) = \sum _{\mathbf {z} \in \mathbb {Z}^m} \rho _\sigma ^m (\mathbf {z})\).

Note that the discrete Gaussian distribution is defined over all integer vectors in \(\mathbb {Z}^m\). However, samples with large entries have negligible probability. Implementations need to provision for the maximal size of coefficients, and table-based sampling schemes would require a lot of storage to cover rarely used values and still not cover all possibilities. Therefore, a tail cut \(\tau \) is used, meaning that only integers in \(\left[ -\tau \sigma , \tau \sigma \right] ^m\) are sampled. Results about the necessary size of the tail cut can be found in [16] and Lemma A.2. In practice, \(\tau \) is often chosen as \(\sqrt{2\lambda \ln 2}\), where \(\lambda \) is the security level, because this cuts off only a negligible fraction of the probability mass.
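As a worked example with the BLISS-I values used later (\(\lambda = 128\), \(\sigma = 215\)):

$$ \tau = \sqrt{2 \cdot 128 \cdot \ln 2} \approx 13.32, \qquad \tau \sigma \approx 13.32 \cdot 215 \approx 2864, $$

so each coordinate can be restricted to the integers in \([-2864, 2864]\) at the cost of cutting off only a roughly \(2^{-128}\) fraction of the probability mass.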

2.2 Lyubashevsky’s Signature Scheme

In 2012 Lyubashevsky [16] designed a signature scheme that uses an \(m\times n\) matrix \(\mathbf {S}\) with small coefficients as secret key and the following two matrices as public key: a random matrix \(\mathbf {A}\in \mathbb {Z}_{q}^{n\times m}\), with \(m=2n\), and the \(n \times n\) matrix \(\mathbf {T} = \mathbf {AS} \bmod q\), where q is an integer. The matrix \(\mathbf {A}\) can be shared among all users, but the matrix \(\mathbf {T}\) is individual. To sign a message \(\mu \), the signer picks a vector \(\mathbf {y}\) according to the m-dimensional discrete Gaussian. Then \(\mathbf {c} = H(\mathbf {A} \mathbf {y} \bmod q, \mu )\), where \(H(\cdot )\) is a hash function, and the potential signature vector \(\mathbf {z} = \mathbf {S}\mathbf {c} + \mathbf {y}\) are computed.

The system then uses rejection sampling to shape the distribution of \(\mathbf {z}\) to a centered discrete Gaussian, i.e., to decide whether to output the candidate signature \((\mathbf {z}, \mathbf {c})\). In terms of Lemma 2.1, h is the distribution of \(\mathbf {Sc}\), \(g_{\mathbf {v}}\) is the m-dimensional discrete Gaussian \(D_{\mathbf {v},\sigma }^m(\mathbf {z})\) centered around \(\mathbf {v} = \mathbf {Sc}\), and f is the m-dimensional centered discrete Gaussian \(D_\sigma ^m(\mathbf {z})\).

Because \(D_\sigma ^m(\mathbf {z})\) is independent of \(\mathbf {S}\), the signatures do not leak information about the private key.

2.3 Bimodal Lattice Signature Scheme: BLISS

Ducas, Durmus, Lepoint and Lyubashevsky introduced the Bimodal Lattice Signature Scheme (BLISS) in [11]. BLISS is an improvement of Lyubashevsky’s signature scheme described above in that signatures are smaller and generated faster. We only cover signature generation here as we are focusing on the use of the discrete Gaussian distribution. For a full description of BLISS, see [11]. BLISS uses a special hash function H mapping to \(\{\mathbf {c} \in \{0,1\}^n| \mathrm{wt}(\mathbf {c}) = \kappa \}\) for \(\kappa \) some small constant. A simplified version of the BLISS signature algorithm is given in Algorithm 2.1.

Algorithm 2.1 (simplified BLISS signature generation). Input: message \(\mu \), secret key \(\mathbf {S}\), public key \(\mathbf {A}\). 1: \(\mathbf {y} \xleftarrow {\$} D_\sigma ^m\); 2: \(\mathbf {c} \leftarrow H(\mathbf {Ay} \bmod 2q, \mu )\); 3: \(b \xleftarrow {\$} \{0,1\}\); 4: \(\mathbf {z} \leftarrow \mathbf {y} + (-1)^b \mathbf {Sc}\); 5: output \((\mathbf {z}, \mathbf {c})\) with probability \(1\big /\left( M \exp \left( -\Vert \mathbf {Sc}\Vert ^2/(2\sigma ^2)\right) \cosh \left( \langle \mathbf {z}, \mathbf {Sc}\rangle /\sigma ^2\right) \right) \), otherwise restart

Given a message \(\mu \), the signing algorithm first samples a vector \(\mathbf {y}\) from the m-dimensional discrete Gaussian distribution \(D_{\sigma }^m\). Then it computes the hash \(\mathbf {c} \leftarrow H(\mathbf {Ay} \bmod 2q, \mu )\). It samples a random bit \(b \in \{ 0,1\}\) and computes the potential signature \(\mathbf {z} \leftarrow \mathbf {y} + (-1)^b \mathbf {Sc}\). Now that the signing algorithm has \(\mathbf {z}\), it performs rejection sampling according to Lemma 2.1, i.e., it outputs the signature \((\mathbf {z}, \mathbf {c})\) with probability \(1\big /\left( M \exp \left( -\frac{\Vert \mathbf {Sc}\Vert ^2}{2\sigma ^2}\right) \cosh \left( \frac{\langle \mathbf {z}, \mathbf {Sc}\rangle }{\sigma ^2}\right) \right) \), where M is some fixed positive real constant that is set large enough to ensure that this probability is at most 1 for all choices of \(\mathbf {c}\). If the signature algorithm is unsuccessful, it restarts with a fresh \(\mathbf {y}\) and continues until a signature is output.

Again, rejection sampling is used to force the distribution of the output \(\mathbf {z}\) to be that of a centered Gaussian distribution (i.e., to be independent of \(\mathbf {Sc}\)).

The bulk of the time in one round of the BLISS signing algorithm is spent in the first step, generating m samples from the one-dimensional Gaussian. The number of repetitions depends on M and the size of \(\mathbf {Sc}\).

Bound on \(\Vert {\mathbf {Sc}}\Vert \) . The parameter \(\sigma \) of the discrete Gaussian distribution, the size of \(\mathbf {Sc}\), and the rejection rate M control how much the target distribution \(D_{\sigma }^m\) and the input distribution overlap, i.e., how small \(\varepsilon \) can be made. For BLISS the input distribution is a bimodal Gaussian distribution \(0.5(D_{-\mathbf {Sc},\sigma }^m +D_{\mathbf {Sc},\sigma }^m)\). BLISS’ authors show that rejection sampling can be used without error, i.e., \(\varepsilon =0\) is possible in Lemma 2.1 with reasonable choices of \(\sigma \) and M. In later sections we require an upper bound on \(\Vert \mathbf {Sc} \Vert \) for the proofs. In [11] a new measure \(N_\kappa (\mathbf {X})\) of \(\mathbf {S}\), adapted to the form of \(\mathbf {c}\), is presented.

Definition 2.3

For any integer \(\kappa \), \(N_\kappa : \mathbb {R}^{m \times n} \rightarrow \mathbb {R}\) is defined as:

$$ N_\kappa (\mathbf {X}) = \max _{I \subset \{ 1, \dots , n \}, \# I = \kappa } \sum _{i \in I} \left( \max _{J \subset \{ 1, \dots , n \}, \# J = \kappa } \sum _{j \in J} W_{i,j} \right) , $$

where \(\mathbf {W} = \mathbf {X}^T \cdot \mathbf {X} \in \mathbb {R}^{n \times n}\).

With this definition, the authors of [11] show that for any \(\mathbf {c} \in \{0,1\}^n\) with \(\mathrm{wt}(\mathbf {c}) \le \kappa \), we have \(\Vert \mathbf {Sc} \Vert ^2 \le N_\kappa (\mathbf {S})\) [11, Proposition 3.2]. In addition to the use of bimodal Gaussians, this upper bound lowers the parameter \(\sigma \) by a factor \({\approx }\sqrt{\kappa }/2\) compared to [16].
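The definition translates into a direct computation: for each i, the inner maximum is attained by the \(\kappa \) largest entries of row i of \(\mathbf {W}\), and the outer maximum by the \(\kappa \) largest of the resulting row values. A C++ sketch (the row-major std::vector layout is an illustrative assumption):

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Direct evaluation of N_kappa(X) from Definition 2.3 for an m x n matrix X,
// where W = X^T * X is computed on the fly, row by row.
double N_kappa(const std::vector<std::vector<double>>& X, int kappa) {
    const int m = (int)X.size(), n = (int)X[0].size();
    std::vector<double> row_val(n);
    for (int i = 0; i < n; i++) {
        std::vector<double> Wi(n);  // row i of W = X^T * X
        for (int j = 0; j < n; j++) {
            double w = 0.0;
            for (int k = 0; k < m; k++) w += X[k][i] * X[k][j];
            Wi[j] = w;
        }
        // inner max over J, #J = kappa: sum of the kappa largest W_{i,j}
        std::partial_sort(Wi.begin(), Wi.begin() + kappa, Wi.end(),
                          std::greater<double>());
        row_val[i] = 0.0;
        for (int j = 0; j < kappa; j++) row_val[i] += Wi[j];
    }
    // outer max over I, #I = kappa: sum of the kappa largest row values
    std::partial_sort(row_val.begin(), row_val.begin() + kappa, row_val.end(),
                      std::greater<double>());
    double result = 0.0;
    for (int i = 0; i < kappa; i++) result += row_val[i];
    return result;
}
```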

3 Rounded Gaussian Rejection Sampling

In this section we discuss the applicability of the rounded Gaussian distribution in rejection-sampling-based signature schemes. After giving a formal definition of the rounded Gaussian distribution, we provide proofs showing that it can be used to replace the discrete Gaussian distribution in Lyubashevsky’s signature scheme and in BLISS. We show the analogies between the rounded Gaussian distribution and the discrete Gaussian distribution and we point out where the security reductions differ when rounded Gaussians are used in place of discrete Gaussians. In practice, the most important question is how the probability in Step 5 of Algorithm 2.1 (and the equivalent one in Lyubashevsky’s scheme) needs to change if \(\mathbf {y}\) is sampled according to the rounded Gaussian distribution instead of the discrete Gaussian distribution. Note, again, that this step determines the rejection rate, i.e., how often the algorithm needs to restart with freshly sampled randomness.

To simplify comparisons and show that rounded Gaussians can be used in place of discrete Gaussians, we follow the presentation and structure from [16] and [11] very closely. The main difference is that the definition of rounded Gaussians requires integrals over an interval of length 1, while the definition of discrete Gaussians requires a division by the probability mass at all integers. We essentially have to prove the same lemmas that were shown for discrete Gaussians in [16] and [11] for rounded Gaussians. In the end, closely analogous results hold, but the analysis turns out to be far more complicated than in the discrete Gaussian setting because we have to deal with bounding integrals.

3.1 Rounded Gaussian Distribution

We now formally define the rounded Gaussian distribution. Intuitively, the rounded Gaussian distribution is obtained by rounding each coordinate of a sample from a continuous Gaussian distribution to the nearest integer. To compute the probability at an integer \(x_i\), we integrate over the interval \(\left[ x_i - \frac{1}{2}, x_i + \frac{1}{2} \right) \).

Definition 3.1

The rounded Gaussian distribution over \(\mathbb {Z}^m\) centered at some \(\mathbf {v} \in \mathbb {Z}^m\) with parameter \(\sigma \) is defined for \(\mathbf {x}\in \mathbb {Z}^m\) as

$$\begin{aligned} R_{\mathbf {v},\sigma }^m (\mathbf {x}) = \int _{A_\mathbf {x}} \rho _{\mathbf {v}, \sigma }^m (\mathbf {s}) d\mathbf {s} = \int _{A_\mathbf {x}} \left( \frac{1}{\sqrt{2 \pi \sigma ^2}} \right) ^m \exp \left( \frac{- \Vert \mathbf {s} - \mathbf {v} \Vert ^2}{2 \sigma ^2} \right) d\mathbf {s}, \end{aligned}$$

where \(A_\mathbf {x}\) denotes the area defined by \(\left[ x_1 - \frac{1}{2}, x_1 + \frac{1}{2} \right) \times \cdots \times \left[ x_m - \frac{1}{2}, x_m + \frac{1}{2} \right) \).

We point out that this gives us \(\text {vol}(A_\mathbf {x}) = 1\), since the volume of this area is equal to \(|(x_1 + \frac{1}{2}) - (x_1 - \frac{1}{2})| \cdots |(x_m + \frac{1}{2}) - (x_m - \frac{1}{2})|\). Note that the parameter \(\sigma \) in the definition above is the standard deviation of the underlying continuous Gaussian and not the standard deviation \(\sigma '\) of the rounded Gaussian distribution, which is given by \(\sigma ' = \sqrt{\sigma ^2 + \frac{1}{12}}+ \epsilon (\alpha )\), where \(\epsilon (\alpha )\) is some function of small value with mean 0; the \(\frac{1}{12}\) term is the variance contributed by the rounding offset, which is close to uniform on \([-\frac{1}{2}, \frac{1}{2})\).

3.2 Using Rounded Gaussians in Lyubashevsky’s Scheme

The proofs by Lyubashevsky [16] for the discrete Gaussian distribution rely on several lemmas for which we prove analogous statements in Appendix A. The following lemma states that the centered rounded Gaussian \(R_\sigma ^m (\mathbf {z})\) and the shifted rounded Gaussian \(R_{\mathbf {v},\sigma }^m (\mathbf {z})\) are almost always close, and Theorem 3.1 applies it to the rejection-sampling Lemma 2.1.

Lemma 3.1

For any \(\mathbf {v} \in \mathbb {Z}^m\), if \(\sigma = \omega (\Vert \mathbf {v} \Vert \sqrt{\log m})\), then

$$ \mathrm{Pr}\left[ R_\sigma ^m (\mathbf {z}) / R_{\mathbf {v}, \sigma }^m (\mathbf {z}) = O(1); \mathbf {z} \xleftarrow {\$} R_\sigma ^m \right] = 1 - 2^{-\omega (\log m)}. $$

This is proven in Appendix A.

Theorem 3.1

Let V be a subset of \(\mathbb {Z}^m\) in which all elements have norms less than T, let \(\sigma \) be some element of \(\mathbb {R}\) such that \(\sigma = \omega (T \sqrt{\log m})\), and let \(h : V \rightarrow \mathbb {R}\) be a probability distribution. Then there exists a constant \(M = O(1)\) such that the distribution of the following algorithm \(\mathcal {A}\):

Algorithm \(\mathcal {A}\): 1: \(\mathbf {v} \xleftarrow {\$} h\); 2: \(\mathbf {z} \xleftarrow {\$} R_{\mathbf {v},\sigma }^m\); 3: output \((\mathbf {z}, \mathbf {v})\) with probability \(\min \left( R_\sigma ^m(\mathbf {z})/(M R_{\mathbf {v},\sigma }^m(\mathbf {z})), 1 \right) \)

is within statistical distance \(2^{-\omega (\log m)}/M\) of the distribution of the following algorithm \(\mathcal {F}\):

Algorithm \(\mathcal {F}\): 1: \(\mathbf {v} \xleftarrow {\$} h\); 2: \(\mathbf {z} \xleftarrow {\$} R_\sigma ^m\); 3: output \((\mathbf {z}, \mathbf {v})\) with probability \(1/M\)

Moreover, the probability that \(\mathcal {A}\) outputs something is at least \((1 - 2^{-\omega (\log m)})/M\).

Proof

The proof of this theorem follows immediately from Lemma 3.1 and the general “rejection sampling” Lemma 2.1.    \(\square \)

This theorem looks the same for rounded Gaussians and for discrete Gaussians; see Appendix A.1 for a detailed comparison of the results.

3.3 Using Rounded Gaussians in BLISS

In Sect. 3.2 we have shown that we can use the rounded Gaussian distribution in the rejection sampling scheme by Lyubashevsky [16]. In this section we show how to apply the rounded Gaussian distribution to BLISS and that the same constant M as in BLISS can be used for rejection sampling.

BLISS randomly flips a bit to decide on adding or subtracting \(\mathbf {Sc}\), i.e., for fixed \(\mathbf {Sc}\), \(\mathbf {z}^*\) is distributed according to the bimodal rounded Gaussian distribution \(g_{ \mathbf {Sc}} (\mathbf {z}^*) = \frac{1}{2}R_{\mathbf {Sc}, \sigma }^m (\mathbf {z}^*) + \frac{1}{2}R_{-\mathbf {Sc}, \sigma }^m (\mathbf {z}^*)\). To avoid leaking any information on the secret key \(\mathbf {S}\) the scheme requires rejection sampling to change the bimodal Gaussian to a centered Gaussian \(f(\mathbf {z}^*) = R_{\sigma }^m (\mathbf {z}^*)\). The probability to accept is given by \(f(\mathbf {z}^*)/(M g_{\mathbf {Sc}}(\mathbf {z}^*))\), where again M is chosen minimal such that this probability is \(\le 1\) for all \(\mathbf {z}^*\).

The results of this section are completely analogous to those in [11].

For any \(\mathbf {z}^*\in \mathbb {Z}^m\), we have

$$\begin{aligned} \mathrm{Pr}[\mathbf {z} = \mathbf {z}^*] &= \frac{1}{2}R_{\mathbf {Sc}, \sigma }^m (\mathbf {z}^*) + \frac{1}{2}R_{- \mathbf {Sc}, \sigma }^m(\mathbf {z}^*)\\ &= \frac{1}{2}\left( \frac{1}{\sqrt{2 \pi \sigma ^2}}\right) ^m \int _{A_{\mathbf {z}^*}} \exp \left( - \frac{\Vert \mathbf {x} - \mathbf {Sc}\Vert ^2}{2 \sigma ^2} \right) + \exp \left( - \frac{\Vert \mathbf {x} + \mathbf {Sc}\Vert ^2}{2 \sigma ^2} \right) d\mathbf {x}\\ &= \exp \left( -\frac{\Vert \mathbf {S} \mathbf {c}\Vert ^2}{2\sigma ^2} \right) \left( \frac{1}{\sqrt{2 \pi \sigma ^2}}\right) ^m \int _{A_{\mathbf {z}^*}} \exp \left( -\frac{\Vert \mathbf {x}\Vert ^2}{2\sigma ^2} \right) \cosh \left( \frac{\langle \mathbf {x}, \mathbf {Sc} \rangle }{\sigma ^2} \right) d\mathbf {x}. \end{aligned}$$
(1)

The desired output is the centered rounded Gaussian distribution \(f(\mathbf {z}^*)\), since we need the centered property to avoid leaking \(\mathbf {S}\). Thus by Theorem 3.1, we should accept the sample \(\mathbf {z}^*\) with probability:

$$\begin{aligned}&p_{\mathbf {z}^*} = f(\mathbf {z}^*)/(M g_{\mathbf {Sc}}(\mathbf {z}^*))\\&= \frac{\left( \frac{1}{\sqrt{2 \pi \sigma ^2}}\right) ^m \int _{A_{\mathbf {z}^*}} \exp \left( -\Vert \mathbf {x}\Vert ^2/(2\sigma ^2) \right) d\mathbf {x}}{M \exp \left( -\Vert \mathbf {S} \mathbf {c}\Vert ^2/(2\sigma ^2) \right) \left( \frac{1}{\sqrt{2 \pi \sigma ^2}}\right) ^m \int _{A_{\mathbf {z}^*}} \exp \left( -\Vert \mathbf {x}\Vert ^2/(2\sigma ^2) \right) \cosh \left( \langle \mathbf {x}, \mathbf {Sc} \rangle /\sigma ^2 \right) d\mathbf {x}}. \end{aligned}$$

To compute a bound on M, we use Eq. (1) and the fact that \(\cosh (x) \ge 1\) for all x. This leads to the following upper bound:

$$\begin{aligned} p_{\mathbf {z}^*}= & {} \frac{\int _{A_{\mathbf {z}^*}} \exp \left( -\Vert \mathbf {x}\Vert ^2/(2\sigma ^2) \right) d\mathbf {x}}{M \exp \left( -\Vert \mathbf {S} \mathbf {c}\Vert ^2/(2\sigma ^2) \right) \int _{A_{\mathbf {z}^*}} \exp \left( -\Vert \mathbf {x}\Vert ^2/(2\sigma ^2) \right) \cosh \left( \langle \mathbf {x}, \mathbf {Sc} \rangle /\sigma ^2 \right) d\mathbf {x}}\\\le & {} \frac{\int _{A_{\mathbf {z}^*}} \exp \left( -\Vert \mathbf {x}\Vert ^2/(2\sigma ^2) \right) d\mathbf {x}}{M \exp \left( -\Vert \mathbf {S} \mathbf {c}\Vert ^2/(2\sigma ^2) \right) \int _{A_{\mathbf {z}^*}} \exp \left( -{\Vert \mathbf {x}\Vert ^2}/(2\sigma ^2) \right) d\mathbf {x}}\\= & {} 1/(M \exp \left( -{\Vert \mathbf {S} \mathbf {c}\Vert ^2}/(2\sigma ^2) \right) ). \end{aligned}$$

Now M needs to be chosen large enough such that \(p_{\mathbf {z}^*} \le 1\). Note that the last inequality can only be used to estimate M, and not to define the probability. It suffices that \(M = \exp \left( 1/(2 \alpha ^2)\right) \), where \(\alpha > 0\) is such that \(\sigma \ge \alpha \Vert \mathbf {S} \mathbf {c} \Vert \). We can use the upper bound \(\Vert \mathbf {S} \mathbf {c} \Vert ^2 \le N_\kappa (\mathbf {S})\) from Definition 2.3 to set \(M=\exp (N_\kappa (\mathbf {S})/(2\sigma ^2))\); here \(\kappa \) denotes the sparsity of \(\mathbf {c}\) in Algorithm 2.1. This is the same constant as in BLISS.

3.4 BLISS Security Reduction

The security proof as given in [11] works for the rounded Gaussian distribution with very little tweaking, thanks to the changes made in the proofs in Sect. 3.2 and Appendix A. All statements carry over when replacing the discrete Gaussian distribution with the rounded Gaussian distribution. We do not need to adjust the proofs for [11, Lemmas 3.3 and 3.5]. The proof for [11, Lemma 3.4] uses \(\sigma \ge 3/\sqrt{2\pi }\), which comes from [16, Lemma 4.4]. Our corresponding result is Lemma A.2, which requires \(\sigma \ge \sqrt{2/\pi }\). In addition, we need to adjust the definitions of \(f(\mathbf {z})\) and \(g_{\mathbf {Sc}}(\mathbf {z})\) as above so that they match the rounded Gaussian distribution.

4 Practical Instantiation

In this section we discuss how we can implement a sampler for the rounded Gaussian distribution. A very efficient and easy way to generate samples from the continuous Gaussian distribution is based on the Box-Muller transform. We state the algorithm and discuss an early rejection technique to prevent the computation of values which would later be rejected due to the tail cut. Finally, we analyze the output precision required for an implementation of the rounded Gaussian distribution.

4.1 Box-Muller Transform

We begin by reviewing the Box-Muller transform [6], which is used to create centered Gaussian distributed numbers with standard deviation \(\sigma = 1\) from uniformly distributed random numbers. The algorithm is given as Algorithm 4.1 below.

Algorithm 4.1 (Box-Muller sampling). Input: \(u_1, u_2 \xleftarrow {\$} (0, 1]\) uniformly at random. Output: \(x_1, x_2\), independent samples from the centered continuous Gaussian with \(\sigma = 1\). 1: \(x_1 \leftarrow \sqrt{-2 \ln u_1} \cdot \cos (2 \pi u_2)\); 2: \(x_2 \leftarrow \sqrt{-2 \ln u_1} \cdot \sin (2 \pi u_2)\); 3: output \(x_1, x_2\)

4.2 Sampling Rounded Gaussians

We can now use the Box-Muller transform to create an algorithm for sampling according to the rounded Gaussian distribution. For applying rounded Gaussians to the signature scheme of BLISS, we need centered rounded Gaussians with parameter \(\sigma \). This is done by scaling the outputs \(x_i\), for \(i = 1,2\), of the Box-Muller sampling scheme as \(z'_i = x_i \cdot \sigma \) and then rounding to the nearest integer, \(z_i=\lfloor z'_i\rceil \).
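A scalar C++ sketch of Sects. 4.1 and 4.2 combined (the actual implementation performs the same operations on 8-wide VCL vectors and would replace llround, which need not be constant time, by the rounding trick of Remark 1 in Sect. 5.1):

```cpp
#include <cmath>
#include <cstdint>

// One Box-Muller step (Algorithm 4.1) followed by scaling and rounding.
// u1, u2 must be uniform in (0, 1]; sigma is the rounded Gaussian parameter.
void rounded_gaussian_pair(double u1, double u2, double sigma,
                           int64_t& z1, int64_t& z2) {
    const double two_pi = 6.283185307179586476925287;
    double r = std::sqrt(-2.0 * std::log(u1));  // log(u1) <= 0, so r >= 0
    double x1 = r * std::cos(two_pi * u2);      // standard normal sample
    double x2 = r * std::sin(two_pi * u2);      // independent standard normal
    z1 = std::llround(sigma * x1);              // z'_i = sigma * x_i, rounded
    z2 = std::llround(sigma * x2);
}
```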

4.3 Rejection Sampling of Signatures

At the end of Algorithm 2.1 we need to output \((\mathbf {z}, \mathbf {c})\) with probability

$$ \frac{2\int _{A_{\mathbf {z}^*}} \exp \left( -\Vert \mathbf {x}\Vert ^2/(2\sigma ^2) \right) d \mathbf {x}}{M\cdot \exp \left( -\Vert \mathbf {S} \mathbf {c}\Vert ^2/(2\sigma ^2) \right) \left( \int _{A_{\mathbf {z}^*}} \exp \left( - \frac{\Vert \mathbf {x} - \mathbf {Sc}\Vert ^2}{2 \sigma ^2} \right) d\mathbf {x} + \int _{A_{\mathbf {z}^*}} \exp \left( - \frac{\Vert \mathbf {x} + \mathbf {Sc}\Vert ^2}{2 \sigma ^2} \right) d\mathbf {x} \right) } $$

(see Sect. 3.3).

Each of the three integrals factors, i.e., can be computed as the product of one-dimensional integrals. Each one-dimensional integral is

$$ \int _{z_i-1/2}^{z_i+1/2}\exp \left( \frac{-x_i^2}{2\sigma ^2}\right) dx_i =\sigma \sqrt{\frac{\pi }{2}}\left( \mathrm{erf}\left( \frac{z_i+1/2}{\sqrt{2\sigma ^2}}\right) -\mathrm{erf}\left( \frac{z_i-1/2}{\sqrt{2\sigma ^2}}\right) \right) , $$

i.e., a constant times a difference of two nearby values of the standard error function (\(\mathrm{erf}\)).
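For reference, this one-dimensional factor can be computed directly with the standard library’s erf; this is not the constant-time variant, which is described in Sect. 5.3:

```cpp
#include <cmath>

// Mass of the centered Gaussian with parameter sigma on [z - 1/2, z + 1/2):
// sigma * sqrt(pi/2) * (erf((z+1/2)/(sigma*sqrt(2))) - erf((z-1/2)/(sigma*sqrt(2)))).
double gauss_mass(double z, double sigma) {
    const double pi = 3.14159265358979323846;
    const double s = sigma * std::sqrt(2.0);
    return sigma * std::sqrt(pi / 2.0)
         * (std::erf((z + 0.5) / s) - std::erf((z - 0.5) / s));
}
```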

5 Code Analysis and Benchmarks

This section provides details about our implementation. First we give a general overview of our implementation. Then we discuss the dependency between floating-point precision and the allowable number of signatures. We end with timings and a comparison to the BLISS CDT sampler.

5.1 Implementation Details

We have used the C++ vector class library VCL by Fog [13] for the implementation of the Box-Muller sampling and the rounded Gaussian sampling. This library offers optimized vector operations for integers, floating-point numbers, and booleans. We use Vec8d, vectors of 8 double-precision floating-point elements. This means that we are only limited by the maximum precision of the double type, i.e., 53 bits.

According to [13], the trigonometric and logarithmic functions in VCL have constant runtime, i.e., there is no timing difference dependent on the input. This makes the library ideal for constant-time implementations. The square-root function \(\mathtt{sqrt(\cdot )}\) takes constant time, unless all 8 inputs are in \(\{0,1\}\), which can lead to a timing difference for the square root. However, this is unlikely to happen: the sqrt function is applied to \(-2 \ln u_1\), which is non-negative since each \(u_{1i}\) is sampled from (0, 1] with 53-bit precision; an input of 0 or 1 to sqrt requires \(u_{1i}\) to hit one specific floating-point value, so the probability that 8 consecutive values \(u_{1i}\) all do so is at most \(2^{-8\cdot 52}\), which is negligible. Therefore we have chosen not to circumvent this problem in the implementation, even though one could also sacrifice a vector entry and force it to have a nontrivial square root computation.

Computing with floating-point numbers causes a drop in precision. Fog states that operations in VCL lose at most one bit of precision, with the exception of several explicitly mentioned functions, such as the trigonometric functions, that can lose up to two bits; a more careful analysis of the code shows that the other operations keep (close to) the exact precision.

Sampling rounded Gaussians on top of VCL takes only a few lines of code and the data paths are short (see the code listing in Appendix F of the full version [14]). The input of the code has 53 bits of precision and we lose at most 5 bits of precision, i.e., the output of the code has at least \(p = 48\) bits of precision.

Remark 1

We were asked how to round floating-point numbers in constant time. While VCL almost trivially rounds an entire vector in constant time, a bit more care is necessary if one wants to implement this on single values. To round A with \(|A|<2^{51}\), compute

$$ (A+(2^{52}+2^{51}))-(2^{52}+2^{51}) $$

in two arithmetic instructions, or use dedicated rounding assembly instructions.
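A minimal C++ rendering of the remark (round-to-nearest floating-point mode assumed; the volatile prevents the compiler from folding the two operations away):

```cpp
// Constant-time round-to-nearest for |A| < 2^51: adding 2^52 + 2^51 pushes
// A into a binade where the unit in the last place is 1, so the fractional
// bits are rounded away; subtracting the constant recovers round(A).
double ct_round(double A) {
    const double magic = 6755399441055744.0;  // 2^52 + 2^51
    volatile double t = A + magic;
    return t - magic;
}
```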

5.2 Considerations Regarding the Precision

Samplers for discrete Gaussians typically require tables precomputed at a certain precision. This raises the question of how much a low-precision table can skew the distribution and whether this can lead to attacks. Similarly, floating-point computations, such as in our sampler, can slowly degrade precision.

An error in the computation of y results in a value \(y'\) which might be slightly larger or smaller than y. The magnitude of the error depends on the size of the value, e.g., values close to 0 have higher precision than larger values; in general the error of y is bounded by \(|y|2^{-p}\).

When computing rounded Gaussians, most errors are insignificant because most erroneous values still get rounded to the correct integer. However, errors occurring close to the boundaries of the intervals \([z-\frac{1}{2},z+\frac{1}{2}]\) can lead to wrong outputs. The interval of values that can possibly round to z is given by \([z - \frac{1}{2} - e_{l}, z + \frac{1}{2} + e_{r})\), where the left boundary error satisfies \(|e_{l}| \le 2^{-p} \left| z - \frac{1}{2} \right| \) and the right boundary error satisfies \(|e_{r}| \le 2^{-p}\left| z + \frac{1}{2}\right| \).
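Concretely, with \(p = 48\) and the BLISS-I tail bound \(|z| \le \tau \sigma \approx 2864\) (see Sect. 5.2), the boundary errors satisfy

$$ |e_l|, |e_r| \le 2^{-48} \cdot \left( 2864 + \tfrac{1}{2} \right) \approx 1.0 \cdot 10^{-11}, $$

so only samples that land within about \(10^{-11}\) of an interval boundary can be rounded to the wrong integer.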

We define success for the attacker to mean that he breaks the signature scheme or that he manages to distinguish between the implementation with precision p and a perfect implementation.

Most papers use the statistical distance (Definition B.1) to study the relative difference between two distributions. In [1] the authors showed that studying the Rényi divergence between the distributions can lead to better and tighter estimates.

In this section we work with the known precision \(p=48\) of our implementation and, using the parameters for BLISS-I [11], we determine how many signatures an adversary \(\mathcal {A}\) can observe before the Rényi divergence between the ideal implementation and the practical implementation becomes larger than some small constant c; this means that his chance of breaking the system is at most c times as high as against the ideal implementation.

We also provide an analysis of the asymptotic behavior of the precision p compared to the standard deviation \(\sigma \), the length m and the number of signatures \(q_s\) generated. The computations can be found in Appendix B. These results are naturally less tight because we prioritize readable formulas over best approximations. Accordingly, better results are obtained using numerical computations once one settles on concrete parameters. The asymptotic analysis is helpful in determining which distance or divergence to use.

To analyze the allowable number of signatures \(q_s\) before an attack could possibly distinguish the distributions, we look at the Rényi divergence of order \(\infty \) as given in [1]:

Definition 5.1

For any two discrete probability distributions P and Q, such that \(\mathrm{Supp}\) \((P) \subseteq \) \(\mathrm{Supp}\)(Q), the Rényi divergence of order \(\infty \) is defined by

$$ \mathrm{RD}_\infty (P \mid \mid Q) = \max \limits _{x \in {\mathrm{Supp}}(P)} \frac{P(x)}{Q(x)}. $$
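Over a finite support the definition translates directly into code; the sketch below uses double precision for illustration, although ratios as close to 1 as those arising below (about \(1 + 2 \cdot 10^{-11}\)) are why we compute the actual values with Pari-GP at 200 digits of precision.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// RD_infinity(P || Q) for finite distributions given as aligned probability
// vectors with supp(P) contained in supp(Q): the maximum ratio P(x)/Q(x).
double rd_infinity(const std::vector<double>& P, const std::vector<double>& Q) {
    double rd = 0.0;
    for (std::size_t i = 0; i < P.size(); i++)
        if (P[i] > 0.0) rd = std::max(rd, P[i] / Q[i]);
    return rd;
}
```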

In BLISS using rounded Gaussians we publish m independently sampled integers distributed according to the 1-dimensional rounded Gaussian distribution \(R^1_{\sigma }\), which together form an m-dimensional vector distributed according to \(R_\sigma ^m\). In addition, we assume \(q_s\) signing queries. This means a potential attacker can learn a vector of length \(m q_s\) with entries from the (imprecise) real-world sampler \({R'}^1_{\sigma }\). We want to determine the probability that an attacker can distinguish between a vector sampled from \(R_\sigma ^{mq_s}\) and one sampled from \({R'}^{mq_s}_\sigma \).

By the probability preservation property (Lemma B.2) of the Rényi divergence, any adversary \(\mathcal {A}\) having success probability \(\epsilon \) on the scheme implemented with imprecise rounded Gaussian sampling has a success probability \(\delta \ge \epsilon /\mathrm{RD}_\infty ({R'}_\sigma ^{mq_s} \mid \mid R_\sigma ^{mq_s})\) on the scheme implemented with the perfect rounded Gaussian. To lose only a small, constant factor in tightness we thus require \(\mathrm{RD}_\infty ({R'}_\sigma ^{mq_s} \mid \mid R_\sigma ^{mq_s}) \le \exp (1)\), which gives \(\delta \ge \epsilon /\exp (1)\).

We need \(m q_s\) samples to create \(q_s\) signatures. By the multiplicative property of the Rényi divergence (Lemma B.1), we have \(\mathrm{RD}_\infty ({R'}_\sigma ^{mq_s} \mid \mid R_\sigma ^{mq_s}) \le \mathrm{RD}_\infty ({R'}_\sigma ^1 \mid \mid R_\sigma ^1)^{m q_s}\), so we can relate the divergence of the one-dimensional distributions to the \(mq_s\)-dimensional one. The condition thus becomes \(\mathrm{RD}_\infty ({R'}_\sigma ^1 \mid \mid R_\sigma ^1)^{m q_s} \le \exp (1)\).

The BLISS-I parameters are \(\sigma = 215\), \(m = 2n = 1024\), and \(\epsilon = 2^{-128}\), giving \(\tau = \sqrt{2\cdot 128\ln (2)} = 13.32\), and we work with floating-point precision \(p = 48\). We compute \(\mathrm{RD}_\infty \) numerically for the 1-dimensional case with Pari-GP at a precision of 200 digits, giving \(\mathrm{RD}_\infty ({R'}_\sigma ^1 \mid \mid R_\sigma ^1)\approx 1.0000000000203563\). Recall we want \(\mathrm{RD}_\infty ({R'}_\sigma ^1 \mid \mid R_\sigma ^1)^{mq_s} \le \exp (1)\). For \(m=1024\) we get that \(q_s = 2^{25}\) gives \(2.01262<\exp (1)\). This means that we can create \(2^{25}\) signatures, i.e., 1 signature per minute for over 60 years, securely with one key pair. Note also that the choice of \(\exp (1)\) is somewhat arbitrary and other constants would be suitable as well. Moreover, provable security continues to degrade slowly after these \(2^{25}\) signatures. As far as we know, no attack is known that would use the distinguishability of the distributions.
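To make the arithmetic behind this explicit: with \(m q_s = 2^{10} \cdot 2^{25} = 2^{35}\) samples,

$$ \mathrm{RD}_\infty ({R'}_\sigma ^1 \mid \mid R_\sigma ^1)^{m q_s} \approx \exp \left( 2^{35} \cdot \ln (1.0000000000203563) \right) \approx \exp (0.6994) \approx 2.0126 < \exp (1). $$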

Several papers, starting with [1], use the Rényi divergence \(\mathrm{RD}_a\) of order a to get much better results regarding the precision. We caution the reader that the relation \(\delta >\epsilon ^{a/(a-1)}/\mathrm{RD}_a\), for \(a=2\), \(\delta =2^{-128}\), and constant \(\mathrm{RD}_a=2\), means \(\epsilon =2^{-64}\), which is loose to the point of being meaningless. For the same looseness we could use the constant \(2^{64}\) in place of \(\exp (1)\) in \(\mathrm{RD}_\infty \) and sign \(2^{88}\) times.

5.3 Implementation of Rejection Sampling of Signatures

There are many standard numerical techniques and libraries to efficiently compute the complementary error function \(1-\mathrm{erf}\) to high precision. We use the following constant-time mixture of standard techniques: for fixed s, the integral of \(e^{-x^2}\) for x ranging from \(t-s/2\) to \(t+s/2\) is \(e^{-t^2}\) (which we compute in constant time using VCL) times a quickly converging power series in \(t^2\). For the constants \(s=1/\sqrt{2\sigma ^2}\) relevant to BLISS-I through BLISS-IV, and for the entire range of t allowed by our tail cut, the truncation error after five terms of this power series is below the rounding error of double-precision floating-point computation.

Each of the three 1024-dimensional integrals is computed by the bigintegral function shown in Appendix G of the full version [14], which is implemented in just four lines of code, on top of the erfdiff function, which is implemented in just two lines of code, plus a few constants precomputed from \(\sigma \). The rest of the code in Appendix G in [14] is for speeding up VCL’s exp by replacing it with a streamlined fastexp; running a Monte-Carlo sanity check on bigintegral; and benchmarking bigintegral.

Each call to bigintegral takes just 7800 cycles on a Haswell CPU core using g++ 4.8.4 with standard compiler options (-O3 -fomit-frame-pointer -std=gnu++11 -march=native -mtune=native -fabi-version=6), and there are three integrals in the computation of rejection probabilities. (Dividing the integrals and comparing to a random number is equivalent to multiplying the random number by the denominator and comparing to the numerator, which takes constant time.) We save a lot more than these \(3\cdot 7800 = 23400\) cycles in the sampling step (see Table 5.1). Furthermore, the main point of the approach is to produce constant-time implementations, and our code is constant time.
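The parenthetical comparison trick is worth spelling out, since it removes the only division from the rejection step; a minimal sketch:

```cpp
// Accepting with probability num/den (num <= den) is equivalent to drawing
// u uniform in [0, 1) and testing u * den < num: one constant-time multiply
// and compare, no division.
bool accept(double num, double den, double u) {
    return u * den < num;
}
```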

There are only a small number of possible inputs to erfdiff. Specifically, each \(y_i\) is an integer in \([-\tau \sigma ,\tau \sigma ]\), and each entry of \(\mathbf {S}\mathbf {c}\) is an integer bounded in absolute value by \(3\kappa \) for BLISS-I and II and \(5\kappa \) for BLISS-III and IV, so each erfdiff input is an integer bounded in absolute value by \(\tau \sigma +3\kappa \) or \(\tau \sigma +5\kappa \) respectively.

To compute the effects of approximating \(\mathrm{erf}\) and of working with finite-precision floating-point numbers, we calculated the ratio of the result of our calculation to the exact solution for all possible erfdiff inputs, where we used Sage’s arbitrary-precision error_fcn with 1000 bits of precision to compute the exact solution very precisely. The one-dimensional Rényi divergence \(\mathrm{RD}_\infty \) of these distributions is the maximum of these ratios.

For example, in 17 s on a 3.5 GHz Haswell core we calculate for BLISS-I that \(\mathrm{RD}_\infty (\text {approx calculation} \mid \mid \text {exact calculation}) < 1+2^{-46}\).

Using that \(\mathrm{RD}_\infty \) is multiplicative and \((1+2^{-46})^{2^{46}}<\exp (1)\) we get that for \(m=1024\) we can output \(2^{36}\) signatures without the attacker gaining more than a factor of \(\exp (1)\). This is more than the number in Sect. 5.2 so the approximation is sufficiently good.

5.4 Timings for Sampling Rounded Gaussians

Another property that needs to be compared between the rounded Gaussian distribution and the discrete Gaussian distribution is the time it takes to generate one signature. We compare our implementation to the CDT implementation from http://bliss.di.ens.fr/ which is a proof-of-concept, variable-time sampler for discrete Gaussians.

Both the discrete Gaussian and the rounded Gaussian can be used in the BLISS signature scheme as we have shown earlier. We now compare the time that it takes to generate \(m = 1024\) samples by the two sampling schemes. We note that in a full implementation there are more steps to generate a signature, e.g., the rejection step. However, as said before, these steps are not the bottleneck and take approximately equal time for either sampling scheme; thus we do not include them in the analysis.

Our implementation starts by drawing random bits from /dev/urandom and then expanding them using ChaCha20 [3] to 8192 bytes of data. From these, 128 vectors of 8 double-precision (53-bit) floating-point variables are initialized with randomness, corresponding to the initial \(u_i\) values in Algorithm 4.1. The rest of the implementation closely follows the description of that algorithm.
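The conversion from random bytes to uniform values in (0, 1] is not spelled out above; a standard conversion that matches the stated 53-bit precision, and which we assume here, is:

```cpp
#include <cstdint>

// Convert 64 uniformly random bits into a uniform double in (0, 1] with
// 53-bit granularity: keep the top 53 bits and shift by one so that 0 is
// excluded, as required by the log in Algorithm 4.1.
double bits_to_unit_interval(uint64_t r) {
    return ((r >> 11) + 1) * (1.0 / 9007199254740992.0);  // 2^53 = 9007199254740992
}
```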

Both implementations have been compiled using gcc with -O3. The benchmarks have been run on a Haswell Intel chip, an Intel(R) Xeon(R) CPU E3-1275 v3 at 3.50 GHz. All values in Table 5.1 are CPU cycles. We give the median and the quartiles Q1 and Q3 over 10 000 runs to show the statistical stability.

Table 5.1. CPU cycles analysis for the rounded Gaussian sampling scheme and the discrete Gaussian sampling scheme with \(m = 1024\), run on an Intel(R) Xeon(R) CPU E3-1275 v3 at 3.50 GHz, stating median and quartiles over 10 000 runs.

In Table 5.1 we can clearly see that the rounded Gaussian implementation is significantly faster than the discrete Gaussian implementation; the rounded Gaussian implementation needs noticeably less than half the number of CPU cycles compared to the discrete Gaussian implementation. We can also see that generating the randomness takes a significant part of the total CPU cycle count.

While the difference in speed is significant, we would like to point out that the implementation we used for the discrete Gaussians is not fully optimized. It is hard to predict how much faster a better implementation would be and how much the performance would drop if countermeasures to achieve constant-time behavior were implemented.

Our motivation after [7] was to find an alternative to hard-to-secure discrete Gaussians, even if it were slower than current implementations. Our implementation shows that, with less than 40 lines of code, rounded Gaussians are at least fully competitive.