Open Access. Published by De Gruyter, November 17, 2020.

One Bit is All It Takes: A Devastating Timing Attack on BLISS’s Non-Constant Time Sign Flips

Mehdi Tibouchi and Alexandre Wallet

Abstract

As one of the most efficient lattice-based signature schemes, and one of the only ones to have seen deployment beyond an academic setting (e.g., as part of the VPN software suite strongSwan), BLISS has attracted a significant amount of attention in terms of its implementation security, and side-channel vulnerabilities of several parts of its signing algorithm have been identified in previous works. In this paper, we present an even simpler timing attack against it. The bimodal Gaussian distribution that BLISS is named after is achieved using a random sign flip during signature generation, and neither the original implementation of BLISS nor strongSwan ensure that this sign flip is carried out in constant time. It is therefore possible to recover the corresponding sign through side-channel leakage (using, e.g., cache attacks or branch tracing). We show that obtaining this single bit of leakage (for a moderate number of signatures) is in fact sufficient for a full key recovery attack. The recovery is carried out using a maximum likelihood estimation on the space of parameters, which can be seen as a statistical manifold. The analysis of the attack thus reduces to the computation of the Fisher information metric.

MSC 2010: 94A60; 62F30; 60D05

1 Introduction

While lattice-based cryptography is a mature research area from a theoretical standpoint, the study of implementation issues related to lattice-based schemes is more recent. In particular, implementation attacks against such schemes and their countermeasures have only emerged as a subject of active research in the past few years, motivated in large part by the push towards postquantum cryptography standardization and deployment epitomized by the NIST competition.

As far as signatures are concerned, the BLISS scheme of Ducas et al. [5] has proved a particularly interesting target. BLISS is possibly the most efficient lattice-based signature scheme so far, comes with a public implementation [6], and has seen deployment in the wild as part of the open source VPN software suite strongSwan [11]. With reasonably small key sizes, fast signatures and strong theoretical security guarantees, it appeared as an especially attractive candidate for practical postquantum signing — until side-channel analysts started poking at it.

It turns out that the various technical ideas that form the basis of BLISS’s efficiency and security all present new challenges with respect to secure implementations. The reliance on discrete Gaussian sampling, in particular, has led to multiple side-channel attacks [4, 10], and so has the rejection sampling in signature generation [3, 7], which is essential for security. The latter has recently been broken even in the presence of certain heuristic countermeasures [2].

One other original aspect of BLISS that hasn't been specifically targeted in previous attacks is the use of bimodal Gaussians, which manifests itself in the signature algorithm through a random sign flip. In this paper, we analyze to what extent the leakage of that sign flip also leads to a recovery attack on BLISS.

Like most of the attacks cited above, this attack simply takes advantage of the fact that neither the original implementation of BLISS nor the one included in strongSwan is constant-time: these implementations have conditional branches and memory accesses that depend on sensitive variables. In particular, the bit indicating whether or not the sign flip should be carried out during signature generation is sensitive, and these implementations branch on the value of that bit. Doing so reveals the bit in the timing leakage model (which can be concretely achieved through various attack techniques such as instruction cache timing attacks, simple power analysis, branch tracing, etc.), and hence lets us carry out our attack. It leads to the full exposure of the signing key given that single bit of leakage for a relatively limited number of signatures: on the order of a few tens of thousands (which is comparable to [2] and [3], despite the fact that those attacks rely on a significantly larger amount of leakage per signature).

The actual recovery of the secret key s is carried out using parametric inference techniques, and specifically a maximum likelihood estimation. One challenge, however, is that the likelihood does not have a maximum if the parameter s is regarded as unconstrained. To find an actual maximum, we restrict the parameter space to a compact submanifold of ℝⁿ, and specifically a sphere, which we can do since the Euclidean norm of s is a public constant. The likelihood function is a smooth, strictly log-concave function on that manifold, and hence the maximum likelihood estimator ŝ is well-defined. Furthermore, the rate of convergence of ŝ is given by the Fisher information metric on the manifold, which we can compute to derive an estimate on the number of signatures required for recovery.

The countermeasure against this attack is well-known: simply carry out the sign flip in constant time, using for example constant-time swaps. This is done correctly in the very recent constant-time implementation of BLISS described in [2].

2 Preliminaries

2.1 Notation

Vectors are written in bold letters. The zero vector is denoted by 0, and the identity matrix of dimension n by Iₙ. The transpose of a matrix A is Aᵗ. We let ‖v‖, resp. ‖v‖∞, be the Euclidean, resp. ℓ∞, norm of a vector v. The set of n-dimensional vectors with entries in {0, 1} and Hamming weight κ is denoted by $\mathcal{B}_\kappa^n$. For any set S contained in the domain of a function f, we write $f(S) = \sum_{x \in S} f(x)$.

For a random variable X, we let 𝔼[X] be its expectation and 𝕍[X] its variance. If X follows a distribution μ, we write X ∼ μ. A sample x for a random variable X ∼ μ is denoted by x ← μ. If μ is a distribution over a set S and f : S → ℝ is any function, we let $\mathbb{E}_X[f(X)] = \sum_{x \in S} f(x)\mu(x)$.

The gradient of a differentiable function f : ℝn → ℝ is denoted by grad f.

2.2 Mathematical background

Gaussian measures

The (spherical) Gaussian function centered at c ∈ ℝⁿ with “standard deviation” σ > 0 is defined for all x ∈ ℝⁿ as ρ_{σ,c}(x) = exp(−‖x − c‖²/(2σ²)). When the center is 0, it is usually omitted from the notation. The discrete Gaussian over the lattice coset c + ℤⁿ with “standard deviation” σ is then

$$\forall x \in c + \mathbb{Z}^n,\qquad D^n_{\sigma,c}(x) = \frac{\rho_\sigma(x)}{\rho_\sigma(c + \mathbb{Z}^n)}.$$

The continuous Gaussian function restricted to ℤⁿ can behave differently from its discrete counterpart for small values of the standard deviation σ. In our work, σ is always large enough (it exceeds the smoothing parameter of ℤ) that the two distributions have essentially identical behavior, so we do not need to make the distinction.
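As a concrete illustration, the following minimal sketch (not taken from any BLISS implementation) tabulates the one-dimensional distribution D_{σ,c} on a truncated support; by the tail bound of Theorem 2.1 below, the mass discarded by the truncation is negligible. The n-dimensional distribution is simply the product of n independent copies.

```python
import math

def discrete_gaussian_pmf(sigma, c=0.0, tail=12.0):
    """Tabulate D_{sigma,c} over the coset c + Z, truncated to |x| <= tail*sigma.

    Each point x = c + k gets mass rho_sigma(x) / rho_sigma(c + Z), where
    rho_sigma(x) = exp(-x^2 / (2 sigma^2)). The tail beyond tail*sigma is
    negligible (cf. Theorem 2.1 below).
    """
    ks = range(math.floor(-tail * sigma - c), math.ceil(tail * sigma - c) + 1)
    rho = {c + k: math.exp(-((c + k) ** 2) / (2 * sigma ** 2)) for k in ks}
    total = sum(rho.values())  # approximates rho_sigma(c + Z)
    return {x: r / total for x, r in rho.items()}

pmf = discrete_gaussian_pmf(sigma=215.0, c=0.5)  # sigma = 215 is the BLISS-I value
assert abs(sum(pmf.values()) - 1.0) < 1e-12
```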

Gaussian random variables are known to satisfy strong tail bounds. The following formulation is borrowed from [3, Lemma 4.4].

Theorem 2.1

Let x be a Gaussian vector over ℝⁿ with standard deviation σ. For any t > 0, we have $\Pr_x\left[\|x\|_\infty > t\right] \leq 2n \exp\left(-\frac{t^2}{2\sigma^2}\right)$.

Parametric statistics

Let (μ_s)_{s∈S} be a family of probability distributions parametrized by an element s of some parameter space S. In our context, the mass function μ_s will typically be the distribution of the output of a cryptographic algorithm depending on some secret data s.

The probability μ_s(x) that a random variable X ∼ μ_s takes the value x can be seen as a function of s, called the likelihood function associated with the sample x. It is customary to consider instead its logarithm, the log-likelihood, denoted by $\hat\ell_x$:

$$\hat\ell_x(s) = \log \mu_s(x).$$

More generally, if x₁, . . . , x_m are samples from independent, identically distributed random variables X₁, . . . , X_m ∼ μ_s, the probability Pr[∀i, X_i = x_i] is simply the product $\prod_i \mu_s(x_i)$. We can thus define the log-likelihood function associated with these m samples as:

$$\hat\ell_{x_1,\dots,x_m}(s) = \sum_{i=1}^m \log \mu_s(x_i).$$

When the context is clear, we may omit the corresponding indices.

The log-likelihood can also be seen as a random variable on the same space of outcomes as the random variable X (resp. the random variables X1, . . . , Xm). In that case, we denote it without a hat, as ℓX (resp. ℓX1,...,Xm ).

Given samples x₁, . . . , x_m from i.i.d. random variables X₁, . . . , X_m ∼ μ_s, a maximum likelihood is a point s of the parameter space where the (log-)likelihood function reaches its maximum (if it exists). A maximum likelihood estimator associated with X₁, . . . , X_m ∼ μ_s is a statistic ŝ_m depending on those random variables, taking values in the parameter space S, such that for all samples x₁, . . . , x_m, the value ŝ_m(x₁, . . . , x_m) is a maximum likelihood:

$$\hat{s}_m(x_1, \dots, x_m) = \operatorname*{argmax}_s\; \hat\ell_{x_1,\dots,x_m}(s).$$

Note that a maximum likelihood estimator is in particular a random variable itself, with values in S.

In case the parameter space S is an open subset of ℝn, and the log-likelihood function satisfies suitable regularity conditions, we can define the Fisher information matrix as the negative expectation of the Hessian matrix of the log-likelihood associated with a single X ∼ μs:

$$I(s) = -\mathbb{E}_X\left[\frac{\partial^2}{\partial s_i \partial s_j}\,\ell_X(s)\right]_{i,j}.$$

Then, again under suitable regularity conditions, I(s) is symmetric positive definite, and the following standard theorem holds.

Theorem 2.2

([8, Th. 6.4.1]). Consider a sequence (X_i)_{i≥1} of i.i.d. random variables X_i ∼ μ_s. There exists a sequence (ŝ_m)_{m≥1} such that ŝ_m is a maximum likelihood estimator associated with X₁, . . . , X_m for each m, and such that ŝ_m converges to s almost surely. Moreover, for any such sequence, the random variable $\sqrt{m}\,(\hat{s}_m - s)$ converges in distribution to $\mathcal{N}\left(0, I(s)^{-1}\right)$.

We refer to [8] and [9, Ch. 6] for more precise statements including all the regularity conditions. In this work, it is easy to check that the regularity conditions are always satisfied. Another interpretation of the above theorem is that maximum likelihood estimators are asymptotically efficient (in the sense that they reach the so-called Cramér–Rao bound; see again [9, Ch. 6]). In other words, they are the estimators that require the smallest number of samples to estimate the true parameter.
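As a standard worked example (not from the paper), consider the Gaussian location family μ_s = 𝒩(s, σ²) on ℝ. The log-likelihood of a single sample is quadratic in s, so the Fisher information is immediate:

$$\ell_X(s) = -\frac{(X - s)^2}{2\sigma^2} + \text{const}, \qquad\text{so}\qquad I(s) = -\mathbb{E}_X\left[\frac{\partial^2}{\partial s^2}\,\ell_X(s)\right] = \frac{1}{\sigma^2},$$

and Theorem 2.2 recovers the familiar fact that the maximum likelihood estimator, here the sample mean, satisfies $\sqrt{m}\,(\hat{s}_m - s) \to \mathcal{N}(0, \sigma^2)$ in distribution.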

One way in which our setting differs, however, is that in our case, the parameter space S is not an open subset of ℝn, but instead a closed submanifold—specifically a sphere. The definition of the Fisher information matrix and Theorem 2.2 generalize naturally to that setting as well. Roughly speaking, the Fisher information matrix becomes a metric tensor on the manifold (which can be seen as a positive definite matrix in the local coordinates in each tangent space). We refer to [1, Ch. 5] for a detailed discussion. In our setting, the log-likelihood function on S is the restriction to S of a smooth function defined over all of ℝn, so its gradient and Hessian are simply obtained as the projection on the tangent space (resp. the restriction to the tangent space) of the gradient and Hessian of the function on all of ℝn.

Representation of ring elements

BLISS is described over cyclotomic rings of the form R = ℤ[x]/(xⁿ + 1), where n = 2^λ for some λ ≥ 1. There are several known representations for ring elements, and we will make no distinction between them. An element u = Σᵢ uᵢxⁱ ∈ R can be seen as the integer vector (u₀, . . . , u_{n−1}). This allows us to identify R with the lattice ℤⁿ in the ambient inner-product space (ℝⁿ, ⟨·, ·⟩). Elements can also be represented by their matrix of multiplication in the basis 1, x, . . . , x^{n−1}. This allows us to see that uv corresponds to the vector whose i-th coordinate is ⟨xⁱu*, v⟩. The adjoint v* of v is the ring element whose multiplication matrix is the transpose of the matrix of multiplication by v. It is given by v* = v₀ − v_{n−1}x − · · · − v₁x^{n−1}, and satisfies ⟨u, vw⟩ = ⟨uv*, w⟩.
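These identities are easy to check numerically. The following sketch (toy helper code for illustration, not part of any BLISS implementation) builds the multiplication matrix and the adjoint, and verifies both the transpose property and ⟨u, vw⟩ = ⟨uv*, w⟩ on random inputs:

```python
import numpy as np

def mul_matrix(v):
    """Matrix of multiplication by v in Z[x]/(x^n + 1): column j holds x^j * v."""
    n = len(v)
    M = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        for j in range(n):
            # x^(i+j) reduces to -x^(i+j-n) when i + j >= n
            M[(i + j) % n, j] += v[i] if i + j < n else -v[i]
    return M

def ring_mul(u, v):
    """Product uv in Z[x]/(x^n + 1), as a coefficient vector."""
    return mul_matrix(v) @ u

def adjoint(v):
    """v* = v_0 - v_{n-1} x - ... - v_1 x^{n-1}; its matrix is M(v)^t."""
    return np.concatenate(([v[0]], -v[:0:-1]))

rng = np.random.default_rng(0)
n = 8
u, v, w = (rng.integers(-3, 4, size=n) for _ in range(3))
assert np.array_equal(mul_matrix(adjoint(v)), mul_matrix(v).T)
assert u @ ring_mul(w, v) == ring_mul(u, adjoint(v)) @ w  # <u, vw> = <u v*, w>
```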

2.3 The BLISS scheme

BLISS [5] is a lattice-based signature scheme based on the Ring-LWE assumption. Signature generation (Algorithm 1) involves rejection sampling, which is fundamental for security. Indeed, it is used to guarantee that the distribution of the signatures does not depend on the secret key S = (s₁, s₂). More precisely, a candidate signature Z = (z₁, z₂) for a message μ is kept only with probability

(1) $\quad \dfrac{1}{M \exp\left(-\frac{\|Sc\|^2}{2\sigma^2}\right) \cosh\left(\frac{\langle Z, Sc\rangle}{\sigma^2}\right)}.$

The above probability must be computed at each trial to perform the rejection, which leaks information on S if it is not done in constant time.

Algorithm 1: Sign(μ, pk = (v₁, q − 2), sk = S = (s₁, s₂))
1: y₁, y₂ ← D^n_σ;
2: u ← ζ·v₁·y₁ + y₂ mod 2q;
3: c ← H(⌊u⌉_d mod p, μ);
4: choose a random bit b;
5: z₁ ← y₁ + (−1)ᵇs₁c, and z₂ ← y₂ + (−1)ᵇs₂c;
6: continue with probability 1/(M exp(−‖Sc‖²/(2σ²)) cosh(⟨Z, Sc⟩/σ²)); else restart;
7: z₂† ← (⌊u⌉_d − ⌊u − z₂⌉_d) mod p;
8: return (z₁, z₂†, c);

During the key generation algorithm, s₁ and s₂ are sampled uniformly among the set of n-dimensional vectors with ⌈δ₁n⌉ entries in {±1} and ⌈δ₂n⌉ entries in {±2}, for some small densities δ₁, δ₂ ∈ (0, 1). The rest of the entries are zero. The first component of the public key is defined as v₁ = (2s₂ + 1)s₁⁻¹ mod q. The element ζ satisfies ζ·(q − 2) = 1 mod 2q. The authors of [5] model the hash function H of Step 3 as a random oracle with output in $\mathcal{B}_\kappa^n$, for some small parameter κ > 0. After rejection, it can be shown that, without additional information, the distribution of z₁ is indistinguishable from that of a bimodal Gaussian distribution with standard deviation parameter σ.
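To make the role of the sign flip concrete, here is a heavily simplified Python model of Steps 4–6 of Algorithm 1 (an illustrative sketch, not the reference implementation: the hash, the commitment u and the compression of z₂ are omitted, and ring_mul is the toy helper from the sketch in Section 2.2):

```python
import numpy as np

def sign_flip_step(y1, y2, s1, s2, c, sigma, M, rng):
    """Steps 4-6 of Algorithm 1, modeled in Python.

    The `if b:` branch is the non-constant-time sign flip that this
    paper's attack exploits: observing which branch is taken (through
    the instruction cache, branch tracing, ...) reveals b.
    """
    b = rng.integers(0, 2)                      # Step 4: uniform random bit
    s1c, s2c = ring_mul(s1, c), ring_mul(s2, c)
    if b:                                       # sensitive branch (not constant time!)
        z1, z2 = y1 - s1c, y2 - s2c
    else:
        z1, z2 = y1 + s1c, y2 + s2c
    # Step 6: rejection sampling, cf. Equation (1)
    sc_sq = s1c @ s1c + s2c @ s2c
    dot = z1 @ s1c + z2 @ s2c
    accept = 1.0 / (M * np.exp(-sc_sq / (2 * sigma**2)) * np.cosh(dot / sigma**2))
    return (b, z1, z2) if rng.random() < accept else None   # None = restart
```

The conditional branch on b in this model is precisely the leakage channel analyzed in the next section.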

3 One bit is all it takes

Observe that to break the scheme, it is enough to recover the s₁ part of the secret key, as s₂ can then be directly obtained from the knowledge of s₁ and the public key. Hence, we write s = s₁ and focus on this component for the rest of this section. Accordingly, we also write z = z₁. BLISS key generation and signing rely only on spherical Gaussian distributions for their sampling. In particular, all the coordinates of the generated samples are independent. This allows us to rewrite the rejection probability in terms of any target subset of coordinates of S and Z, so we can focus on s and z by considering their rejection probability accordingly.

Then, our attack boils down to the fact that knowing b allows us to express the log-likelihood function for m samples of the distribution of the output of Algorithm 1. We start by calculating it explicitly and deduce that, on the n-dimensional sphere of radius $\|s\| = \sqrt{\lceil\delta_1 n\rceil + 4\lceil\delta_2 n\rceil}$ (which is known), it admits a unique maximum likelihood estimator.

This calculation also enables us to express the Fisher information matrix I(s). We then assess the number of traces needed to recover (almost all of) s with high probability, relying first on a heuristic analysis of the behavior of I(s) for the sake of clarity. In Appendix A, we give more rigorous justifications for our heuristic arguments. The final result is then obtained by combining Theorem 2.2 and the Gaussian tail bound. Algorithmic and practical aspects of the attack are dealt with in Section 4.

3.1 Existence of the maximum likelihood estimator

Let 𝒜 be the distribution of the triples (b, z, c) output by Algorithm 1, and consider (b, z, c) ← 𝒜. From Step 5, it is clear that before rejection sampling, z is distributed as

$$D^n_{\sigma,(-1)^b sc}(z) = \frac{1}{\rho_\sigma(\mathbb{Z}^n)} \exp\left(-\frac{\|z - (-1)^b\, sc\|^2}{2\sigma^2}\right).$$

Combining with the rejection probability (1), we readily see that the probability of getting a specific output (b, z, c) is

(2) $\quad \mu_s(b, z, c) = \dfrac{\exp\left(-\frac{\|z\|^2}{2\sigma^2}\right)}{\rho_\sigma(\mathbb{Z}^n)} \cdot \dfrac{\exp\left(\frac{(-1)^b \langle z, sc\rangle}{\sigma^2}\right)}{\exp\left(\frac{\langle z, sc\rangle}{\sigma^2}\right) + \exp\left(-\frac{\langle z, sc\rangle}{\sigma^2}\right)}.$

The log-likelihood function associated with one sample (b, z, c) is then $\hat\ell_{(b,z,c)}(s) = K - \log\left(1 + \exp\left(-2(-1)^b\langle z, sc\rangle/\sigma^2\right)\right)$, where K is constant with respect to s. Similarly, the log-likelihood function associated with m samples (b⁽ⁱ⁾, z⁽ⁱ⁾, c⁽ⁱ⁾) ← 𝒜 is:

$$\hat\ell_{(b^{(i)}, z^{(i)}, c^{(i)})_i}(s) = K - \sum_{i=1}^m \log\left(1 + \exp\left(-\frac{2}{\sigma^2}\left\langle (-1)^{b^{(i)}} z^{(i)} (c^{(i)})^*,\, s\right\rangle\right)\right),$$

where again, K is a constant independent of s. As ‖s‖ is known from the description of the scheme, a natural set over which to maximize $\hat\ell_{(b^{(i)}, z^{(i)}, c^{(i)})_i}$ is the sphere S of radius ‖s‖ in ℝⁿ. Since this function is clearly continuous on this compact set, it is bounded and reaches its maximum on S. We will moreover see that the point at which this maximum is reached is necessarily unique, and therefore determines a unique, well-defined maximum likelihood estimator ŝ.

Indeed, let $\varphi(t) = -\log\left(1 + \exp\left(-2t/\sigma^2\right)\right)$. Straightforward computations give:

$$\varphi'(t) = \frac{2/\sigma^2}{1 + \exp\left(2t/\sigma^2\right)} \quad\text{and}\quad \varphi''(t) = -\frac{1}{\sigma^4 \cosh\left(t/\sigma^2\right)^2}.$$

This implies that for any w ∈ ℝⁿ, φ(⟨w, s⟩) is a strictly concave function of s, and therefore so is $\hat\ell_{(b^{(i)}, z^{(i)}, c^{(i)})_i}$. It follows that the maximum of $\hat\ell$ is reached at a unique point of S, as stated.
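Since the signs of φ′ and φ″ are what drive the concavity argument, here is a quick finite-difference sanity check of the two derivative formulas above (an illustrative snippet using the BLISS–I value σ = 215, not part of the original analysis):

```python
import math

sigma = 215.0
phi   = lambda t: -math.log1p(math.exp(-2 * t / sigma**2))
dphi  = lambda t: (2 / sigma**2) / (1 + math.exp(2 * t / sigma**2))
d2phi = lambda t: -1 / (sigma**4 * math.cosh(t / sigma**2) ** 2)

eps = 1e-3
for t in (-5000.0, 0.0, 3000.0):
    # central differences agree with the closed-form derivatives
    assert abs((phi(t + eps) - phi(t - eps)) / (2 * eps) - dphi(t)) < 1e-9
    assert abs((dphi(t + eps) - dphi(t - eps)) / (2 * eps) - d2phi(t)) < 1e-12
```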

3.2 Determining the number of required traces

For any fixed w ∈ ℝⁿ, we readily compute $\frac{\partial^2}{\partial s_j \partial s_k}\hat\ell = \varphi''(\langle w, s\rangle)\, w_j w_k$ for any j, k ∈ [n], where $w_j$ is the j-th coordinate of w. Let now (b, z, c) ← 𝒜 and define w = (−1)ᵇ z c*. We can express the Fisher information matrix as

$$I(s) = -\mathbb{E}_w\left[\varphi''(\langle w, s\rangle)\, w_j w_k\right]_{j,k},$$

where w follows a distribution that will be fully described later. Nevertheless, one checks from its definition that all coordinates of w are mutually independent variables. Now let w̄ be the expected vector of w. Using the independence of the coordinates of w, we have

$$I(s)_{j,k} = \begin{cases} -\mathbb{E}_w\left[\varphi''(\langle w, s\rangle)\right] \bar{w}_j \bar{w}_k & \text{if } j \neq k, \\[4pt] -\mathbb{E}_w\left[\varphi''(\langle w, s\rangle)\right]\left(\bar{w}_j^2 + \mathbb{V}[w_j]\right) & \text{if } j = k. \end{cases}$$

Since $(-1)^b z \sim D^n_{\sigma,sc}$ and since c* has exactly κ non-zero entries, we see that 𝕍[wⱼ] = σ²κ for all j. We deduce that

$$I(s) = \mathbb{E}_w\left[\cosh\left(\frac{\langle w, s\rangle}{\sigma^2}\right)^{-2}\right] \frac{1}{\sigma^4}\left(\kappa\sigma^2 I_n + \bar{w}\,\bar{w}^t\right).$$

Heuristic analysis of I(s)

Our goal is now to argue that I(s)⁻¹ essentially behaves like the scalar matrix (σ²/κ)Iₙ. In order to do this, we need to know the distribution of w. We show in Appendix A that it is close to a normal distribution of mean κs and covariance σ²κIₙ. For the rest of the current section, we therefore assume that w follows that particular normal law.

Under this assumption, we can write ⟨w, s⟩ = κ‖s‖² + ⟨h, s⟩, where h = w − κs is the centered Gaussian part of w. As the non-zero coordinates of s are sampled uniformly in centered sets, we expect that s has essentially the same number of positive and negative coordinates. More precisely, potentially "problematic" s's (with an imbalanced number of positive or negative coordinates) exist only in negligible proportion. As the coordinates of h have magnitude at most about σ√κ with overwhelming probability, we expect |⟨h, s⟩| to be within a small factor of σ√κ.

Combined with a look at the usual BLISS parameters, this suggests that ⟨w, s⟩/σ² is likely to be close to 0 for most w's and a fixed s. We deduce that

$$I(s) \approx \frac{\kappa}{\sigma^2}\left(I_n + \frac{1}{\kappa\sigma^2}\,\bar{w}\,\bar{w}^t\right).$$

The rank-1 matrix w̄w̄ᵗ has a unique non-zero eigenvalue, given by w̄ᵗw̄ = κ²‖s‖². This means that $\frac{1}{\kappa\sigma^2}\bar{w}\,\bar{w}^t$ has operator norm smaller than 1, which shows that I(s) is invertible. As σ² is at least an order of magnitude larger than κ‖s‖² for BLISS parameters, this is enough to argue that I(s)⁻¹ is close to (σ²/κ)Iₙ.

Recall that ŝ is the maximum likelihood estimator associated with m samples from 𝒜. From Theorem 2.2, we infer that for m large enough, ŝ − s is essentially a centered Gaussian vector of covariance (σ²/(κm))Iₙ. We do not consider the speed of convergence and assume that the result is “valid enough” for the ranges of m that we are interested in. Then, Theorem 2.1 states that ‖ŝ − s‖∞ ≤ 1/2 except with probability at most 2n exp(−κm/(8σ²)). Therefore, taking m ≥ 16 log(2n)σ²/κ ensures that ‖ŝ − s‖∞ ≤ 1/2 except with probability at most 1/(2n). As ŝ is essentially an optimal estimator, we remark that it should not be possible to improve the number of required samples by more than a marginal amount.
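As a sanity check on this bound, plugging in the parameter sets of [5] (n = 512 in all cases, with (σ, κ) = (215, 23), (107, 23), (250, 30) and (271, 39) for BLISS–I through BLISS–IV) reproduces the theoretical sample counts reported in Table 1 below:

```python
import math

n = 512
params = {"BLISS-I": (215, 23), "BLISS-II": (107, 23),
          "BLISS-III": (250, 30), "BLISS-IV": (271, 39)}  # (sigma, kappa) from [5]
for name, (sigma, kappa) in params.items():
    m = 16 * math.log(2 * n) * sigma**2 / kappa
    print(f"{name}: m >= {round(m, -3):,.0f}")
# BLISS-I: 223,000   BLISS-II: 55,000   BLISS-III: 231,000   BLISS-IV: 209,000
```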

4 Implementation of the attack

In this section, we describe how to mount the attack in practice from an algorithmic standpoint given the required leakage, and provide some experimental results. We also discuss to what extent the publicly available implementations of BLISS are vulnerable to this attack. The code for the attack can be found at https://github.com/awallet/OneBitBliss.

4.1 Algorithmic considerations

Given m signatures together with the corresponding leakage bits (b⁽ⁱ⁾, z⁽ⁱ⁾, c⁽ⁱ⁾), we can form the vectors w⁽ⁱ⁾ = (−1)^{b⁽ⁱ⁾} z⁽ⁱ⁾ (c⁽ⁱ⁾)*. As seen in Section 3, the log-likelihood function $\hat\ell$ and its gradient are expressed (up to an additive constant) as follows:

$$\hat\ell(s) = \sum_{i=1}^m \varphi\left(\left\langle w^{(i)}, s\right\rangle\right) \quad\text{and}\quad \operatorname{grad}\hat\ell(s) = \sum_{i=1}^m \varphi'\left(\left\langle w^{(i)}, s\right\rangle\right) w^{(i)}$$

when they are seen as functions on ℝⁿ. It is therefore straightforward in principle to evaluate them numerically given the available data, and hence to use an algorithm such as gradient descent (or rather, gradient ascent) to search for the corresponding maximum likelihood estimator ŝ.

Recall however that we are doing maximization on the sphere centered at 0 and of radius ‖s‖. Hence, the gradient ascent should also be carried out on the sphere, along geodesic paths. Concretely, each iteration of the gradient ascent is carried out as follows.

  1. Starting from a point v on the sphere, we compute $\hat\ell(v)$ as well as the gradient $g = \operatorname{grad}\hat\ell(v)$ at v as above. We then compute the projection h of g on the tangent plane to the sphere at v, namely $h = g - \frac{\langle g, v\rangle}{\|s\|^2}\, v$.

  2. The vector h gives the direction of steepest ascent for $\hat\ell$ on the sphere, and the geodesic through v in the direction of h is parametrized by v(θ) = cos θ · v + sin θ · h. We take a step along that geodesic using backtracking line search, as described in the next step. Note that the first-order Taylor expansion around θ = 0 gives:

$$\hat\ell(v(\theta)) = \hat\ell(\cos\theta \cdot v + \sin\theta \cdot h) = \hat\ell(v + \theta h + o(\theta)) = \hat\ell(v) + \theta\langle g, h\rangle + o(\theta) = \hat\ell(v) + \|h\|^2\theta + o(\theta).$$
  3. Initialize θ to θ₀ and compute $\ell_0 = \hat\ell(v(\theta))$. If the “Armijo condition” $\ell_0 \geq \hat\ell(v) + \|h\|^2\theta/2$ is satisfied (which, by the above, must happen for θ small enough), keep v(θ) as the result of this iteration of the gradient ascent. Otherwise, update θ ← νθ for some constant ν < 1 and try again.

This gradient ascent procedure is then repeated for a certain number of iterations, or until ‖h‖ becomes sufficiently small. It is standard that the concavity of $\hat\ell$ ensures convergence to the maximum.

We also need to specify an initial vector v to initialize the gradient ascent. For this, we use the fact that the maximum likelihood point for a single sample w is exactly λw, where λ > 0 is the normalization factor needed to obtain a point on the sphere. This is because φ is an increasing function, and hence the maximum likelihood is obtained by maximizing the inner product ⟨w, ŝ⟩. Thus, as a first approximation of the maximum likelihood point for m samples w⁽ⁱ⁾, we simply pick the vector $v = \lambda \sum_i w^{(i)}$, where λ is again chosen so that v is on the sphere. We find that this choice gives good results (although in principle, convergence could be achieved from any starting point on the sphere anyway).
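Putting the above together, a compact reimplementation of the whole search might look as follows (a minimal sketch following the steps of this section, not the code from the repository; W is an m × n array whose rows are the vectors w⁽ⁱ⁾, and s_norm = ‖s‖):

```python
import numpy as np

def mle_on_sphere(W, s_norm, sigma, iters=500, theta0=1.0, nu=0.5):
    """Geodesic gradient ascent for the log-likelihood of Section 3.1."""
    def loglik(s):
        # sum_i phi(<w_i, s>), phi(t) = -log(1 + exp(-2t/sigma^2)), computed stably
        return -np.logaddexp(0.0, -2.0 * (W @ s) / sigma**2).sum()

    def grad(s):
        # sum_i phi'(<w_i, s>) w_i, using phi'(t) = (1 - tanh(t/sigma^2)) / sigma^2
        coeff = (1.0 - np.tanh((W @ s) / sigma**2)) / sigma**2
        return coeff @ W

    v = W.sum(axis=0)
    v *= s_norm / np.linalg.norm(v)                # initial point: normalized sum
    for _ in range(iters):
        g = grad(v)
        h = g - (g @ v / s_norm**2) * v            # project g on the tangent plane
        hn2 = h @ h
        if hn2 < 1e-12:
            break                                  # gradient (numerically) vanished
        theta, base = theta0, loglik(v)
        while True:                                # backtracking line search
            cand = np.cos(theta) * v + np.sin(theta) * h
            cand *= s_norm / np.linalg.norm(cand)  # stay exactly on the sphere
            if loglik(cand) >= base + 0.5 * hn2 * theta:   # Armijo condition
                break
            theta *= nu
        v = cand
    return v   # approximate maximum likelihood point; round to recover s
```

The explicit renormalization of each candidate point guards against the small drift off the sphere incurred when ‖h‖ ≠ ‖s‖.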

4.2 Experimental results

We implemented the algorithm of the previous section and ran it on samples generated by the original implementation of BLISS, together with the leakage of the bit b. The source code of the attack is attached to this submission as supplementary material.

For each of the BLISS parameter sets BLISS–I to BLISS–IV, we generated 100 fresh key pairs, together with corresponding samples in batches of 20,000, and counted for each key pair how many batches were needed for the attack of the previous section to succeed and fully recover the secret key. The lower quartile (LQ), median, and upper quartile (UQ) numbers of signatures m are all reported in Table 1. We also counted how many batches sufficed to recover 504 out of the 512 secret key coefficients, since such a partial recovery can then be combined with a simple meet-in-the-middle attack of low complexity (roughly 2³⁶ time and 2²⁷ space) to deduce the exact secret key.

Table 1

Results of our experiments.

|                                            | BLISS–I | BLISS–II | BLISS–III | BLISS–IV |
|--------------------------------------------|--------:|---------:|----------:|---------:|
| Theoretical m for success: 16 log(2n)σ²/κ  | 223,000 |   55,000 |   231,000 |  209,000 |
| Experimental m for full recovery (LQ)      | 120,000 |   60,000 |   160,000 |  170,000 |
| Experimental m for full recovery (median)  | 130,000 |   70,000 |   180,000 |  200,000 |
| Experimental m for full recovery (UQ)      | 150,000 |   80,000 |   200,000 |  230,000 |
| Experimental m for 504/512 recovery (LQ)   |  70,000 |   40,000 |    90,000 |  110,000 |
| Experimental m for 504/512 recovery (median)| 70,000 |   40,000 |   100,000 |  110,000 |
| Experimental m for 504/512 recovery (UQ)   |  80,000 |   40,000 |   110,000 |  120,000 |

As we can see from the table, our analysis predicts the right order of magnitude for m, even though our gradient ascent only gives an approximation of the maximum likelihood, and despite the various heuristics involved (in particular, ignoring the fact that the maximum likelihood estimator is only unbiased in the limit, and not necessarily for a bounded number of samples). In addition, combining our approach with a meet-in-the-middle attack does significantly reduce the number of signatures required for a successful recovery.

Finally, we note that these results all consider the attack on the first component z₁ of the signature. In principle, the attack on the z₂ component should be more efficient, due to the fact that the coefficients of s₂ (except possibly the first one) are all in 2ℤ; hence an estimator ŝ₂ with ‖ŝ₂ − s₂‖∞ < 1 (instead of 1/2) suffices for recovery, which reduces the required number of signatures by a factor of 4. In practice, however, this is impractical, due to the fact that in actual BLISS signatures, the second component z₂ is not given out in full, but only in compressed form z₂†, and the compression is lossy: it only allows us to recover an approximation of the corresponding samples w. For most parameter sets, the approximation is rather loose, and this makes the attack on the second component significantly worse than the attack on z₁. For BLISS–IV, however, the compression is not so lossy, and it turns out that the attack on z₂ is in fact measurably better despite the compression (with full recovery possible given 40,000 to 50,000 signatures).

4.3 Vulnerability of concrete implementations

The part of the signing algorithm corresponding to the sign flip is implemented as shown in Fig. 1 in the various available implementations of BLISS, namely the original one by Ducas and Lepoint [6], the implementation in strongSwan [11] and the recent constant-time implementation described in [2].

Figure 1: Existing implementations of the sign flip in BLISS (code listings not reproduced here).

Both the original BLISS implementation and strongSwan share the same structure (both shown in Fig. 1), and essentially branch on the bit b indicating the sign flip. On most platforms, additions and subtractions have the same running time, and thus the running time of the corresponding computation is the same regardless of the bit b. Nevertheless, these implementations do not qualify as constant-time, because they contain a conditional branch on the sensitive value b. This can be exploited in practice using side channels such as a cache timing attack on the instruction cache (provided that the instructions corresponding to the two branches end up in different cache lines) or the branch tracing attack described in [7]. Branch tracing breaks both implementations equally easily; cache timing attacks should be easier to mount on strongSwan, however, since its branch is taken in every iteration of a for loop instead of just once (although compiler optimizations might affect this distinction in practice).

In contrast, the constant-time implementation described in [2], also shown in Fig. 1, is unaffected by this attack, since the sign flip is carried out using a constant-time expression. More precisely, the CFLIP(x, b) macro essentially expands to (x&(-!b))^((-x)&(-b)), and hence is computed using only bitwise operations, without any conditional branches.
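As a quick sanity check of that branchless identity (the macro itself is C code; the snippet below merely models 32-bit two's-complement words in Python for illustration):

```python
MASK = 0xFFFFFFFF  # model 32-bit two's-complement words

def cflip(x, b):
    """Python model of (x & (-!b)) ^ ((-x) & (-b)) on 32-bit words."""
    not_b = 0 if b else 1                              # !b in C
    return (x & (-not_b & MASK)) ^ ((-x & MASK) & (-b & MASK))

for x in (0, 1, 12345, 0x7FFFFFFF):
    assert cflip(x, 0) == x                            # b = 0: x unchanged
    assert cflip(x, 1) == (-x) & MASK                  # b = 1: two's-complement -x
```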

References

[1] Nihat Ay, Jürgen Jost, Hông Vân Lê and Lorenz Schwachhöfer, Information Geometry, Springer, 2017.

[2] Gilles Barthe, Sonia Belaïd, Thomas Espitau, Pierre-Alain Fouque, Mélissa Rossi and Mehdi Tibouchi, GALACTICS: Gaussian Sampling for Lattice-Based Constant-Time Implementation of Cryptographic Signatures, Revisited, in: ACM CCS 2019 (Lorenzo Cavallaro, Johannes Kinder, XiaoFeng Wang and Jonathan Katz, eds.), pp. 2147–2164, ACM Press, November 2019. doi:10.1145/3319535.3363223.

[3] Jonathan Bootle, Claire Delaplace, Thomas Espitau, Pierre-Alain Fouque and Mehdi Tibouchi, LWE Without Modular Reduction and Improved Side-Channel Attacks Against BLISS, in: ASIACRYPT 2018, Part I (Thomas Peyrin and Steven Galbraith, eds.), LNCS 11272, pp. 494–524, Springer, Heidelberg, December 2018. doi:10.1007/978-3-030-03326-2_17.

[4] Leon Groot Bruinderink, Andreas Hülsing, Tanja Lange and Yuval Yarom, Flush, Gauss, and Reload – A Cache Attack on the BLISS Lattice-Based Signature Scheme, in: CHES 2016 (Benedikt Gierlichs and Axel Y. Poschmann, eds.), LNCS 9813, pp. 323–345, Springer, Heidelberg, August 2016. doi:10.1007/978-3-662-53140-2_16.

[5] Léo Ducas, Alain Durmus, Tancrède Lepoint and Vadim Lyubashevsky, Lattice Signatures and Bimodal Gaussians, in: CRYPTO 2013, Part I (Ran Canetti and Juan A. Garay, eds.), LNCS 8042, pp. 40–56, Springer, Heidelberg, August 2013. doi:10.1007/978-3-642-40041-4_3.

[6] Léo Ducas and Tancrède Lepoint, BLISS: Bimodal Lattice Signature Schemes, June 2013, http://bliss.di.ens.fr/bliss-06-13-2013.zip (proof-of-concept implementation).

[7] Thomas Espitau, Pierre-Alain Fouque, Benoît Gérard and Mehdi Tibouchi, Side-Channel Attacks on BLISS Lattice-Based Signatures: Exploiting Branch Tracing against strongSwan and Electromagnetic Emanations in Microcontrollers, in: ACM CCS 2017 (Bhavani M. Thuraisingham, David Evans, Tal Malkin and Dongyan Xu, eds.), pp. 1857–1874, ACM Press, October/November 2017. doi:10.1145/3133956.3134028.

[8] Robert V. Hogg, Joseph W. McKean and Allen T. Craig, Introduction to Mathematical Statistics (8th edition), Pearson, 2018.

[9] Erich L. Lehmann and George Casella, Theory of Point Estimation, Springer, 1998.

[10] Peter Pessl, Leon Groot Bruinderink and Yuval Yarom, To BLISS-B or not to be: Attacking strongSwan's Implementation of Post-Quantum Signatures, in: ACM CCS 2017 (Bhavani M. Thuraisingham, David Evans, Tal Malkin and Dongyan Xu, eds.), pp. 1843–1855, ACM Press, October/November 2017. doi:10.1145/3133956.3134023.

[11] Andreas Steffen et al., strongSwan: the Open Source IPsec-based VPN Solution (version 5.5.2), March 2017.

A Analyzing the distribution of w

Recall that we defined the random variable w = (−1)ᵇ z c* from the random vector (b, z, c) ← 𝒜, where 𝒜 is the distribution of the output of Algorithm 1. If there were no rejection sampling, one could read the distribution of z off Step 5, as (−1)ᵇz = (−1)ᵇy + sc, where (−1)ᵇy is again a centered Gaussian of standard deviation σ, and σ is large compared to ‖sc‖. Heuristically, the rejection step does not change this picture much: (−1)ᵇz remains close to a Gaussian of covariance σ²Iₙ centered at sc, so that (−1)ᵇz c* should have covariance σ²κIₙ. This suggests that w should be centered at scc*, with that same covariance. We make this heuristic argument rigorous in what follows.

From Equation (2), we can write

$$\mathbb{E}_{b,z}\left[(-1)^b z \mid c\right] = \sum_{z \in \mathbb{Z}^n} z\, \mu_s(0, z, c) - \sum_{z \in \mathbb{Z}^n} z\, \mu_s(1, z, c) = \frac{1}{\rho_\sigma(\mathbb{Z}^n)} \sum_{z \in \mathbb{Z}^n} z \exp\left(-\frac{\|z\|^2}{2\sigma^2}\right) \frac{\exp\left(\frac{\langle z, sc\rangle}{\sigma^2}\right) - \exp\left(-\frac{\langle z, sc\rangle}{\sigma^2}\right)}{\exp\left(\frac{\langle z, sc\rangle}{\sigma^2}\right) + \exp\left(-\frac{\langle z, sc\rangle}{\sigma^2}\right)} = \mathbb{E}_z\left[z \tanh\left(\frac{\langle z, sc\rangle}{\sigma^2}\right) \,\middle|\, c\right],$$

where z ∼ D^n_σ over ℤⁿ on the last line.

We now claim that this expected vector is essentially sc. For the sake of clarity, we consider that z follows a normal law of standard deviation σ. Using the change of variables z = σu, we can write

(3) $\quad \mathbb{E}_{b,z}\left[(-1)^b z \mid c\right] = \dfrac{\sigma}{\sqrt{(2\pi)^n}} \displaystyle\int_{\mathbb{R}^n} u \exp\left(-\frac{\|u\|^2}{2}\right) \tanh\left(\frac{\langle u, sc\rangle}{\sigma}\right) du.$

Take any orthonormal basis of ℝⁿ whose first vector is v = sc/‖sc‖, and let E_i denote the i-th coordinate of $\mathbb{E}_{b,z}[(-1)^b z \mid c]$ in this basis. Fubini's theorem gives

$$E_1 = \frac{\sigma}{\sqrt{(2\pi)^n}} \int_{\mathbb{R}} u \exp\left(-\frac{u^2}{2}\right) \tanh\left(\frac{u\|sc\|}{\sigma}\right) du \cdot \left(\int_{\mathbb{R}} \exp\left(-\frac{t^2}{2}\right) dt\right)^{n-1} = \frac{\|sc\|}{\sqrt{2\pi}} \int_{\mathbb{R}} \exp\left(-\frac{u^2}{2}\right) \cosh\left(\frac{u\|sc\|}{\sigma}\right)^{-2} du,$$

where the second equality uses an integration by parts. Letting ε = ‖sc‖/σ and using that cosh(t) ≤ exp(t²/2) for all t, a straightforward computation shows that $E_1 \in \|sc\| \cdot \left[(1 + 2\varepsilon^2)^{-1/2},\, 1\right]$.

Fubini's theorem also shows that each of the other coordinates involves the factor $\int \tanh(u\|sc\|/\sigma)\exp(-u^2/2)\,du$. Observe that the function being integrated in this factor is odd, so that E_i = 0 for all i ≥ 2. For BLISS parameters, σ is an order of magnitude larger than ‖sc‖; in other words, ε is small. This gives the claim.

For a fixed s, we deduce that $\mathbb{E}_w[w] = s\,\mathbb{E}_c[cc^*]$. Our next goal is to show that the random vector cc* has a center close to (κ, 0, . . . , 0) ∈ ℝⁿ. We observe immediately that (cc*)₁ = ‖c‖² = κ. For a fixed c ∈ $\mathcal{B}_\kappa^n$, let χ be the indicator function of its support. By standard algebraic manipulations, we find in general

$$(cc^*)_{i+1} = \langle x^i c, c\rangle = -\sum_{k=0}^{i-1} \chi(n + k - i)\chi(k) + \sum_{k=i}^{n-1} \chi(k - i)\chi(k).$$

Assuming that the coordinates of c are mutually independent, taking the expectation for i ≥ 1 yields $\mathbb{E}_c[cc^*]_{i+1} = -i\frac{\kappa^2}{n^2} + (n - i)\frac{\kappa^2}{n^2} = \frac{\kappa^2}{n^2}(n - 2i)$. This shows that $\left\|\mathbb{E}_c[cc^*] - (\kappa, 0, \dots, 0)\right\|_\infty \leq \kappa^2/n$, which is small for all proposed BLISS parameters. This confirms that the center of w is close to κs.
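This expectation is easy to corroborate numerically, reusing the toy ring helpers from the sketch in Section 2.2 (small illustrative parameters, not BLISS's):

```python
import numpy as np

rng = np.random.default_rng(1)
n, kappa, trials = 64, 8, 2000            # toy parameters for a quick check
acc = np.zeros(n)
for _ in range(trials):
    c = np.zeros(n, dtype=np.int64)
    c[rng.choice(n, size=kappa, replace=False)] = 1   # c uniform in B^n_kappa
    acc += ring_mul(c, adjoint(c))        # c c*, via the Section 2.2 helpers
acc /= trials
print(acc[0])                  # exactly kappa = 8 in every trial
print(np.abs(acc[1:]).max())   # stays within about kappa^2/n = 1 of zero
```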

Finally, we show that the covariance matrix of w is essentially σ²κIₙ. We readily see that the (i, j)-th entry of the covariance matrix of (−1)ᵇz is

$$V_{ij} := \mathbb{E}_{b,z}\left[z_i z_j \mid c\right] - (sc)_i (sc)_j.$$

Rejection sampling makes the output z a centered discrete Gaussian of standard deviation σ. As the coordinates of z are mutually independent, we get V_{ij} = −(sc)ᵢ(sc)ⱼ if i ≠ j and V_{ii} = σ² − (sc)ᵢ². Recall that the coordinates of sc have magnitude at most κ, which is already a very pessimistic estimate; hence V ≈ σ²Iₙ. As c* has exactly κ non-zero entries, we get the desired result on 𝕍[w].

Received: 2019-06-05
Accepted: 2019-07-01
Published Online: 2020-11-17

© 2020 M. Tibouchi and A. Wallet, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.
