Add Valgrind-based detection of variable-latency instructions #693

mkannwischer · 2025-01-24T05:43:45Z

This commit builds on top of
#687
which added Valgrind-based constant-time tests detecting
secret-dependent memory accesses and conditional branches.

This commit extends this to also detect secret-dependent
divisions such as those that lead to the KyberSlash
vulnerability.
See https://kyberslash.cr.yp.to/.
We test for that by using the patch to Valgrind proposed
in the KyberSlash paper (see same link).
It adds an option --variable-latency-errors=no|yes to valgrind
which defaults to no. If you set it to yes, it will complain if
you use undefined in division instructions.

This patch has also been sent to the valgrind developers, but was
so far not merged. Luckily our nix setup makes it very easy to
add a patched valgrind and cache it in the Github cache.

Initially this detected some divisions in the scalar_decompress_x
functions (divisions by powers of two) when compiling with -O0.
Luckily these functions only operate on public ciphertexts
(the ct that comes out of encapsulation, not the ct that that
comes out of re-encryption).
I verified this is the case by explicitly declassifying
the ciphertext that comes out of encapsulation which confirmed
that there are no further divisions.

While this is fine (this ciphertext is public!), it's rather
unnecessary as we really don't need divisions there.
I instead opted to replace those / divisions in the code
by shifts (which is what the compilers do with anything
from -O1 anyway).

The nix setup may be of independent interest as it makes it very easy
to use the patched valgrind (and also a number of different compilers).
You can simply run

nix develop .#ci_valgrind-varlat_gcc14

and you'll magically end up in a shell with a patched valgrind
and gcc14.
Currently we have

ci_valgrind-varlat_clang14
ci_valgrind-varlat_clang15
ci_valgrind-varlat_clang16
ci_valgrind-varlat_clang17
ci_valgrind-varlat_clang18
ci_valgrind-varlat_clang19
ci_valgrind-varlat_gcc48
ci_valgrind-varlat_gcc49
ci_valgrind-varlat_gcc7
ci_valgrind-varlat_gcc11
ci_valgrind-varlat_gcc12
ci_valgrind-varlat_gcc13
ci_valgrind-varlat_gcc14

This commit also adds new workflows to the CI using those shells
and testing our library with the following flags:
{-Oz,-Os,-O3,-Ofast,-O3 -ffastmath,-O2,-O1,-O0}.

mkannwischer · 2025-01-24T08:46:12Z

I temporarily added one of the KyberSlash2 vulnerabilities in bfd3e2a to test this this is actually working as intended.
The test run can be seen here: https://github.com/pq-code-package/mlkem-native/actions/runs/12945822957/job/36109126856?pr=693

It looks something like this:

ERROR > Functional Test    ML-KEM-1024 (native opt):     ==24307== Memcheck, a memory error detector
  ==24307== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
  ==24307== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
  ==24307== Command: test/build/mlkem1024/bin/test_mlkem1024
  ==24307== 
  ==24307== Variable-latency instruction operand of size 8 is secret/uninitialised
  ==24307==    at 0x40AD2C: PQCP_MLKEM_NATIVE_MLKEM1024_poly_compress_d11 (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x402194: PQCP_MLKEM_NATIVE_MLKEM1024_polyvec_compress_du (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x40B659: PQCP_MLKEM_NATIVE_MLKEM1024_indcpa_enc (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x401EC9: PQCP_MLKEM_NATIVE_MLKEM1024_enc_derand (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x401F2F: PQCP_MLKEM_NATIVE_MLKEM1024_enc (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x401284: test_keys (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x40106D: main (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==  Uninitialised value was created by a client request
  ==24307==    at 0x401D3D: randombytes (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x401F1C: PQCP_MLKEM_NATIVE_MLKEM1024_enc (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x401284: test_keys (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x40106D: main (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307== 
  ==24307== Variable-latency instruction operand of size 8 is secret/uninitialised
  ==24307==    at 0x40AD2C: PQCP_MLKEM_NATIVE_MLKEM1024_poly_compress_d11 (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x402194: PQCP_MLKEM_NATIVE_MLKEM1024_polyvec_compress_du (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x40B659: PQCP_MLKEM_NATIVE_MLKEM1024_indcpa_enc (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x4020B1: PQCP_MLKEM_NATIVE_MLKEM1024_dec (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x4012[92](https://github.com/pq-code-package/mlkem-native/actions/runs/12945822957/job/36109126856?pr=693#step:5:97): test_keys (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x40106D: main (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==  Uninitialised value was created by a client request
  ==24307==    at 0x401D3D: randombytes (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x401F1C: PQCP_MLKEM_NATIVE_MLKEM1024_enc (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x401284: test_keys (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307==    by 0x40106D: main (in /home/runner/work/mlkem-native/mlkem-native/test/build/mlkem1024/bin/test_mlkem1024)
  ==24307== 
  ==24307== 
  ==24307== HEAP SUMMARY:
  ==24307==     in use at exit: 0 bytes in 0 blocks
  ==24307==   total heap usage: 1 allocs, 1 frees, 4,0[96](https://github.com/pq-code-package/mlkem-native/actions/runs/12945822957/job/36109126856?pr=693#step:5:101) bytes allocated
  ==24307== 
  ==24307== All heap blocks were freed -- no leaks are possible
  ==24307== 
  ==24307== For lists of detected and suppressed errors, rerun with: -s
  ==24307== ERROR SUMMARY: 409600 errors from 2 contexts (suppressed: 0 from 0)
  make: *** [Makefile:56: run_func_1024] Error 1

Interestingly this triggers on all versions of gcc (-Os) and clang (-Oz). The former was known and is what we used in the KyberSlash paper, the latter was not known to me and we didn't mention it in the KyberSlash paper.

The scalar decompression routines so far used a division by {16,32,1024,2048}. This is not a security concern since even when compiled into variable-time `div` instructions as these functions are only operating on public ciphertexts (the one that comes out of encapsulation, not the one that comes out of re-encryption). Yet, it appears good style -- and it avoids having to declassify the ciphertext in constant-time tests -- to avoid the division altogether and use a shift instead. Note that since the operands are unsigned, there is no difference between shift and division. Signed-off-by: Hanno Becker <beckphan@amazon.co.uk> Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

This commit builds on top of #687 which added Valgrind-based constant-time tests detecting secret-dependent memory accesses and conditional branches. This commit extends this to also detect secret-dependent divisions such as those that lead to the KyberSlash vulnerability. See https://kyberslash.cr.yp.to/. We test for that by using the patch to Valgrind proposed in the KyberSlash paper (see same link). It adds an option --variable-latency-errors=no|yes to valgrind which defaults to no. If you set it to yes, it will complain if you use undefined values in division instructions. This patch has also been sent to the valgrind developers, but was so far not merged. Luckily our nix setup makes it very easy to add a patched valgrind and cache it in the Github cache. Initially this detected some divisions in the scalar_decompress_x functions (divisions by powers of two) when compiling with -O0. Luckily these functions only operate on public ciphertexts (the ct that comes out of encapsulation, not the ct that that comes out of re-encryption). I verified this is the case by explicitly declassifying the ciphertext that comes out of encapsulation which confirmed that there are no further divisions. While this is fine (this ciphertext is public!), it's rather unnecessary as we really don't need divisions there. I instead opted to replace those / divisions in the code by shifts (which is what the compilers do with anything from -O1 anyway). The nix setup may be of independent interest as it makes it very easy to use the patched valgrind (and also a number of different compilers). You can simply run nix develop .#ci_valgrind-varlat_gcc14 and you'll magically end up in a shell with a patched valgrind and gcc14. Currently we have ci_valgrind-varlat_clang14 ci_valgrind-varlat_clang15 ci_valgrind-varlat_clang16 ci_valgrind-varlat_clang17 ci_valgrind-varlat_clang18 ci_valgrind-varlat_clang19 ci_valgrind-varlat_gcc48 ci_valgrind-varlat_gcc49 ci_valgrind-varlat_gcc7 ci_valgrind-varlat_gcc11 ci_valgrind-varlat_gcc12 ci_valgrind-varlat_gcc13 ci_valgrind-varlat_gcc14 This commit also adds new workflows to the CI using those shells and testing our library with the following flags: {-Oz,-Os,-O3,-Ofast,-O3 -ffastmath,-O2,-O1,-O0}. Signed-off-by: Matthias J. Kannwischer <matthias@kannwischer.eu>

.github/workflows/ct-tests.yml

hanno-becker

This is great, thanks a lot @mkannwischer.

Do we need new CI tests or can we just amend the existing ones?

Also, do you want to remark the existence of those CT tests in the README? Might be worth it.

hanno-becker

Let's get this in. nix cleanup and/or README additions can come separately

mkannwischer changed the title ~~Add Valgrind-based detection of variable-latency instruction~~ Add Valgrind-based detection of variable-latency instructions Jan 24, 2025

mkannwischer force-pushed the kyberslash branch 7 times, most recently from b325192 to bfd3e2a Compare January 24, 2025 08:17

hanno-becker and others added 2 commits January 24, 2025 16:47

mkannwischer force-pushed the kyberslash branch from bfd3e2a to 3456da6 Compare January 24, 2025 08:55

mkannwischer marked this pull request as ready for review January 24, 2025 08:55

mkannwischer requested a review from a team January 24, 2025 08:55

hanno-becker reviewed Jan 24, 2025

View reviewed changes

.github/workflows/ct-tests.yml Show resolved Hide resolved

hanno-becker reviewed Jan 24, 2025

View reviewed changes

hanno-becker approved these changes Jan 24, 2025

View reviewed changes

hanno-becker merged commit ae2cfa1 into main Jan 24, 2025
187 checks passed

hanno-becker deleted the kyberslash branch January 24, 2025 11:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add Valgrind-based detection of variable-latency instructions #693

Add Valgrind-based detection of variable-latency instructions #693

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Add Valgrind-based detection of variable-latency instructions #693

Add Valgrind-based detection of variable-latency instructions #693

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!