FIPS-compliant random numbers for the kernel
The Linux random-number generator (RNG) seems to attract an outsized amount of attention (and work) for what is, or seemingly should be, a fairly small component of the kernel. In part that is because random numbers, and their quality, are extremely important to a number of security protections, from unpredictable IP-packet sequence numbers to cryptographic keys. A recent post of version 43 of the Linux Random Number Generator (LRNG) by Stephan Müller is not likely to go any further than its predecessors, but the discussion around it may lead to support for a feature that some distributions need.
The cover letter for the LRNG patch set is titled "/dev/random - a new approach", which is true, but also sure to elicit highly skeptical responses or cause the patches to be ignored entirely. As was reiterated in the discussion, kernel development generally does not proceed along the "wholesale replacement" path; features are added slowly, in bite-sized chunks, instead. But LRNG is meant to be a drop-in replacement for the existing kernel RNG, while adding a long list of additional features, some of which would likely be welcome if they were separated out.
Müller pointed to a set of presentation slides for a good overview of LRNG. One area where the kernel RNG has had difficulties over the years is in gathering enough entropy to provide cryptographic-strength random numbers at boot time, especially for virtual machines and systems without much entropy from disk interrupts (e.g. using SSDs). LRNG collects entropy faster at boot time using CPU execution-time jitter and other techniques. As described in his LRNG paper, this entropy collection complies with the SP 800-90B standard from the US National Institute of Standards and Technology (NIST). In addition, LRNG uses techniques to combine entropy sources in a fully documented, mathematical approach, rather than the informal mechanism in the current kernel RNG.
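To make the idea of execution-time jitter a bit more concrete, here is a rough user-space sketch in Python. It is not LRNG's collector and says nothing about SP 800-90B compliance; the function names (`collect_timing_deltas`, `condition`) and the choice of SHA-256 as a conditioner are assumptions made purely for illustration. It times a fixed chunk of busy work repeatedly, keeps the nanosecond differences, and compresses them with a hash.

```python
import hashlib
import time


def collect_timing_deltas(samples=256, work=1000):
    """Time a fixed chunk of busy work repeatedly; return the nanosecond deltas.

    Hypothetical helper: the variation between deltas is the "jitter" that a
    real collector would have to measure and credit conservatively.
    """
    deltas = []
    prev = time.perf_counter_ns()
    for _ in range(samples):
        acc = 0
        for i in range(work):
            acc ^= i * i  # fixed workload; only its timing matters
        now = time.perf_counter_ns()
        deltas.append(now - prev)
        prev = now
    return deltas


def condition(deltas):
    """Compress raw deltas into a 32-byte value with SHA-256 (an assumed choice)."""
    h = hashlib.sha256()
    for d in deltas:
        h.update(d.to_bytes(8, "little"))
    return h.digest()


if __name__ == "__main__":
    raw = collect_timing_deltas()
    print("distinct deltas:", len(set(raw)))
    print("conditioned seed:", condition(raw).hex())
```

Hashing does not create entropy; it only concentrates whatever unpredictability the timings contain. Estimating how much that is remains the hard part, and this sketch does not attempt it.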
The first patch covers the gathering and handling of entropy by the LRNG framework. It provides a deterministic RNG (DRNG) that is compliant with NIST SP 800-90A, but it allows for other DRNG implementations to be used within the framework. But the fact that it is meant as a drop-in replacement for the existing RNG means that it replaces all of that code. The existing code works, with some known limitations, perhaps, but "starting over" with a new implementation has its own set of dangers.
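For readers unfamiliar with what a DRNG in the SP 800-90A sense looks like, the following Python sketch is modeled on the HMAC_DRBG mechanism from that standard, using SHA-256. It is only an illustration: the article does not say which mechanism LRNG uses by default, the class name `HmacDrbgSketch` is hypothetical, and the sketch omits the standard's input-length checks, reseed-interval limit, and prediction-resistance handling. It has not been checked against official test vectors.

```python
import hashlib
import hmac


class HmacDrbgSketch:
    """Simplified deterministic generator in the style of HMAC_DRBG (SHA-256)."""

    OUTLEN = 32  # SHA-256 output size in bytes

    def __init__(self, entropy, nonce=b"", personalization=b""):
        self._key = b"\x00" * self.OUTLEN
        self._value = b"\x01" * self.OUTLEN
        self._update(entropy + nonce + personalization)
        self.reseed_counter = 1

    def _hmac(self, data):
        return hmac.new(self._key, data, hashlib.sha256).digest()

    def _update(self, provided=b""):
        # Mix provided data (possibly empty) into the internal key and value.
        self._key = self._hmac(self._value + b"\x00" + provided)
        self._value = self._hmac(self._value)
        if provided:
            self._key = self._hmac(self._value + b"\x01" + provided)
            self._value = self._hmac(self._value)

    def reseed(self, entropy, additional=b""):
        """Fold fresh entropy into the state, as a FIPS-mode reseed would."""
        self._update(entropy + additional)
        self.reseed_counter = 1

    def generate(self, nbytes, additional=b""):
        """Return nbytes of output, then advance the state."""
        if additional:
            self._update(additional)
        out = b""
        while len(out) < nbytes:
            self._value = self._hmac(self._value)
            out += self._value
        self._update(additional)
        self.reseed_counter += 1
        return out[:nbytes]
```

The output is fully determined by the seed material, which is why the quality of the entropy fed to the constructor and to `reseed()` matters more than the generator itself.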
Beyond that, several in the discussion were skeptical about the value of the NIST standards (also known as FIPS standards); Jason A. Donenfeld said:
You've posted it again, and yet I still believe this is not the correct design or direction. I do not think the explicit goal of extended configurability ("flexibility") or the explicit goal of being FIPS compatible represent good directions, and I think this introduces new problems rather than solving any existing ones. While there are ways the current RNG could or even should be improved -- or rewritten -- this approach is still not that, no matter how many times you post it.
But Müller noted that some distributions are carrying patches to comply with FIPS, which has led to fragmentation of the cryptographic RNG in the kernel. The idea behind LRNG is to try to ensure that whichever DRNG is chosen, the resulting random numbers are generated in a secure way. In addition, he said that he had not received complaints about the LRNG design, while he had incorporated lots of changes along the way as suggested by various Linux developers. The changelog for the patch set, which goes back to 2016, shows quite a few changes of that sort.
Distributions and FIPS
Greg Kroah-Hartman wanted to know about distributions carrying patches for FIPS compliance and wondered: "Why have the distros not submitted their changes upstream?" Simo Sorce, who is on the RHEL crypto team, answered: "We have not proposed them because they are hacks, we know they are hacks, and we know they are not the long term solution." Red Hat does need some way to have FIPS compliance in its products, he said. But Kroah-Hartman said: "Hacks that work today are the step toward a real solution." He reiterated that evolution is what is needed to get FIPS compliance into the kernel, rather than completely replacing the random-number subsystem. "Work off of those known-working-and-certified hacks. Submit them and let's go from there please."
Similarly, John Haxby reported that Oracle carries a patch to enable a FIPS-compliant RNG at boot time or by writing to a sysfs file; in FIPS mode, it always reseeds the DRNG from the jitter entropy. He said that it is "not healthy" for Oracle to carry out-of-tree patches like this, but it was expedient. He would rather have something upstream that is shared by everyone, but sees the patch as a temporary workaround:
We're carrying this patch simply because the certification requirements changed and this was the quickest and easiest way to workaround today's problem. It won't fix tomorrow's problem and next time we, and others, attempt FIPS certification then we, and others, will need a different /dev/random because neither the old one nor our quick and dirty workaround will actually be acceptable.
Kroah-Hartman suggested that the patch was also a good starting point: "Now that's a much smaller and simpler and easier to understand change, compared to 'rewrite the whole random number generator'." He said that if those who need FIPS compliance worked together to get something working into the mainline, that would likely be an easier path.
But adding a stand-alone separate random subsystem just for this is not a good idea and is one huge reason why this patch set keeps being ignored by the kernel developers.
Sorce also replied to Donenfeld, noting that "FIPS is essential for us and any design must include an option to be FIPS certifiable"; Müller has been working with distributions and standards organizations to gather and implement the requirements. In Sorce's other message, he described some of that work:
These patches have not been maturing in a void, but Stephan basically distilled discussions between multiple vendors as well as regulatory bodies (as you can see he has reviews from BSI and NIST requirements are also fully represented here). He addressed a few aspects I can mention but are not the only ones: performance (esp on NUMA systems), not blocking at boot due to lack of entropy, NIST/BSI conformance, flexibility so that future regulatory requirements can be easily integrated and upstreamed.
More FIPS
Kroah-Hartman would rather see the normal kernel development path followed here: "Remember, evolution is the correct way of kernel development, not intelligent design :)". But, as Müller pointed out, there was a patch set posted for discussion in September 2020 to evolve the current RNG into one that was compliant, which never really went anywhere. Kroah-Hartman asked: "That's a load of patches, some of them seem sane, what ever happened to them?" The answer, Müller said, is: "Nothing was discussed, nothing was picked up."
Müller also said that LRNG does not actually replace the existing kernel RNG; it just provides a way for alternatives to be used:
One side note: the LRNG patch set does not replace random.c, but provides an additional implementation that can be selected at compile time. I am under the impression that is an equal approach considering other areas of the kernel like file systems, memory allocators, and similar.
While it may make sense to have multiple upstream implementations in some areas, the kernel RNG is not one of those areas, Kroah-Hartman said. Beyond that, the kernel RNG is used in multiple places in the kernel; "Odds are, you REALLY do not want the in-kernel calls to be pulling from the 'random-government-crippled-specification' implementation, right?" Sorce did not agree:
When our customers are mandated to use FIPS certified cryptography, they want to use it for kernel cryptography as well, and in general they want to use a certified randomness source as well.
He understands the hesitancy to trust government agencies in light of problems like the Dual EC DRBG mess, but the NIST specifications are not mandating a particular algorithm; the requirements are meant to allow multiple different implementations. Furthermore:
The specification is quite thorough and provides well reasoned requirements as well as self-test that insure coding mistakes won't end up returning non-random values.
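Sorce's message does not name specific self-tests, but one of the continuous health tests described in SP 800-90B, the Repetition Count Test, gives a flavor of what they look like. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the false-positive probability is expressed as 2^-alpha_exp with a default of 20. The cutoff is C = 1 + ceil(alpha_exp / H), where H is the assessed entropy per sample in bits; the test fails if any value repeats C or more times in a row.

```python
import math


def rct_cutoff(entropy_per_sample, alpha_exp=20):
    """Cutoff C = 1 + ceil(-log2(alpha) / H), with alpha = 2**-alpha_exp."""
    return 1 + math.ceil(alpha_exp / entropy_per_sample)


def repetition_count_ok(samples, cutoff):
    """Return False if any value occurs `cutoff` or more times consecutively."""
    last = None
    run = 0
    for s in samples:
        if run and s == last:
            run += 1
            if run >= cutoff:
                return False  # the noise source looks stuck
        else:
            last = s
            run = 1
    return True


if __name__ == "__main__":
    cutoff = rct_cutoff(1.0)                          # 1 bit/sample -> cutoff of 21
    print(repetition_count_ok([0, 1] * 50, cutoff))   # alternating stream passes
    print(repetition_count_ok([7] * 21, cutoff))      # stuck source fails
```

A test like this catches gross failures such as a source that returns the same value forever; it does not show that the output is unpredictable.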
Maintainer questions
But there is another problem in following the evolution path to change the kernel RNG, Sorce said:
And the main question here is, how can we get there, in any case, if the maintainer of the random device doesn't even participate in discussions, does not pick obvious bug fixes and is simply not engaging at all? Your plan requires an active maintainer that guides these changes and interacts with the people proposing them to negotiate the best outcome. But that is not happening so that road seems blocked at the moment.
Ted Ts'o is the maintainer of the kernel RNG, but has been notably absent in this and other discussions of changes and fixes for that subsystem. Kroah-Hartman seemed skeptical that bug fixes were not being picked up, but Eric Biggers listed several fixes and cleanups that had languished before eventually being picked up by other maintainers (one of them by Kroah-Hartman, in fact). Biggers concluded: "So unfortunately, as far as I can tell, Ted is not maintaining random.c anymore."
Donenfeld said that he was willing to review fixes and improvements for the kernel RNG, but cautioned that he is concerned that the FIPS requirements may be overbroad:
And so it would seem that the goal of implementing the RNG as best as we can might potentially be at odds with the goal of getting that green compliance checkbox, because that checkbox oversteps its bounds a bit. [...] I would like the kernel to have an excellent CSPRNG [cryptographically-secure pseudorandom number generator], from a cryptographic point of view, from a performance point of view, from an API point of view. I think these motivations are consistent with how the kernel is generally developed. And I think front loading the motivations with an external compliance goal greatly deviates and even detracts from the way the kernel is generally developed.
[...] Specifically, I think that if you change your perspective from, "how can we change the algorithms of the RNG to be FIPS" to "how can we bend FIPS within its limits so that having what customers want would minimally impact the quality of the RNG implementation or introduce undue maintenance burdens." This means: not refactoring the RNG into some large abstraction layer that's pluggable and supports multiple different implementations, not rewriting the world in a massive patchset, not adding clutter. Instead, perhaps there's a very, very minimal set of things that can be done that would be considerably less controversial.
Sorce was amenable to that approach, and Haxby said that he would submit the Oracle patch as a possible path forward. What we are seeing, at least in part, is a new maintainer volunteering to help out with the kernel RNG, which Ts'o supports. If a more evolutionary approach, with reasoning beyond just "because FIPS", is proposed, it would seem that the kernel RNG may be able to check the compliance box without the upheaval that a full-on replacement could bring. Another possibility was raised by Sandy Harris; the FIPS requirements might be met with the existing RNG, but there are hurdles there as well:
[...] in fact their DRNG design requires an external source of random bits. However, it requires that the source be certified & that would be a problem for us. Intel & others might be able to get their random number instructions certified and vendors of crypto or SOC chips might get theirs certified, but the kernel community could not do that. I think the kernel's entropy collection routines are good enough that they could, in principle, be certified, but that would involve some work & considerable money.
It seems clear that some solution is needed, at least for the enterprise distributions. Müller's patches provide a mechanism that is FIPS-compliant and apparently has minimal impact in terms of performance—perhaps even better performance than the existing implementation—while solving a number of other problems. The techniques used could form a basis for a relatively small number of changes that might benefit all users of the kernel RNG. The patch set as it stands now is not going to fly, but, with luck and some perseverance, the FIPS requirements could be met by following the usual kernel-development strategy. Only time will tell.
Posted Dec 7, 2021 20:41 UTC (Tue) by abatters (✭ supporter ✭, #6932)
https://arstechnica.com/gadgets/2019/10/how-a-months-old-...
If RDRAND could be that severely broken in a major CPU, it makes you wonder how many other ways it could be more subtly broken in other CPUs, certified or not...
Posted Dec 8, 2021 2:57 UTC (Wed) by wahern (subscriber, #37304)
Virtual machines are also problematic as the only hardware randomness source you have, if any, is from the main CPU. Linux does support virtio-rng for injecting randomness into the VM environment, which ideally should be a seed derived from multiple local hardware sources. However, hypervisors like Firecracker refuse to implement the virtio-rng driver. If you don't trust RDRAND (e.g. because it's a single source and you believe hardware faults might be more common than commonly believed), you probably don't want to use AWS Lambda or similar tech. At least, not where you're doing anything that might rely on a strong source of entropy. (I don't think the Nitro hypervisor does, either, but that's a more complex story. The issue is especially egregious for something like Lambda because of the relationship of the work to VM lifetimes--no time to accumulate randomness the dumb way through hardware jitter.)
Posted Dec 8, 2021 5:11 UTC (Wed) by qyliss (subscriber, #131684)
I think Firecracker is the odd one out here — QEMU, cloud-hypervisor, and crosvm all implement virtio-rng. I read https://github.com/firecracker-microvm/firecracker/issues... and https://github.com/firecracker-microvm/firecracker/issues... and it doesn't look like they /refuse/ to implement it, just that they haven't implemented it so far?
Posted Dec 9, 2021 7:21 UTC (Thu) by tlamp (subscriber, #108540)
Proxmox VE does: https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_virti...
Posted Dec 8, 2021 2:47 UTC (Wed) by NYKevin (subscriber, #129325)
Posted Dec 8, 2021 3:26 UTC (Wed) by wahern (subscriber, #37304)
Posted Dec 8, 2021 6:29 UTC (Wed) by NYKevin (subscriber, #129325)
* In this case, my understanding is that FIPS is, at the very least, not completely terrible.
Posted Dec 8, 2021 9:04 UTC (Wed) by taladar (subscriber, #68407)
Posted Dec 8, 2021 9:18 UTC (Wed) by rahulsundaram (subscriber, #21946)
That seems to be because of time constraints (long term upstreamable solutions usually take more time) rather than anything inherent in the specification.
Posted Dec 8, 2021 23:32 UTC (Wed) by rgmoore (✭ supporter ✭, #75)
My reading of the "hacks" comment was similar. The distributions that care about FIPS compliance have put something together that they think meets the requirement, but they have been reluctant to try to upstream it because they think it's a quick and dirty solution and they're hoping for a better one upstream. That said, I'm sympathetic to the upstream developer saying that they should at least send their hack upstream as a way of getting things started. Maybe it's ugly and nasty, but the way to solve that is to expose it and get everyone working on turning it into something nice, not to keep hiding it.
Posted Dec 9, 2021 11:44 UTC (Thu) by rahulsundaram (subscriber, #21946)
As someone who wrote things like https://docs.fedoraproject.org/en-US/package-maintainers/... I certainly agree. Just building awareness of the distro hacks can be helpful to upstream developers even if they will never accept your patch as is.
Posted Dec 11, 2021 20:51 UTC (Sat) by k8to (guest, #15413)
The reality of how this has played out is fairly poor. The validation process is so slow and burdensome that FIPS compliance involves switching to various less well tested codepaths with less maintenance.
This isn't really solvable in relatively fast moving projects, so it feels like a failure for open source components. In slower moving spaces like aerospace or medicine the requirements might be useful, I certainly don't know.
Posted Dec 8, 2021 13:54 UTC (Wed) by Kluge (subscriber, #2881)
The maintainer's attitude towards FIPS requirements seems like a side issue.