An API for user-space access to kernel cryptography

By Jake Edge
August 25, 2010

Adding an interface that lets user space access the kernel crypto subsystem, along with any hardware acceleration available, seems like a reasonable idea at first blush. But adding a huge chunk of formerly user-space code to the kernel to implement additional cryptographic algorithms, including public-key cryptosystems, is likely to be a hard sell. Coupling that with an ioctl()-based API, with pointers and variable-length data, raises the barrier further. Still, there are some good arguments for providing some kind of user-space interface to the crypto subsystem, even if the current proposal doesn't pass muster.

Miloslav Trmač posted an RFC patchset that implements the /dev/crypto user-space interface. The code is derived from cryptodev-linux, but the new implementation was largely developed by Nikos Mavrogiannopoulos. The patchset is rather large, mostly because of the inclusion of two user-space libraries for handling multi-precision integers (LibTomMath) and additional cryptographic algorithms (LibTomCrypt); some 20,000 lines of code in all. That is the current implementation, though there is mention of switching to something based on Libgcrypt, which is believed to be more scrutinized as well as more actively maintained, but is not particularly small either.

One of the key benefits of the new API is that keys can be handled completely within the kernel, allowing user space to do whatever encryption or decryption it needs without ever exposing the key to the application. That means that application vulnerabilities would be unable to expose any keys. The keys can also be wrapped by the kernel so that the application can receive an encrypted blob that it can store persistently to be loaded back into the kernel after a reboot.

Ted Ts'o questioned the whole idea behind the interface, specifically whether hardware acceleration would really speed things up:

"more often than not, by the time you take into account the time to move the crypto context as well as the data into kernel space and back out, and after you take into account price/performance, most hardware crypto [accelerators] have marginal performance benefits; in fact, more often than not, it's a lose."

He was also concerned that the key handling was redundant: "If the goal is access to hardware-escrowed keys, don't we have the TPM [Trusted Platform Module] interface for that already?" But Mavrogiannopoulos noted that embedded systems are one target for this work, "where the hardware version of AES might be 100 times faster than the software". He also said that the TPM interface was not flexible enough and that one goal of the new API is that "it can be wrapped by a PKCS #11 [Public-Key Cryptography Standard for cryptographic tokens like keys] module and used transparently by other crypto libraries (openssl/nss/gnutls)", which the TPM interface is unable to support.

There is already support in the kernel for key management, so Kyle Moffett would like to see that used: "We already have one very nice key/keyring API in the kernel (see Documentation/keys.txt) that's being used for crypto keys for NFSv4, AFS, etc. Can't you just add a bunch of cryptoapi key types to that API instead?" Mavrogiannopoulos thinks that because the keyring API allows exporting keys to user space—something that the /dev/crypto API explicitly prevents—it would be inappropriate. Keyring developer David Howells suggests an easy way around that particular problem: "Don't provide a read() key type operation, then".

But the interface itself also drew complaints. To use /dev/crypto, an application needs to open() the device, then start issuing ioctl() calls. Each ioctl() operation (all of which are named NCRIO_*) has its own structure type that gets passed as the data parameter to ioctl():

    res = ioctl(fd, NCRIO_..., &data);

Many of the structures contain pointers for user data (input and output), which are declared as void pointers. That necessitates using compat_ioctl to handle 32-bit vs. 64-bit pointer issues, which Arnd Bergmann disagrees with: "New drivers should be written to *avoid* compat_ioctl calls, using only very simple fixed-length data structures as ioctl commands." He doesn't think that pointers should be used in the interface at all if possible: "Ideally, you would use ioctl to control the device while you use read and write to pass actual bits of data".
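
To make the 32-bit vs. 64-bit concern concrete, here is a minimal sketch contrasting the two styles. Both structures and all of their field names are hypothetical; they are not taken from the patchset. The first embeds a pointer and a size_t, so its layout depends on the caller's ABI. The second uses only explicitly sized fields, which is the kind of "very simple fixed-length" structure Bergmann describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pointer-carrying argument, in the style being criticized.
 * Its size and field offsets differ between 32-bit and 64-bit callers,
 * so a 64-bit kernel needs a compat translation for 32-bit processes. */
struct op_with_pointers {
    uint32_t flags;
    void *input;
    size_t input_len;
};

/* Hypothetical fixed-length control structure: explicitly sized fields
 * only, no pointers, no implicit padding. The data itself would travel
 * through read() and write() on the same file descriptor. */
struct op_fixed {
    uint32_t algorithm;
    uint32_t key_handle;
    uint32_t flags;
    uint32_t reserved; /* room for later extension */
};

/* The fixed layout is identical for every caller, so it can be
 * verified when the code is compiled. */
static_assert(sizeof(struct op_fixed) == 16, "layout must not vary by ABI");
static_assert(offsetof(struct op_fixed, flags) == 8, "no hidden padding");

/* Reports how many bytes the pointer-carrying layout occupies in this build. */
size_t pointer_layout_size(void)
{
    return sizeof(struct op_with_pointers);
}
```

On a typical 64-bit Linux build, pointer_layout_size() is 24, while a 32-bit build of the same source typically gives 12. That mismatch is what a compat handler has to paper over. The fixed structure needs no such handling.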

Beyond that, the interface also mixes in netlink-style variable-length attributes to support things like algorithm choice, initialization vector, key type (secret, private, public), key wrapping algorithm, and many additional algorithm-specific attributes such as key length or RSA- and DSA-specific values. For most, but not all, of the operations, these are appended to the end of the operation-specific structure as an array of (struct nlattr, attribute data) pairs, using the same formatting as netlink messages. It is, in short, a complex interface that is reasonably well-documented in the first patch of the series.
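
As a rough illustration of what "an array of (struct nlattr, attribute data) pairs" looks like in memory, the sketch below packs attributes into a buffer using the netlink conventions: a four-byte header holding a 16-bit length and a 16-bit type, with each attribute padded to a four-byte boundary. The header is redeclared locally as struct attr_hdr so that the example builds without kernel headers. The attribute type values and the attr_put() helper are assumptions made for illustration, not names from the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same two-field layout as the kernel's struct nlattr, redeclared here
 * so the sketch does not depend on Linux headers. */
struct attr_hdr {
    uint16_t len;  /* header plus payload, excluding trailing padding */
    uint16_t type; /* attribute identifier */
};

/* Netlink pads every attribute to a 4-byte boundary. */
#define ATTR_ALIGN(n) (((size_t)(n) + 3) & ~(size_t)3)
#define ATTR_HDRLEN   ATTR_ALIGN(sizeof(struct attr_hdr))

/* Hypothetical attribute types, for illustration only. */
enum { ATTR_ALGORITHM = 1, ATTR_IV = 2 };

/* Appends one attribute at offset `used` in `buf` (capacity `cap`).
 * Returns the new number of bytes used, or 0 if it does not fit. */
size_t attr_put(uint8_t *buf, size_t cap, size_t used,
                uint16_t type, const void *data, size_t data_len)
{
    struct attr_hdr hdr;
    size_t len, padded;

    assert(buf != NULL);
    if (data_len > UINT16_MAX - ATTR_HDRLEN)
        return 0; /* length field is only 16 bits wide */
    len = ATTR_HDRLEN + data_len;
    padded = ATTR_ALIGN(len);
    if (used > cap || padded > cap - used)
        return 0;

    hdr.len = (uint16_t)len;
    hdr.type = type;
    memcpy(buf + used, &hdr, sizeof(hdr));
    if (data_len > 0)
        memcpy(buf + used + ATTR_HDRLEN, data, data_len);
    memset(buf + used + len, 0, padded - len); /* zero the padding */
    return used + padded;
}
```

A five-byte payload, for instance, is recorded with a length of 9 (header plus payload) but occupies 12 bytes in the buffer, so the next attribute starts on an aligned offset. A parser has to walk the array using the padded length, which is part of what makes the interface harder to validate than a fixed structure.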

Bergmann and others are also concerned about the inclusion of all of the extra code:

"However, the more [significant] problem is the amount of code added to a security module. 20000 lines of code that is essentially a user-level library moved into kernel space can open up so many possible holes that you end up with a less secure (and slower) setup in the end than just doing everything in user space."

Mavrogiannopoulos thinks that the "benefits outweigh the risks" of adding the extra code, likening it to the existing encryption and compression facilities in the kernel. The difference, as Bergmann points out, is that the kernel actually uses those facilities itself, so they must be in the kernel. The additional code being added here is strictly to support user space.

In the patchset introduction, Trmač lists a number of arguments for adding more algorithms to the kernel and providing a user-space API, most of which boil down to various government specifications that require a separation between the crypto provider and user. The intent is to keep the key material separate from the (presumably more vulnerable) user-space programs, but there are other ways to do that, including having a root daemon that offers the needed functionality, as noted in the introduction. There is a worry that the overhead of doing it that way would be too high: "this would be slow due to context switches, scheduler mismatching and all the IPC overhead". However, no numbers have yet been offered to show how much overhead is added.

There are a number of interesting capabilities embodied in the API, in particular for handling keys. A master AES key can be set for the subsystem by a suitably privileged program which will then be used to encrypt and wrap keys before they are handed off to user space. None of the key handling is persistent across reboots, so user space will have to store any keys that get generated for it. Using the master key allows that, without giving user space access to anything other than an encrypted blob.

All of the expected operations are available through the interface: encrypt, decrypt, sign, and verify. Each is accessible from a session that gets initiated by an NCRIO_SESSION_INIT ioctl(), followed by zero or more NCRIO_SESSION_UPDATE calls, and ending with an NCRIO_SESSION_FINAL. For one-shot operations, there is also an NCRIO_SESSION_ONCE call that handles all three of those operations in one call.
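
The init/update/final pattern, and the reason a one-shot call can stand in for all three steps, can be modeled entirely in user space. The sketch below is only that: a model. It makes no ioctl() calls, every name in it is hypothetical, and a toy FNV-1a checksum stands in for whatever cipher or digest state a real session would carry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum session_state { SESSION_CLOSED, SESSION_OPEN };

struct session {
    enum session_state state;
    uint32_t hash; /* toy running state, standing in for real crypto state */
};

/* Models the "init" step: open the session and reset its state. */
void session_init(struct session *s)
{
    assert(s != NULL);
    s->state = SESSION_OPEN;
    s->hash = 2166136261u; /* FNV-1a 32-bit offset basis */
}

/* Models an "update" step: feed one more chunk of data.
 * Returns 0 on success, -1 if the session is not open. */
int session_update(struct session *s, const uint8_t *data, size_t len)
{
    if (s->state != SESSION_OPEN)
        return -1;
    for (size_t i = 0; i < len; i++) {
        s->hash ^= data[i];
        s->hash *= 16777619u; /* FNV-1a 32-bit prime */
    }
    return 0;
}

/* Models the "final" step: emit the result and close the session. */
int session_final(struct session *s, uint32_t *out)
{
    if (s->state != SESSION_OPEN)
        return -1;
    *out = s->hash;
    s->state = SESSION_CLOSED;
    return 0;
}

/* Models the one-shot call: all three steps on a single buffer. */
int session_once(const uint8_t *data, size_t len, uint32_t *out)
{
    struct session s;

    session_init(&s);
    if (session_update(&s, data, len) != 0)
        return -1;
    return session_final(&s, out);
}
```

Feeding a message in two chunks through session_update() gives the same result as passing the whole message to session_once(), and an update on a session that has already been finalized is rejected. The multi-step form matters when the data does not fit in one buffer or arrives incrementally. The one-shot form saves two round trips when it does.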

While it seems to be a well thought-out interface, with room for expansion to handle unforeseen algorithms with different requirements, it's also very complex. Other than the separation of keys and faster encryption for embedded devices, it doesn't offer that much for desktop or server users, and it adds an immense amount of code and the associated maintenance burden. In its current form, it's hard to see /dev/crypto making its way into the mainline, but some of the ideas it implements might—particularly if they are better integrated with existing kernel facilities like the keyring.

Posted Aug 26, 2010 5:44 UTC (Thu) by ringerc (subscriber, #3071)

"and after you take into account price/performance, most hardware crypto [accelerators] have marginal performance benefits; in fact, more often than not, it's a lose."

I'd mostly agree with that, personally - but there's a very big "except".

The Via C3, Via C7, and newer Intel Xeon processors have hardware crypto acceleration on-CPU for a variety of algorithms. My 400MHz C3 thin clients at work dramatically outperform the 2.2GHz Intel Core 2 Duo machines on many crypto tasks.

Personally I don't care about isolating the keys from the apps using them for my use cases - if I did, I'd be using the TPM, or smart cards. I can see how it'd potentially be useful, but I'm not convinced that keeping the keys in kernel memory is much better than keeping them in a separate user-space process.

(For that matter, even dedicated key-isolation crypto hardware like smartcards has proven vulnerable to power and timing attacks. No key isolation is perfect - the question is whether moving it into the kernel is better _enough_ to justify the time/effort/complexity cost).

Posted Aug 26, 2010 10:16 UTC (Thu) by alankila (guest, #47141)

Wouldn't these facilities be accessed just by executing some instructions made for this purpose, and therefore they are available to any program?

Posted Aug 27, 2010 7:51 UTC (Fri) by cladisch (✭ supporter ✭, #50193)

There are hardware accelerators that are not integrated into the CPU execution units but that exist as separate devices.

Posted Aug 28, 2010 9:25 UTC (Sat) by jengelh (subscriber, #33263)

Perhaps it's better to do the crypto stuff like libusb, that is, have the kernel export only the data channel and have a userspace library (perhaps even a daemon if there is concurrency to deal with) do the crypto context setup etc.

Posted Aug 28, 2010 15:27 UTC (Sat) by kleptog (subscriber, #1183)

Oh, I was thinking that the logical step would be cryptfs, where you mount a file system and get a bunch of directories representing encryption algorithms and you just open() the one you want and then use send/recvmsg with options to do the work you want.

No, seriously, I don't understand why this needs to be in the kernel; a root-owned daemon should be more than enough.

Posted Aug 30, 2010 10:19 UTC (Mon) by mjthayer (guest, #39183)

> There are hardware accelerators that are not integrated into the CPU execution units but that exist as separate devices.

To me it would seem reasonable to have a crypto API in the kernel with no software fallback, so that it is available if it makes sense to do it in hardware, but the interface user has to handle the fallback themselves if the hardware isn't there.

Posted Aug 30, 2010 10:33 UTC (Mon) by dd9jn (✭ supporter ✭, #4459)

There is another reason why some people think that pushing crypto into the kernel is the right thing: It makes it easier to get a FIPS 140 validation.

One important topic there is to separate the code to handle the keys from the application. Using a validated library is not a real option because a process using a validated library still has full access to the code and data of that library. Thus it seems to them easier to define the crypto module boundary as the kernel mode - code and data living there are naturally protected from any user process.

From a software engineering POV and also for easier auditing it would be far better to move the crypto code into a daemon which may then be validated. This will give a very nice and enforceable crypto module boundary. The claimed drawback is a performance penalty.

Of course this does not solve the general problem that the kernel has access to everything. But it offers a better migration path to some future (capability based) microkernel.

Posted Sep 2, 2010 5:47 UTC (Thu) by kevinm (guest, #69913)

If the claimed performance problems with the daemon solution eventuate, then resurrect a port of Solaris's "doors" mechanism and use that. It still belongs in user space.