CloudABI

February 10, 2016

This article was contributed by Neil Brown

linux.conf.au 2016

As he described in his presentation [WebM] at the 2016 linux.conf.au, Ed Schouten has a vision to create a new niche in the open-software world: one for software that cannot be trusted but can be run safely. This approach is based on a new application binary interface (ABI) for which he is building support. The ABI is called "CloudABI", though the range of planned use cases has rather outgrown the "Cloud" focus.

CloudABI came out of a dissatisfaction with common approaches to security, but also with the usual approaches to reusability and testability in Unix applications.

Disappointed with security focus

Schouten observed that untrusted code can be run natively, which isn't particularly safe, or in a virtual machine, which can be costly in terms of performance and complexity. In between these two extremes are various mechanisms to impose access controls on the code while it is running.

The particular example he highlighted was AppArmor, which can impose fine-grained controls. The problem with this approach is that the access control is "stapled on later", so the burden of creating a secure policy rests on distributors, system administrators, or users — not on the developers. Consequently, AppArmor policies break over time and it is not uncommon to just turn the access controls off because they are too hard to get right.

This attitude echoed an opinion expressed by Casey Schaufler earlier in the week where, during the Kernel Miniconf, he explained his reason for disliking SELinux. The complexity of SELinux encourages policies to be auto-written from audit traces, rather than developed from an understanding of the important issues.

The access-control approach seems to lead to a "tweak it until it works" mentality rather than deliberate, meaningful, justifiable access restrictions. Unlike these systems, Schouten's approach doesn't offer any hope for existing software, but instead aims to create a context that stimulates developers to create code where the security implications are more obvious. That way, those implications can be managed intentionally.

Disappointed with reusability and testability

Given the fact that Schouten was talking about Unix more than Linux (he is a self-proclaimed fan of FreeBSD) and that reusable components are often touted as part of the "Unix Philosophy", it seemed strange for him to suggest that Unix doesn't encourage building reusable and testable tools. He does, however, have a point.

While reflecting on the particular example Schouten gave (a web server) and how it contrasted with the more reusable parts of Unix, it became clear that reusability in Unix is largely limited to "filters" that take a single input (typically lines of text), produce an output (probably text again), and can be combined into pipelines. This limit is really imposed by the shell, which cannot pass anything more interesting than these two file descriptors and a few textual arguments.

Applications that need more than a single input and a single output (or maybe two with stderr) need to actively find them rather than passively receive them. This can require hard-coded pathnames (like /etc/mimetypes or /etc/resolv.conf) and, even when pathnames are passed via the command line or a configuration file, the application will normally be restricted to files in the filesystem, so the end of a pipe or a network connection is hard to use. Some of these limitations can be overcome by paths in /dev/fd and extensive use of environment variables to override built-in paths, but Unix does not seem to encourage this.
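As a small illustration of the /dev/fd workaround, the sketch below hands the read end of a pipe to a function that insists on a pathname. The count_lines() function is a hypothetical stand-in for such an application, and the example assumes a system where /dev/fd is available (Linux with /proc mounted, or a BSD with fdescfs).

```python
import os


def count_lines(path: str) -> int:
    """Hypothetical stand-in for an application that only accepts a pathname."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def demo() -> int:
    read_end, write_end = os.pipe()
    os.write(write_end, b"one\ntwo\nthree\n")
    os.close(write_end)  # close the writer so the reader sees end-of-file
    try:
        # The pipe has no name in the filesystem, but /dev/fd/N gives it one
        # for the lifetime of the descriptor.
        return count_lines(f"/dev/fd/{read_end}")
    finally:
        os.close(read_end)
```

It works, but only because the caller knows the trick; nothing in the application's interface suggests that a pipe is an acceptable input.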

All of this means, as Schouten lamented, that you often cannot use an application beyond the way the developer envisioned (as you often can with filters and pipes), and it is difficult to control the complete environment for testing.

Restricting Capabilities

These ideas of security, reusability, and testability are brought together by focusing on capabilities. Not the per-process capability bits such as CAP_SYS_ADMIN that Linux supports, but the capabilities that Unix has had since the beginning: file descriptors. If a process holds a file descriptor, then it is capable of doing various things to the file (or other object) attached.

While we may not always think of a file descriptor as a capability, it has always been able to serve that function, though not always to the extent that is possible today. Passing a file to a setuid program that is not setuid to "root" is best done by passing the file descriptor as a capability. The setuid program will then be able to read a file which it could not open itself. With CLONE_FD we are a step closer to using file descriptors as capabilities over processes too.
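The idea of a descriptor as a capability can be sketched without any setuid machinery. In the example below (an illustration only, not anything Schouten presented), a parent opens a file, removes its name from the directory, and hands the open descriptor to a child process. The child is never told a path, yet it can read the contents because it holds the descriptor. The helper names are hypothetical, and subprocess.run() with pass_fds is POSIX-only.

```python
import os
import subprocess
import sys
import tempfile


def read_via_inherited_fd(fd: int) -> bytes:
    """Run a child that reads from an inherited descriptor, never from a path."""
    child_code = (
        "import os, sys\n"
        "fd = int(sys.argv[1])\n"
        "sys.stdout.buffer.write(os.read(fd, 4096))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code, str(fd)],
        pass_fds=(fd,),  # the only extra descriptor the child inherits
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def demo() -> bytes:
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"secret\n")
        path = tmp.name
    fd = os.open(path, os.O_RDONLY)
    os.unlink(path)  # the name is gone; only the descriptor remains
    try:
        return read_via_inherited_fd(fd)
    finally:
        os.close(fd)
```

The child's authority over the file comes entirely from the descriptor it was given, which is the property CloudABI builds on.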

If a process is restricted from creating new file descriptors unless it already has an authorizing file descriptor, then many possible security concerns immediately disappear: much of the filesystem is inaccessible and new connections to the network cannot be established. Here, an authorizing file descriptor is something like a descriptor for a directory where a simple name (no / or ..) can be opened using openat(), or a listening socket on which accept() is permitted, but little else.
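A rough user-space approximation of the directory case might look like the following. This is only a sketch: Capsicum enforces the restriction in the kernel, whereas the hypothetical open_beneath() helper here merely checks the name before calling openat() (which Python exposes as os.open() with dir_fd).

```python
import os
import tempfile


def open_beneath(dir_fd: int, name: str, flags: int = os.O_RDONLY) -> int:
    """Open `name` relative to `dir_fd`, accepting only a simple name."""
    if name in ("", ".", "..") or "/" in name:
        raise PermissionError(f"not a simple name: {name!r}")
    # O_NOFOLLOW refuses a symlink as the final component, which could
    # otherwise lead outside the directory.
    return os.open(name, flags | os.O_NOFOLLOW, dir_fd=dir_fd)


def demo():
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "data.txt"), "w") as f:
            f.write("hello")
        dir_fd = os.open(root, os.O_RDONLY | os.O_DIRECTORY)
        try:
            fd = open_beneath(dir_fd, "data.txt")
            try:
                contents = os.read(fd, 16)
            finally:
                os.close(fd)
            rejected = []
            for name in ("..", "../data.txt", "/etc/passwd"):
                try:
                    os.close(open_beneath(dir_fd, name))
                except PermissionError:
                    rejected.append(name)
            return contents, rejected
        finally:
            os.close(dir_fd)
```

The directory descriptor is the authorizing capability: code that holds it can reach files inside that directory, and code that does not hold it has no starting point at all.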

Further, if processes are so restricted, then all the resources they need will have to be provided at startup. This would be unworkable if we hoped to run these processes from a regular Unix shell, but Schouten presented the idea of a wrapper program (currently named cloudabi-run). It is given an untrusted CloudABI program and a configuration file (written in YAML). cloudabi-run identifies all the resources from the configuration file, modifies the file to contain file descriptor numbers in place of file names or Internet addresses, and makes both the file descriptors and the updated configuration available to the program. With the certainty that all resources can be controlled at startup, it becomes much easier to test the application in a variety of controlled contexts, and there is greater possibility of creating interesting reuse cases.
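The substitution step can be sketched in a few lines. The code below is not cloudabi-run and does not use its configuration syntax; it assumes a plain Python dictionary in place of the YAML file, and a hypothetical convention in which a node of the form {"file": path} names a resource to be opened. Each such node is replaced by {"fd": n}, so the program that receives the result sees descriptor numbers and no paths.

```python
import os
import tempfile


def resolve_resources(config, opened):
    """Return a copy of `config` with every {"file": path} node replaced by
    {"fd": n}; each descriptor opened is appended to `opened`.
    The {"file": ...} convention is hypothetical, chosen for this sketch."""
    if isinstance(config, dict):
        if set(config) == {"file"}:
            fd = os.open(config["file"], os.O_RDONLY)
            opened.append(fd)
            return {"fd": fd}
        return {key: resolve_resources(value, opened)
                for key, value in config.items()}
    if isinstance(config, list):
        return [resolve_resources(item, opened) for item in config]
    return config  # plain scalars pass through untouched


def demo():
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "mime.types")
        with open(path, "w") as f:
            f.write("text/html html\n")
        config = {"hostname": "example.org", "mimetypes": {"file": path}}
        opened = []
        resolved = resolve_resources(config, opened)
        try:
            fd = resolved["mimetypes"]["fd"]
            # The sandboxed program would receive only this number.
            return resolved["hostname"], os.read(fd, 64)
        finally:
            for fd in opened:
                os.close(fd)
```

Because the launcher is the only place where names are turned into descriptors, a test harness can substitute temporary files or pipes for every resource without touching the program under test.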

Capsicum and CloudABI

Restricting capabilities like this is already possible. In FreeBSD there is "Capsicum". In Linux there is seccomp/BPF in the mainline or Capsicum available as out-of-tree patches. Unfortunately, just using Capsicum isn't always easy.

Capsicum and seccomp perform their task by making certain system calls fail. This ensures that security is preserved, but it also can lead to error paths being followed in code where those paths have not been tested. Running untested code is not the best way toward security and reliability.

A particular example Schouten cited was of a cryptography library that would try to open /dev/random; on failure it would fall back to generating random numbers from a simple arithmetic combination of process ID and time of day. Consequently, an attempt to tighten security by enabling Capsicum could silently and seriously reduce the quality of the random numbers used.
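The safer behavior for such a library is to fail loudly when its entropy source is unavailable. The sketch below is not the library Schouten described. It is a hypothetical read_random() helper, using /dev/urandom rather than /dev/random to avoid any chance of blocking, plus a denied() function that stands in for a sandbox refusing the open.

```python
import errno
import os


def read_random(nbytes: int, opener=os.open) -> bytes:
    """Read `nbytes` of randomness, raising if the device cannot be opened."""
    try:
        fd = opener("/dev/urandom", os.O_RDONLY)
    except OSError as exc:
        # Deliberately no fallback to process ID or time-of-day arithmetic.
        raise RuntimeError("no entropy source available") from exc
    try:
        data = b""
        while len(data) < nbytes:
            chunk = os.read(fd, nbytes - len(data))
            if not chunk:
                raise RuntimeError("entropy source returned end-of-file")
            data += chunk
        return data
    finally:
        os.close(fd)


def denied(path, flags):
    """Hypothetical stand-in for a sandbox that rejects the open() call."""
    raise PermissionError(errno.EPERM, "denied by sandbox", path)


def demo():
    length = len(read_random(16))
    try:
        read_random(16, opener=denied)
    except RuntimeError:
        failed_loudly = True
    else:
        failed_loudly = False
    return length, failed_loudly
```

An error at startup is easy to notice and fix; a silent downgrade to predictable numbers is not.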

This is where CloudABI comes in. It is an alternate ABI to compile code against, an ABI that matches POSIX for all of the functionality that Capsicum would permit, but excludes all functionality that Capsicum would reject. So there is openat() but no open(), and there is accept() for networking connections but no socket(), bind(), or connect(). When you successfully compile code against CloudABI using the provided toolchain, you can be reasonably sure that the code will run under Capsicum and not get the ENOTCAPABLE that it returns for disallowed requests.
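In practice this pushes programs toward a shape where the listening socket arrives from outside. The following sketch uses ordinary Python sockets rather than CloudABI, and the names serve_one() and demo() are hypothetical. It separates the two roles: demo() plays the launcher, which creates and binds the socket, while serve_one() only ever calls accept() on the listener it was given.

```python
import os
import socket
import tempfile
import threading


def read_all(sock: socket.socket) -> bytes:
    """Read from `sock` until the peer signals end-of-file."""
    data = b""
    while chunk := sock.recv(1024):
        data += chunk
    return data


def serve_one(listener: socket.socket) -> bytes:
    """Handle one connection; never calls socket(), bind(), or connect()."""
    conn, _ = listener.accept()
    with conn:
        request = read_all(conn)
        conn.sendall(b"echo: " + request)
    return request


def demo():
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "sock")
        # Launcher role: the only code that creates and binds a socket.
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        listener.bind(path)
        listener.listen(1)
        result = {}
        worker = threading.Thread(
            target=lambda: result.update(request=serve_one(listener)))
        worker.start()
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
            client.connect(path)
            client.sendall(b"ping")
            client.shutdown(socket.SHUT_WR)  # tell the server we are done
            reply = read_all(client)
        worker.join()
        listener.close()
        return result["request"], reply
```

A server written this way can be tested against a Unix-domain socket in a temporary directory, as here, and deployed behind a TCP listener without any change to its own code.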

Capsicum restrictions are not normally enforced until the cap_enter() call. Schouten wants to go beyond just making it easy for programs to choose to use Capsicum; he wants to provide for programs that are forced to use Capsicum. Programs built against the CloudABI are annotated in the ELF file in such a way that the kernel will enforce the expected restrictions. This goes beyond automatically calling cap_enter() by also limiting the process to a specific set of 58 system calls that make up the ABI. The hope is that this ABI can be identical across multiple kernels (Linux, FreeBSD, NetBSD) so that these untrusted binaries can be run equally well on all supported operating systems.

This means that if you want to use some library in your application and aren't sure what resources it needs, you just need to compile it against CloudABI and fix all the errors. This may well be a non-trivial amount of work, but once it is done you can know the library will never use unexpected resources. And maybe someone else will have done it already. Schouten listed Boost, cURL, GLib, LibreSSL, and Lua as already being available in CloudABI versions. These versions naturally have a restricted set of interfaces — only those that are consistent with the CloudABI model. He has been working on Python too, but it is a big job.

A key part of CloudABI is the compiler and toolchain support. CloudABI does not provide any new sort of security — Capsicum and seccomp are already available and provide the same kind of restrictions. Rather, CloudABI is intended to encourage the writing of secure, testable, reusable code. If you don't, the compiler will complain and tell you to try again.

Use cases

CloudABI is not for every application. The fact that you cannot initiate network connections or write to arbitrary files means that there are lots of places where it would not be the correct choice. But there are plenty of situations that do fit this niche — where the risk of attack is high, and the trust that can be placed in the code is low. These are situations where the disciplines enforced by CloudABI can make it a sensible choice.

Some examples suggested by Schouten were a spam filter, a web server, and a high-level cluster manager. The first two are appropriate choices as they are exposed to external attack. The last deserves a little explanation as it highlights another benefit produced by following the discipline of CloudABI and specifically the cloudabi-run approach to providing resources.

A cluster manager can benefit from knowing exactly what resources each application needs so that it can make sure they are available before starting the application and can make sure they stay available. With cloudabi-run, the configuration file is guaranteed to list all the resources that will be needed (the application cannot possibly access any others). This ensures that nothing slips through the cracks.

Another compelling use case is an "app engine" where customers can upload code to be run. Some memory and resource limits would need to be imposed, but otherwise the CloudABI program could be run on "bare metal" without virtualization or containerization. The ABI that is enforced by the kernel is all that is needed. If Schouten's goal of providing an identical ABI on multiple kernels is realized, app engine providers could provide this same service using whichever operating system they find most suitable. This may almost mean a "write once, run anywhere" experience.

Status

As always, there is much to do. This ABI is already available in the FreeBSD development code. Code for NetBSD and Linux is available, but not so well advanced; it can be found here. Schouten (ed@nuxi.nl) said that expressions of interest would help motivate him to continue the work.


CloudABI

Posted Feb 11, 2016 11:38 UTC (Thu) by drysdale (guest, #95971) [Link]

> Capsicum and seccomp perform their task by making certain system calls fail
> ...
> Capsicum restrictions are not normally enforced until the cap_enter() call.

One clarification -- turning off a collection of system calls is only one part of Capsicum (the part called 'capability mode', which happens on cap_enter()).

> If a process holds a file descriptor, then it is capable of doing various things to the file (or other object) attached.

The other major part of Capsicum is allowing fine-grained controls (called 'rights') for exactly what things can be done with an individual file descriptor -- and these rights are policed even before cap_enter(). This allows a capability file descriptor to be tightly restricted (e.g. to be made truly read-only), making it much safer to then pass that descriptor to another process.

The two parts work in combination because capability mode prevents an attacker minting new file descriptors to get around the rights restrictions applied to any existing capability file descriptors.

Thanks for the article!

CloudABI

Posted Feb 12, 2016 0:11 UTC (Fri) by dlang (guest, #313) [Link]

re: SELinux vs AppArmor

This hits on my grief with SELinux: it's a system-wide config that must include everything, and as such is too complex for anyone to understand. The policies have to be open to allow a wide range of 'typical' uses, and locking them down becomes very hard.

With AppArmor, you can focus on just one app at a time, and changing permissions for one app doesn't cascade to all other apps (yes, I am aware that this can let you open unexpected side-channels, but it's worth it to be able to narrow the scope)

This makes the AppArmor configs simple enough that it's within the realm of possibility for normal sysadmins to adjust them.

CloudABI

Posted Feb 12, 2016 19:38 UTC (Fri) by jhoblitt (subscriber, #77733) [Link] (1 responses)

How are different system call numbers and semantics handled across platforms? Is there some sort of shim layer or is this just a hypothetical feature?

CloudABI

Posted Feb 13, 2016 3:00 UTC (Sat) by neilbrown (subscriber, #359) [Link]

I believe it is a real implemented feature.

On an x86_64 Linux kernel you can run x86_32 binaries, which have a completely different set of system call numbers and somewhat different semantics. I presume a similar arrangement can cause syscalls to be routed to a "CloudABI" set of system calls.
For the most part the semantics that Capsicum allows will be fairly uniform thanks to POSIX. I haven't looked into the specifics of the code to see how many "quirks" need handling.