A Rust implementation of Android's Binder
Llamas began the talk by describing Binder as the primary interprocess communication channel for the Android system. It has a long history, having originated in BeOS and been used in Palm OS before being adopted by Android. The Binder driver lives in the kernel; most applications access it by way of the libbinder library. The Binder service implements a thread pool that is able to take requests from clients in parallel; a memory-mapped region is used for transactions between server and client processes.
There are a number of Binder features that make it attractive for Android. It performs no buffering at all in the kernel. There is an extensive set of operations, all accessed via ioctl(), including a combined read/write command. Binder implements priority inheritance and the ability to share file descriptors; it has a number of remote object management features and a strong reference-counting implementation.
Given Binder's long history, why would one want to rewrite it now? The module, Llamas said, features a high level of complexity and a large amount of technical debt; that combination has proved to be a recipe for security problems. Analysis has shown that Binder contains about 3.1 vulnerabilities per thousand lines of code, which is a relatively high number — and it is not getting better. Exploits are known for about half of the known vulnerabilities. Given that even the lowest-privilege sandboxes in Android have access to Binder, this security record is troubling indeed.
There is a lot that can be improved in Binder, Llamas said. Functions nearly 1,000 lines long (example) are generally a bad sign. The error handling in Binder is "problematic" and the source of a lot of bugs. But any improvements to this complex body of code tend to be risky, leading to a lot of "fix to the fix to the fix" patches. The Android developers are tired of chasing bugs like these and are looking for a better way.
The new Binder
Ryhl then took over to talk about the new Binder implementation, which was posted to the mailing lists at the beginning of November. One of the big advantages of working in Rust, Ryhl said, is that it makes object ownership visible in the type system. For example, C code working with reference-counted objects has to check for a non-zero reference count at run time, but Rust code simply will not compile if it lacks an object reference where one is needed. In C, one can look at a structure full of function pointers and get no sense for which of those manipulate reference counts; in Rust, that can be made clear by using different types.
Rust also makes it impossible to forget to clean up in error paths. The slides for the talk pointed to some error-handling code in the C implementation that is implemented as a long series of goto targets at the end of a function; the Rust equivalent was simply the "}" that ends the function. Rust also actively prevents almost all of the vulnerabilities that have been found in Binder; over half of them were use-after-free bugs, which "never happen in Rust". Other types of vulnerabilities, such as out-of-bounds array accesses, will be caught at run time, turning a potential compromise into a denial-of-service problem — a much less severe situation.
Ryhl has thus duly rewritten Binder in Rust. The result implements all of the features provided by the C implementation and passes the entire Binder test suite; it is able to boot and run an Android device. Ryhl brandished a phone that was running the Rust Binder implementation as proof, to applause. The performance is as good as the C version; it "took a bit of optimization" to reach that point, but the C version has also seen a fair amount of optimization work.
Rust, she said, can be rather more verbose than C; there are a lot of invariants that have to be expressed in the code. But that is countered by the need for far less error-handling code; it turns out to be a wash, with the size of the two implementations being about the same.
With regard to unsafe code, she said that there will always be a need for some unsafe code in the kernel, but that the Binder implementation needed little of it, mostly to interface with the (still in C) binderfs implementation. The abstractions needed to interface with the rest of the kernel can involve unsafe code in general; the workqueue abstractions (which Ryhl upstreamed for 6.7) have a fair amount of it. But that is a single module that will be shared across all drivers using workqueues; it only has to be gotten right once.
Writing in Rust does not guarantee the end of vulnerabilities, she said at the conclusion of the talk; one was discovered in the Rust implementation during the conference. It turned out that the C implementation contains a similar bug. In C, that bug can be exploited to cause a use-after-free vulnerability, which is "game over". In Rust, there is no use-after-free, but the result could be a confusion of the mappings used to map messages, which is still a severe vulnerability, though rather harder to exploit. Using Rust, she concluded, prevented the memory-safety problem, but logic errors remain.
In the questions after the talk, Julia Lawall asked about the kinds of optimizations that were needed for Rust code. A source of bottlenecks, Ryhl said, is calling into C to access tiny functions. Performance can be improved by rewriting those functions in Rust, but that leads to duplicated code and the associated maintainability problems.
There were a couple of questions about abstractions and their relationship to the C code. One attendee asked if, during the process of implementing Binder, changes in the kernel's C code had forced changes to the abstractions. The answer was that there was one change in process freezing that has caused some problems. I then asked about the workqueue abstractions; if somebody makes an incompatible change to the (C) workqueue implementation, who is responsible for making the abstractions work again? The answer to that is not entirely clear; eventually, it seems, the kernel's usual "you broke it, you fix it" rule will probably apply, but it is expected that many developers will lack the Rust skills needed to update those abstractions for some time and will thus need help from the Rust-for-Linux developers.
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting our
travel to this event.]
Index entries for this article | |
---|---|
Kernel | Android |
Kernel | Development tools/Rust |
Conference | Linux Plumbers Conference/2023 |
Posted Nov 30, 2023 16:56 UTC (Thu)
by JoeBuck (subscriber, #2330)
[Link] (12 responses)
Posted Nov 30, 2023 18:26 UTC (Thu)
by stop50 (subscriber, #154894)
[Link] (1 responses)
Posted Nov 30, 2023 20:56 UTC (Thu)
by gdamjan (subscriber, #33634)
[Link]
But binder is actually not out-of-tree.
Posted Nov 30, 2023 18:55 UTC (Thu)
by ringerc (subscriber, #3071)
[Link] (6 responses)
Posted Nov 30, 2023 19:04 UTC (Thu)
by quotemstr (subscriber, #45331)
[Link] (4 responses)
Posted Nov 30, 2023 20:23 UTC (Thu)
by tao (subscriber, #17563)
[Link] (3 responses)
Posted Nov 30, 2023 23:41 UTC (Thu)
by ebassi (subscriber, #54855)
[Link] (1 responses)
The main advantage of D-Bus is, unsurprisingly, the shared bus, so you get multicast; the other advantage is the service activation—though that got kind of superseded by systemd though a similar mechanism: talk to a well-known name on the bus, the service that provides the name gets activated if it's not already running. For that, you need a central broker.
Posted Dec 1, 2023 11:51 UTC (Fri)
by smcv (subscriber, #53363)
[Link]
Several D-Bus implementations now use systemd as an optional or required backend for service activation in preference to doing the fork-and-exec themselves, but it's still the message bus (the broker) that needs to be responsible for the autostarting behaviour where it recognises a message addressed to a not-yet-started service, asks for the service to be started, delays delivery of the message until that has happened, and avoids sending an error reply "service is not running" in the meantime.
systemd would not be able to do this without the message bus's help, because it's the message bus that needs to implement the delayed re-delivery: it wouldn't work without being integrated into the more general logic for how and when to deliver messages.
Posted Dec 1, 2023 12:19 UTC (Fri)
by smcv (subscriber, #53363)
[Link]
Many of the conveniences that are reasons to use D-Bus are really message bus features - distributed naming, broadcast/multicast signals, authentication, service activation with auto-starting, imposing a total order on messages to make them easier to reason about - and bypassing the message bus means reduced access to those conveniences. It's up to an individual application/interface author whether that's a worthwhile tradeoff or not.
The other big reason to use D-Bus is not needing significant incremental work to add more communication with some other component that already uses D-Bus (network effect): in the freedesktop.org ecosystem, if components can agree on using D-Bus for multiple forms of communication that don't have particularly demanding latency or throughput requirements, then that avoids reinventing too many wheels for those forms of communication, allowing the time saved to be re-invested in places where it matters more. Again, that's really more of a message bus feature than a protocol feature. I assume that in Android, there's a similar pressure to use Binder as a "default" IPC mechanism in any situation where there's no particularly good reason to do otherwise.
For situations where the message bus would be a problem, one way to bypass it is that the D-Bus wire protocol can be used as message framing for a simple client/server protocol over AF_UNIX or even TCP, with clients talking directly to the server. systemd does this, over AF_UNIX (the server listens on /run/systemd/private), so that its components can communicate during early boot before the message bus is available. This lets services reuse the D-Bus message framing and type system, which is particularly useful if they already needed that code for some other reason (for example systemd provides more normal D-Bus services over the message bus after it becomes available, so it would have needed access to a D-Bus implementation anyway).
Another way to bypass the message bus without abandoning D-Bus completely is that two peers initially communicating via the message bus can use it to negotiate a side channel, either via fd-passing (using SCM_RIGHTS to attach file descriptors to a D-Bus message) or by sharing the address of a socket in the filesystem, and then do their latency- or throughput-sensitive communication through that side channel, either reusing the D-Bus message framing and type system or switching to some other protocol that is more optimal for what this particular application wants. They can either move their entire communication into the side channel, or continue to use the message bus for "control" messages that are less sensitive to latency/throughput.
Posted Nov 30, 2023 19:25 UTC (Thu)
by nickodell (subscriber, #125165)
[Link]
Posted Nov 30, 2023 23:38 UTC (Thu)
by mcon147 (subscriber, #56569)
[Link] (2 responses)
Posted Dec 1, 2023 8:25 UTC (Fri)
by Wol (subscriber, #4433)
[Link]
Cheers,
Posted Dec 7, 2023 10:59 UTC (Thu)
by maxfragg (subscriber, #122266)
[Link]
Posted Nov 30, 2023 16:58 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (7 responses)
I don't see anything about how long this look and whether it was particularly difficult. I assume the original C implementation has man-years of work in it, to get similar performance already is very nice indeed.
Posted Nov 30, 2023 20:36 UTC (Thu)
by ballombe (subscriber, #9523)
[Link] (6 responses)
I have done a lot of language conversion in HPC. This is way easier than it sound. It is several order of magnitude easier than rewriting from scratch.
Posted Dec 1, 2023 14:52 UTC (Fri)
by tialaramex (subscriber, #21167)
[Link] (5 responses)
There are some fundamental algorithmic building blocks which a good C programmer might know in principle, but in C you need to implement them, in Rust they're just method calls. For example [T].select_nth_unstable_by you can see how to write that in C but it's a bunch of work any time you need it, would you do that work? Perhaps only if a simpler but worse performing solution wouldn't pass review or in a later optimisation cycle. Of course you do still need to recognise that select_nth_unstable_by is actually solving the problem you had, this is not trivial, there's a well known C++ meme which falls out of a Sean Parent talk, "That's a rotate" because more often than you'd expect the complicated ad hoc shuffling of data in a contiguous structure which you wrote by hand turns out to actually be equivalent to just... rotate (in Rust [T].rotate_left or [T].rotate_right)
Posted Dec 1, 2023 19:02 UTC (Fri)
by ballombe (subscriber, #9523)
[Link] (4 responses)
Posted Dec 2, 2023 9:45 UTC (Sat)
by tialaramex (subscriber, #21167)
[Link] (2 responses)
Posted Dec 7, 2023 0:45 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (1 responses)
This basic pattern is true of many niches where C remains common. The absence of language environment support for many of these algorithms is not felt as keenly as it would be were they missing from other languages such as Rust. Much of the work of a kernel is fundamentally about juggling shared, global resources, which necessarily involve gnarly reference graphs to which a language like Rust is fundamentally hostile, the design of which is largely predicated on pass-by-value and tidy ownership rules which track the call stack. Many kernel modules would be better off written in Rust, but also would ideally not run in kernel space at all.
Posted Dec 9, 2023 19:15 UTC (Sat)
by tialaramex (subscriber, #21167)
[Link]
I was skimming an old (like 30+ years old) Fortran book last week which gives this problem as a reason why its authors don't like Quick Sort although they admit it's technically faster on average. Today of course that book's advice is obsolete, you should use (and later Rust's authors did here) an introsort or some other modern better sort which didn't exist when that Fortran book was written
Keeping everything in order is excellent except when that's the wrong order, hence select_nth_unstable_by takes a function for the ordering we want, which doesn't have to be (indeed usually won't be) the natural ordering of this type. The kernel does have a lot of gnarly intrusive linked lists, and for its purpose this really is sometimes, even often the Right Thing™ because it's doing tricky concurrency acrobatics for example - but since linked lists are also the go-to data structure for C programmers that's not a sure bet as to why there's a linked list, sometimes it just "Not bad enough to pick something else" and a Vec<T> or similar structure natural to a Rust programmer might be better.
Posted Dec 2, 2023 20:21 UTC (Sat)
by Paf (subscriber, #91811)
[Link]
Posted Nov 30, 2023 20:19 UTC (Thu)
by corbet (editor, #1)
[Link] (1 responses)
Posted Dec 1, 2023 0:53 UTC (Fri)
by andresfreund (subscriber, #69562)
[Link] (3 responses)
Could that, at least partially, be addressed by compiling with llvm and enabling LTO? Or is the glue code too much of an optimization barrier?
Posted Dec 1, 2023 1:56 UTC (Fri)
by nksingh (subscriber, #94354)
[Link]
Posted Dec 1, 2023 14:50 UTC (Fri)
by willy (subscriber, #9762)
[Link] (1 responses)
https://lore.kernel.org/linux-fsdevel/20231129152305.GB23...
Posted Dec 2, 2023 13:10 UTC (Sat)
by foom (subscriber, #14868)
[Link]
But, I think what the comment above was suggesting was to compile the C inline functions using Clang into LLVM IR bitcode, and then merge that Clang bitcode and Rust code together when compiling to an object file.
Of course if you build the whole kernel with LTO or ThinLTO this would happen automatically. But you should be able to arrange it to happen on a smaller file-by-file scale only for importing C inline functions, I bet.
Posted Dec 2, 2023 15:34 UTC (Sat)
by bangert (subscriber, #28342)
[Link]
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
binder apparently makes more than the sysv ipc.
i have found only an older link that describes binder: https://elinux.org/Android_Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
I was curious about the same thing, and I found this discussion of how Android uses binder on LKML:
https://lkml.org/lkml/2009/6/25/3
A Rust implementation of Android's Binder
For a rough idea of the scope of the binder's use in Android, here is
a list of the basic system services that are implemented on top of it:
package manager, telephony manager, app widgets, audio services,
search manager, location manager, notification manager, accessibility
manager, connectivity manager, wifi manager, input method manager,
clipboard, status bar, window manager, sensor service, alarm manager,
content service, activity manager, power manager, surface compositor.
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
Wol
A Rust implementation of Android's Binder
In addition, binder originally was intended as an IPC mechanism, with implicit control flow changes (true RPC), which is something every microkernel offers, but linux historically lacked.
I am not sure, if binder still behaves that way, with all shift towards more of its code running in userspace, though.
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
because you will lift all the performance improvement when converting the code.
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
I forgot to put links to the video and slides from the talk — apologies for the omission.
Video and slides
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
A Rust implementation of Android's Binder
https://en.wikipedia.org/w/index.php?title=Google_Fuchsia
which says: As of 2022, most of Fuchsia's code is written in Rust.