The Android Graphics microconference
Sync
Erik Gilling of Google's Android team started things off by covering some background on the Android Sync infrastructure, why it was introduced and why it's important to Android.
Sync is important because it allows for better exploitation of parallelism in graphics and media pipelines. Normally you can think of a pipeline as a collection of different devices that are manipulating buffers in series. Each step would have to completely finish before passing the output buffer to the next step. However, in many cases, there is some overhead work that is required in each step that doesn't strictly need the data that is going to be processed. So you can try to do some of those overhead steps, like setting up the buffer to display, in parallel while the buffer is still being filled. However, there needs to be some interlocking so that one step can signal to the next that it is actually finished. This agreement between steps is the "synchronization contract."
Android through version 3.0 (Honeycomb) didn't use any sort of explicit fencing. Instead, the synchronization contract was implicit and not necessarily well defined. This caused problems as driver writers often misunderstood or mis-implemented the synchronization contract, leading to very difficult-to-debug issues. Additionally, by having the contract be implicit and its implementation spread across a number of drivers, with some being proprietary, it made it very difficult to make changes to that contract in order to improve performance.
To address these issues, Android's explicit synchronization mechanism was implemented. Sync is a synchronization framework that allows SurfaceFlinger (Android's display manager) to establish a timeline and set sync points on that timeline. Other threads and drivers can then block on a sync point and will wait until the timeline counter has crossed that point. There can be multiple timelines, managed by various drivers; the Sync interface allows for merging sync points from different timelines. This is how the SufaceFlinger and BufferQueue processes manage the synchronization contract across a number of drivers and processes.
In describing the various sync timelines and the ability to merge sync points on different timelines, Maarten Lankhorst, author of dmabuf fences and wait/wound mutexes, got into a discussion about whether circular deadlocks were possible with Android's sync framework. Erik believed they were not, and made a convincing case, but he admitted that he's not had to use any systems with video RAM (which has complicated locking requirements that led to the need for wait/wound mutexes), so the result of the discussion wasn't exactly clear.
Tom Cooksey from ARM mentioned that, in graphics development, trying to debug issues related to why things didn't happen in the order they were expected is really hard, and that the Android Sync debugging infrastructure makes this much easier. Maarten noted that, for dma-buf fences, the in-kernel lockdep infrastructure can also be used to prove locking correctness. But, it was pointed out, that only works because fences are not held across system calls.
There was also some question of how to handle the unwinding of hardware fences and other error conditions, which Erik thought should be resolved by resetting the GPU. Rob Clark thought that wasn't a very good solution; he worries that resetting the GPU can take a long time and might interfere with concurrent users of the GPU.
In trying to figure what the next steps would be, Rob said he didn't have any objection to adding sync point arguments to the various functions, as long as they were optional. He thought that the explicit sync points could either be built on top of dma-buf fences, or otherwise fall back to dma-buf fences. Erik mentioned that while the Android sync points aren't tied to anything specific in the kernel, they are really only used for graphics buffers, so he thought tying sync points to dma-bufs might be doable. There didn't seem to be any objections to this approach, but it also wasn't clear that all sides were in agreement, so folks promised to continue the discussion on unifying the lower-level primitives on the mailing list.
The atomic display framework
Greg Hackmann from the Android team then discussed the atomic display framework (ADF) which was developed while he was trying to develop a version of HWComposer based on the kernel mode setting (KMS) interface. During that development, Greg ran into a number of limitations and issues with KMS, so he developed ADF as a simple display framework built on top of dma-buf and Android Sync. Thus ADF represents somewhat of an ideal interface for Android, and Greg wanted to see whether folks thought the KMS interface could be extended to provide the same functionality.
One of the limitations discussed was the absence of atomic screen updates. There is the out-of-tree atomic mode setting and nuclear pageflip patch set [PDF], but in that implementation updates to the screen are done by deltas, updating just a portion of the screen. Android, instead, prefers to update the entire screen to reduce the amount of state that needs to be kept as well as to avoid problems with some system-on-chip (SoC) hardware.
There is also the problem of KMS not handling ganged CRTCs (CRT controllers that generate output streams to displays), split-out planes, or custom pixel formats well, and that the modeling primitives KMS uses to manage the hardware don't map very well to some of the new SoCs. Further, there wasn't a good way to allow KMS to exchange sync-point data.
In addition, Greg sees the KMS interface as being fairly complex, requiring drivers to implement quite a bit of boilerplate code and to handle many cases that aren't very common. The concern is that if the API is large, SoC driver writers will only write the parts they immediately need and will likely make mistakes on the edge cases. There was some discussion that maybe KMS could use some helper functions, like the fbdev (Linux framebuffer device) helpers in the DRM layer which automatically provide an fbdev interface for DRM drivers.
As a result, ADF's design is a bit simplified, representing displays as a collection of overlay engines and interfaces which can be interconnected in any configuration the hardware supports. ADF uses a structure that wraps dma-buf handles with Sync data and formatting metadata. ADF then does sanity checks on buffer lengths and pixel formatting, deferring to driver-specific validation if custom formats are in use. ADF also manages any waiting that is required on the sync handle before flipping the page, and provides driver hooks for mode-setting and events like DPMS changes and vsync signals.
Rob noted that issues like split-out planes or custom pixel formats are solvable in the KMS API, and in many cases he has specific plans to do so. For others, like ganged CRTCs he's hesitant and wants to get more info on the hardware before deciding how it would be best to add the requisite functionality.
There was some minor debate about how ADF tends to allow blobs of data to be passed through it from user space to drivers, requiring hardware-specific user-space code. This functionality makes it harder to support other display managers — Wayland, for example — that depend on a hardware-independent interface. Rob noted that, for the most part, Wayland is very similar to SurfaceFlinger, but maybe just a few years behind when it comes to things like ganged CRTCs, and that improvements are needed. But he was also concerned with the desire for KMS to be generic and to have hardware-independent Weston user space, so maybe there need to be some cases where there are hardware-specific plugins, but it will need to fall back to a slower generic implementation.
Folks from the Android team pointed out that it really is hard to anticipate all the constraints and how weird the hardware ends up being. So the issue of where to draw the line on generic interfaces vs hardware-specific seemed unresolved. However, ADF does allow for a simple non-accelerated recovery console, which would be generic.
There was also further discussion on how the atomic mode setting does partial updates or deltas while ADF focuses on full updates. With Wayland being very similar to SurfaceFlinger, the partial updates are really not as useful there, and it's mostly for X that partial updates are useful. There was some question of whether X should really be targeted for atomic mode setting, but Rob said that, while for some things like overlays X isn't a target, X likely will use atomic mode setting. There was also some question as to what a "full frame update" entails, and whether it means updating things like gamma tables as well, as that can be slow on some hardware.
Other KMS extensions
Laurent Pinchart walked through a number of other proposed extensions to KMS. The first was non-memory-backed pipeline sources. Here the issue is that there can be complicated video pipelines where a capture device can write to both memory and directly to the display at the same time. KMS doesn't really model this well, and Laurent would like to see some sort of API to handle this case. There was some back and forth with Rob as to if the DRM framebuffer objects would mostly suffice.
The next issue was memory writeback, where the composited frames are written to memory instead of being sent to a display, and what the right API is for this. On one hand this looks very similar to video capture, so the proposal was to use Video4Linux (V4L) device nodes. Mirroring earlier issues raised, Greg noted that in many situations it's just a lot simpler to write a custom driver that does the bare minimum of what is needed than to try to implement the entire V4L interface. Driver writers are often under pressure, so they're unlikely to do the right thing if it requires writing lots of extra code. Hans Verkuil, the maintainer of V4L, expressed his exasperation with people who go off and do their own thing, needlessly reinventing the wheel, and that he is very open to addressing problems and improving things. Rob again suggested that V4L may need some internal refactoring and helper functions to try to make it easier to write a driver.
There were also discussions on chained compositors, non-linear pipelines and root planes that don't span the whole display, but it didn't seem like there was much resolution to the issues discussed. Hans finally asked that folks mail the linux-media mailing list, as the V4L developers would be interested in working and collaborating to resolve some of these issues.
Deprecating fbdev
The next short topic was deprecating the fbdev interface and discussions to see if this would impact Android as it's a common user of fbdev. For Android, fbdev is particularly useful for very early hardware bringup and recovery consoles. Greg pointed out that Google was able to bring up a Nexus 10 without fbdev using ADF, so this wouldn't be a problem for them, assuming the issues in KMS that ADF worked around were resolved.
ION
The discussion then turned to the ION memory allocator. With much of the background for the topic being covered on LWN, I summarized some of the recent responses from the community and the current thinking from the Android developers and asked what would be reasonable next steps to upstreaming ION. The upstream developers were suggesting that the dma-buf delayed allocation method be used, where user space would attach the dma-buf to all the various devices and allow the allocation to happen at map time.
One problem with this approach that the Android developers saw was that it can have permissions issues. It would require one process that has permissions to all the various devices to do the attach; the Android developers would prefer to avoid having one process with permissions to everything, and instead minimize the permissions needed. The way ION currently works is by having gralloc just know the memory constraints for all the devices, but it doesn't need permissions to all of those devices to allocate buffers. Those buffers can then be passed between various processes that have only the limited device permissions for their specific needs.
With respect to the approach of trying to do the constraints solving in kernel space, Colin Cross, an Android developer, brought up that the SoC constraints are insane, and the Android team sees dynamic constraint solving as an impossible problem. One just cannot generally enumerate the various constraints in a sane way, and trying to then map to specific allocators which satisfy those constraints will always be messy. Additionally, since the desktop use case has a more dynamic environment, there may not exist a solution to the given constraints, and in that case one needs to fall back to a slow path where buffers are copied between device-compatible memory regions. The point was made that for Android, slow paths are not an option, and because of that they expect a level of control made possible by customizing the entire stack to each device.
The solution Android took with ION was to provide known allocators (heaps) and then create a device-specific, custom mapping from device to allocator in user space via the custom gralloc implementations. Colin admitted there would always be some messiness and that the Android developers prefer that messiness to exist in user space.
As far as next steps, it was proposed that, instead of trying to enumerate generic constraints in the dma-buf parameters structure, we could instead have a table of allocators and have the devices link to compatible allocators for the device at attach time. That way we could use the same heap allocators that ION uses and just have two different ways of triggering that allocation. This would allow some shared infrastructure if there couldn't be an agreed-upon top-level interface for the allocations. Rob seemed to be agreeable, but Colin brought up the downside that, by enumerating allocators per device, when new heaps are added, we would have to go through all the drivers and update their heap compatibility. This is a fair point, but it is the cost of doing constraint solving in the kernel, rather than via custom user-space code, and as long as there was still the direct ION-like interface, it wouldn't have any negative consequences for Android.
Another interesting result was that Colin has been busy addressing some of the other issues with ION that were brought up in the LWN summary and in other discussions. It seems likely that ION will be submitted to staging so that the transition to using shared heaps can be more centrally coordinated.
I'd like to thank all of the discussion participants, along with Laurent
Pinchart and Jesse Barker for co-organizing the micro-conference, and Zach
Pfeffer for his diligent note taking.
Index entries for this article | |
---|---|
Kernel | Android |
Kernel | Device drivers/Graphics |
GuestArticles | Stultz, John |
Conference | Linux Plumbers Conference/2013 |
Posted Oct 20, 2013 20:28 UTC (Sun)
by pixelpapst (guest, #55301)
[Link]
The ADF layer would ask each allocator in turn if it could satisfy the superset of constraints for the current request. If no allocator can, fail verbosely. Then allow driver writers to implement a special allocator as a separate kernel object, which can satisfy a specific combination of constraints that appears on a particular platform (specific SoC with specific user-space). This should avoid growing all this craziness into user-space, while keeping the kernel interfaces as generic as possible under these circumstances.
Take all this with a grain of salt, though. I'm not yet as fluent in the android graphics stack as I'd like to be.
The Android Graphics microconference