
Restricted DMA

By Jonathan Corbet
January 7, 2021
A key component of system hardening is restricting access to memory; this extends to preventing the kernel itself from accessing or modifying much of the memory in the system most of the time. Memory that cannot be accessed cannot be read or changed by an attacker. On many systems, though, these restrictions do not apply to peripheral devices, which can happily use direct memory access (DMA) on most or all of the available memory. The recently posted restricted DMA patch set aims to reduce exposure to buggy or malicious device activity by tightening up control over the memory that DMA operations are allowed to access.

DMA allows devices to directly read from or write to memory in the system; it is needed to get reasonable I/O performance from anything but the slowest devices. Normally, the kernel is in charge of DMA operations; device drivers allocate buffers and instruct devices to perform I/O on those buffers, and everything works as expected. If the driver or the hardware contains bugs, though, the potential exists for DMA transfers to overwrite unrelated memory, leading to corrupted systems and unhappy users. Malicious (or compromised) hardware can use DMA to compromise the system the hardware is attached to, making users unhappier still; examples of this type of attack have been posted over the years.

One way to address this problem is to place an I/O memory-management unit (IOMMU) between devices and memory. The kernel programs the IOMMU to allow access to a specific region of memory; the IOMMU then keeps devices from straying outside of that region. Not all systems are equipped with an IOMMU, though; they are mostly limited to the larger processors found in desktop machines, data centers, and the like. Mobile systems usually lack an IOMMU.

The restricted DMA patch set, posted by Claire Chang, is an attempt to apply some control to DMA operations on systems without an IOMMU. To do so, it builds on an old, relatively obscure kernel mechanism called the "swiotlb", which stands for "software I/O translation lookaside buffer". The swiotlb was originally created to facilitate operations with devices that have annoying DMA limitations, such as the inability to address all of the memory in the system. The core mechanism used within the swiotlb is bounce buffering: allocating a buffer in a region that the device in question is able to access, then copying data between I/O buffers and this bounce buffer as needed. Copying the data clearly slows I/O operations, but it is far better than not using DMA at all.
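The core bounce-buffering idea can be modeled in a few lines of ordinary C. This is a hypothetical userspace sketch, not the swiotlb's actual implementation: a "device" is only permitted to touch one reserved pool, so the kernel copies data between its own buffers and that pool on every transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of swiotlb-style bounce buffering: the device may
 * only access restricted_pool, so I/O data is copied through it. */

#define POOL_SIZE 4096
static uint8_t restricted_pool[POOL_SIZE];  /* region the device may access */

/* Map a kernel buffer for a device-bound (TX) transfer: copy it into the
 * pool and hand the device an offset within the pool, never the original
 * buffer's address. */
static size_t map_for_device(const uint8_t *buf, size_t len)
{
    assert(len <= POOL_SIZE);
    memcpy(restricted_pool, buf, len);  /* bounce: kernel buffer -> pool */
    return 0;                           /* offset the device will DMA from */
}

/* Complete a device-to-memory (RX) transfer: copy what the device wrote
 * into the pool back out to the kernel's buffer. */
static void unmap_from_device(uint8_t *buf, size_t offset, size_t len)
{
    assert(offset + len <= POOL_SIZE);
    memcpy(buf, restricted_pool + offset, len);  /* bounce: pool -> kernel */
}
```

The extra `memcpy()` on each transfer is exactly the performance cost the article describes; the benefit is that the device never sees the address of the real I/O buffer.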

Chang's patch set enhances the swiotlb by allowing it to allocate a specific range of physical memory and associate it with a given device; this range can be specified in a devicetree using the new restricted-dma-pool "compatible" property. All DMA operations involving that device will be bounced through that range of memory, effectively isolating devices from the actual I/O buffers seen by the rest of the system.
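A devicetree fragment using this property might look something like the following. The `restricted-dma-pool` compatible string comes from the patch set; the node names, addresses, sizes, and the example device are invented here purely for illustration.

```dts
reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	/* Hypothetical 4 MiB pool; address and size are placeholders. */
	restricted_dma: restricted-dma@50000000 {
		compatible = "restricted-dma-pool";
		reg = <0x50000000 0x400000>;
	};
};

/* Placeholder device bound to the pool above; all of its DMA would
 * be bounced through that region. */
wifi@10000000 {
	compatible = "vendor,wifi-chip";
	reg = <0x10000000 0x1000>;
	memory-region = <&restricted_dma>;
};
```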

Using this kind of bounce-buffering offers some benefit on its own. Your editor, who has written device drivers in the past, would never have committed such an error, but it is not unheard of for driver bugs to result in a device performing DMA when the rest of the system thinks it should be idle. Having memory buffers seemingly randomly overwritten in unreproducible ways can (again, your editor relies on the word of others for this) result in corrupt data, painful debugging sessions, and excessive alcohol use. By separating the buffer used by the device from the buffer used by the kernel, restricted DMA can mitigate many of the more unpleasant effects of this sort of bug.

Readers may be wondering, though, how the use of the swiotlb will protect the system against a malicious or compromised device; such devices may well ignore polite requests to restrict their DMA activities to the designated area, after all. The answer is that it will not protect systems from this type of attack — at least, not on its own. The evident intent, though, is to pair restricted DMA with trusted firmware implementations that are able to restrict DMA operations to specific ranges of memory; these restrictions are set up at (or before) boot time and cannot be changed by the kernel. So the trusted firmware can constrain a device's access to the designated region, while the restricted DMA mechanism causes all DMA operations to go through that region. Together, these mechanisms provide a way to enable DMA without allowing a device to access arbitrary memory, all without an IOMMU in the system.

The amount of setup work required suggests that this capability will not be present on most general-purpose systems anytime soon. But on tightly controlled systems — mobile devices, for example — there is clear value in making the additional effort to prevent compromise via a hostile device. It's not clear whether the restricted DMA patches will make it into the mainline in their current form, but chances are that this kind of mechanism will be merged sooner or later.

Index entries for this article:
Kernel: Direct memory access



Restricted DMA

Posted Jan 7, 2021 21:50 UTC (Thu) by ttuttle (subscriber, #51118) [Link] (2 responses)

But how does the *firmware* restrict a device's DMA access without an IOMMU?

Restricted DMA

Posted Jan 8, 2021 13:52 UTC (Fri) by danielthompson (subscriber, #97243) [Link] (1 responses)

Perhaps better to think of trusted firmware as "a firmware" rather than "the firmware"! In this case, the trusted firmware is the component that manages switching in and out of TrustZone on arm64 systems; additionally, it provides reference bootloaders to get the trusted and normal worlds running.

If you have a DMA peripheral that can restrict the set of addresses it will use *and* a SoC that can block further changes to them, or make changing them a privileged operation (e.g. one that can only be done from the trusted world), then the bootloader parts of the trusted firmware can be modified to configure the DMA windows for the hardware and then seal them off before Linux starts to run.

Restricted DMA

Posted Jan 11, 2021 23:48 UTC (Mon) by florianfainelli (subscriber, #61952) [Link]

The key for this scheme to work is that you need some sort of protection mechanism whereby the PCIe host bridge is allowed/denied access to specific regions of memory. The use of ARM Trusted Firmware is probably twofold: it is part of the chain of trust for said platform, and, given that there are at least two different SoC vendors to be supported, the firmware provides a nice abstraction for configuring the region to be restricted.

Restricted DMA

Posted Jan 7, 2021 22:04 UTC (Thu) by iustin (subscriber, #102433) [Link] (3 responses)

This is interesting, but I don't get one thing. The article says that this is mostly useful for systems that don't have an IOMMU, but isn't it even less likely that such systems have custom firmware? Or is the restriction done by the firmware cheaper to implement than a full IOMMU controller?

Restricted DMA

Posted Jan 7, 2021 23:00 UTC (Thu) by johntb86 (subscriber, #53897) [Link] (2 responses)

Yeah, it could be a lot cheaper since the hardware only has to compare addresses with one range (perhaps in "min" and "max" registers) before allowing an access, rather than walking page tables, caching lookups in a TLB, etc. There's definitely hardware out there that doesn't have IOMMUs but where the firmware can set up those types of range-based restrictions.
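The kind of range check being described is simple enough to sketch. This is a hypothetical model of what such hardware might do, with the "min" and "max" registers represented as struct fields; real implementations will differ:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a range-based DMA filter: two registers bound
 * the window a device may access, so each access is a single compare,
 * with no page-table walks or TLB involved. */
struct dma_window {
    uint64_t min;   /* lowest address the device may touch */
    uint64_t max;   /* highest address it may touch (inclusive) */
};

static bool dma_access_allowed(const struct dma_window *w,
                               uint64_t addr, uint64_t len)
{
    if (len == 0 || addr + len - 1 < addr)  /* reject empty and wrapping */
        return false;
    return addr >= w->min && addr + len - 1 <= w->max;
}
```

Compared to an IOMMU, this buys no address translation and only a single contiguous window per device, but the hardware cost is two registers and a comparator.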

Restricted DMA

Posted Jan 7, 2021 23:25 UTC (Thu) by pm215 (subscriber, #98099) [Link] (1 responses)

Mmm; in a reply to the patchset (https://lwn.net/ml/linux-kernel/d7043239-12cf-3636-4726-2...) Florian Fainelli describes a mechanism like that in some Broadcom SoCs where the firmware can define a specific window in DRAM that the PCIe bridge is allowed to DMA to, for instance.

Restricted DMA

Posted Jan 11, 2021 23:51 UTC (Mon) by florianfainelli (subscriber, #61952) [Link]

Right, it is common for those SoCs not to have a full-blown IOMMU that supports both the translation and protection parts, but they do have a memory protection unit. Given the memory controller architecture, each DMA-capable peripheral is given a unique identifier, and the memory controller arbitrates all accesses to DRAM. It becomes simple to have an on-chip memory that contains the protection rules to enforce (which can itself be protected by an additional layer of protection over the register space), as a tuple consisting of {protection bits, client ID, range}.

Restricted DMA

Posted Jan 8, 2021 6:21 UTC (Fri) by marcH (subscriber, #57642) [Link] (2 responses)

> One way to address this problem is to place an I/O memory-management unit (IOMMU) between devices and memory. The kernel programs the IOMMU to allow access to a specific region of memory; the IOMMU then keeps devices from straying outside of that region. Not all systems are equipped with an IOMMU, though; they are mostly limited to the larger processors found in desktop machines, data centers, and the like.

Does Linux always use the IOMMU to restrict access on these larger processors?

Restricted DMA

Posted Jan 8, 2021 14:28 UTC (Fri) by abatters (✭ supporter ✭, #6932) [Link]

It depends on the kernel config (e.g. CONFIG_INTEL_IOMMU) and the kernel command line (e.g. intel_iommu=).

Restricted DMA

Posted Jan 11, 2021 23:54 UTC (Mon) by florianfainelli (subscriber, #61952) [Link]

For these types of platforms, the ARM System MMU may or may not be present; it depends on whether the SoC integrator decided to include one. It has a cost in terms of silicon area (additional fast memories to hold TLB entries, etc.) and a cost in terms of performance, too, since each memory access needs to be translated.

Historically, if you could allocate large enough contiguous buffers for your video encoder/decoder, audio DSPs, etc., and you had a protection unit, you could be done with that solution. Most other peripherals (SPI, I2C, NAND, Ethernet, PCIe, whatever) have to support scatter-gather by default, or they do PIO if they don't need to be fast.

Restricted DMA

Posted Jan 8, 2021 9:15 UTC (Fri) by flussence (guest, #85566) [Link] (5 responses)

At last! It's safe to use USB3 on the desktop! (Yeah, I know about sysfs device authorization… but who bothers to set all that up correctly?)

Restricted DMA

Posted Jan 8, 2021 9:35 UTC (Fri) by randomguy3 (subscriber, #71063) [Link]

Although the article suggests that desktop systems would normally have an IOMMU, and so already have better protections than this new system can provide.

Restricted DMA

Posted Jan 8, 2021 11:42 UTC (Fri) by mjg59 (subscriber, #23239) [Link] (3 responses)

USB3 doesn't support device-initiated DMA to arbitrary addresses. Are you thinking of Thunderbolt? If so, Gnome and KDE both handle device authorisation out of the box afaik.

Restricted DMA

Posted Jan 8, 2021 13:34 UTC (Fri) by foom (subscriber, #14868) [Link] (1 responses)

Now also known as "USB4".

Restricted DMA

Posted Jan 14, 2021 5:20 UTC (Thu) by marcH (subscriber, #57642) [Link]

> [Thunderbolt] Now also known as "USB4".

No, there is absolutely _nothing_ simple in that area.

USB4 re-uses the low layers of Thunderbolt but the Thunderbolt protocol is optional in USB4.

Meaningless "USB-C" and tunneling were not confusing enough, so they decided to double down.

The only way to make this clearer would have been to name the different layers independently as in networking (no one confuses HTTP with Ethernet) because they _are_ independent now. Too late.

Restricted DMA

Posted Jan 9, 2021 19:18 UTC (Sat) by flussence (guest, #85566) [Link]

>Are you thinking of Thunderbolt?

Probably, yes. I haven't been able to keep up with The One True Universal Connectivity Standard since they added half a dozen new ones to the same port.

Though this sounds like it'd be a good thing for Firewire too, if anyone still cared about it? I seem to remember plug-and-play DMA access being a “feature” back in the day…

Restricted DMA

Posted Jan 8, 2021 13:36 UTC (Fri) by pabs (subscriber, #43278) [Link] (1 responses)

Instead of copying buffers around, could multiple buffers be allocated, with writes going to one of them at a time? Or is it not possible to change the write location at runtime? Or is that slow?

Restricted DMA

Posted Jan 11, 2021 23:58 UTC (Mon) by florianfainelli (subscriber, #61952) [Link]

It is really a matter of how far you want to go on the other side. The nice thing about what is being proposed here is that it is the bottom layer responsible for controlling the DMA addresses handed to an arbitrary driver, and it sits right above the driver itself. This works with a WLAN driver, but also with NVMe, Ethernet, FPGA: literally anything that uses the DMA API, if you wanted.

In the case of networking/WLAN, you have socket buffers to transmit that come from user space, are scattered in virtual and physical address space, and need to be shoved through a restricted region of DRAM that the PCIe bridge is allowed to read from and write to. Bounce buffering is pretty much the only way for that direction of transfer. For receive buffers, the OS needs to allocate data buffers for the WLAN chip to put data into, so it can allocate from the restricted DMA region in the first place and avoid the bounce buffering in that case.
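The asymmetry described in this comment can be sketched in plain C. This is a hypothetical userspace model, not driver code: transmit payloads originate outside the pool and must be bounced in, while receive buffers can simply be allocated inside the pool so the device fills them in place, with no copy on completion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of direction-dependent bounce buffering over a
 * restricted pool, using a trivial bump allocator that never frees. */

#define POOL_SIZE 4096
static uint8_t pool[POOL_SIZE];  /* the only memory the device may touch */
static size_t pool_next;         /* next free offset in the pool */

/* TX path: the payload lives outside the pool, so it must be copied
 * (bounced) in before the device can read it. */
static uint8_t *tx_prepare(const uint8_t *payload, size_t len)
{
    assert(pool_next + len <= POOL_SIZE);
    uint8_t *slot = pool + pool_next;
    pool_next += len;
    memcpy(slot, payload, len);  /* the unavoidable bounce copy */
    return slot;                 /* address handed to the device */
}

/* RX path: allocate directly from the pool; the device writes incoming
 * data into it in place, so no copy is needed on completion. */
static uint8_t *rx_alloc(size_t len)
{
    assert(pool_next + len <= POOL_SIZE);
    uint8_t *slot = pool + pool_next;
    pool_next += len;
    return slot;
}
```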


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds