Device tree troubles
A device tree "binding" is the specification of how a specific piece of hardware can be described in the device tree data structure. Most drivers meant to run on platforms where device trees are used include a documentation file describing that driver's bindings; see Documentation/devicetree/bindings/net/can/cc770.txt as a randomly chosen example. The kernel contains nearly 800 such files, plus hundreds more ".dts" files describing complete systems-on-chip and boards, and the number is growing rapidly.
Maintenance of those files is proving to be difficult for a number of reasons, but the core of the problem can be understood by realizing that a device tree binding is a sort of API that has been exposed by the kernel to the world. If a driver's bindings change in an incompatible way, newer kernels may fail to boot on systems with older device trees. Since the device tree is often buried in the system's firmware somewhere, this kind of problem can be hard to fix. But, even when the fix is easy, the kernel's normal API rules should apply; newer kernels should not break on systems where older kernels work.
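To make the "binding is an API" point concrete, here is a sketch of the kernel side of an entirely invented "acme,uart" binding; the of_ and platform_ helpers are the kernel's standard device tree accessors, but the compatible string, property name, and driver are hypothetical. Once device trees containing such nodes ship in firmware, the names and properties this code expects cannot be changed without breaking those systems.

    /*
     * Illustrative only: a probe routine for a hypothetical "acme,uart"
     * binding that promises a register window, an interrupt, and a
     * required "clock-frequency" property.
     */
    #include <linux/errno.h>
    #include <linux/ioport.h>
    #include <linux/module.h>
    #include <linux/of.h>
    #include <linux/platform_device.h>

    static int acme_uart_probe(struct platform_device *pdev)
    {
        struct device_node *np = pdev->dev.of_node;
        struct resource *regs;
        u32 freq;
        int irq;

        regs = platform_get_resource(pdev, IORESOURCE_MEM, 0); /* the node's "reg" property */
        irq = platform_get_irq(pdev, 0);                       /* its "interrupts" property */
        if (!regs || irq < 0)
            return -EINVAL;

        /* A property the (invented) binding declares as required. */
        if (of_property_read_u32(np, "clock-frequency", &freq))
            return -EINVAL;

        /* ... ioremap the registers, request the irq, set up the port ... */
        return 0;
    }

    static const struct of_device_id acme_uart_of_match[] = {
        { .compatible = "acme,uart" },  /* the name the binding document reserves */
        { }
    };
    MODULE_DEVICE_TABLE(of, acme_uart_of_match);

    static struct platform_driver acme_uart_driver = {
        .probe  = acme_uart_probe,
        .driver = {
            .name           = "acme-uart",
            .of_match_table = acme_uart_of_match,
        },
    };
    module_platform_driver(acme_uart_driver);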
The clear implication is that new device tree bindings need to be reviewed with care. Any new bindings should adhere to existing conventions, they should describe the hardware completely, and they should be supportable into the future. And this is where the difficulties show up, in a couple of different forms: (1) most subsystem maintainers are not device tree experts, and thus are not well equipped to review new bindings, and (2) the maintainers who are experts in this area are overworked and having a hard time keeping up.
The first problem was the subject of a request for a Kernel Summit discussion with the goal of educating subsystem maintainers on the best practices for device tree bindings. One might think that a well-written document would suffice for this purpose, but, unfortunately, these best practices still seem to be in the "I know it when I see it" phase of codification; as Mark Brown put it:
Said mailing list tends to be overflowing with driver postings, though, making it less useful than one might like. Meanwhile, the best guidance, perhaps, came from David Woodhouse:
That is, evidently, not always the case, currently; some device tree bindings can be strongly tied to specific kernel versions. Such bindings will be a maintenance problem in the long term.
Keeping poorly-designed bindings out of the mainline is the responsibility of the device tree maintainers, but, as Grant Likely (formerly one of those maintainers) put it, this maintainership "simply isn't working right now." Grant, along with Rob Herring, is unable to keep up with the stream of new bindings (over 100 of which appeared in 3.11), so a lot of substandard bindings are finding their way in. To address this problem, Grant has announced a "refactoring" of how device tree maintainership works.
The first part of that refactoring is Grant's own resignation, with lack of time given as the reason. In his place, four new maintainers (Pawel Moll, Mark Rutland, Stephen Warren and Ian Campbell) have been named as being willing to join Rob and take responsibility for device tree bindings; others with an interest in this area are encouraged to join this group.
The next step will be for this group to figure out how device tree maintenance will actually work; as Grant noted, "There is not yet any process for binding maintainership." For example, should there be a separate repository for device tree bindings (which would make review easier), or should they continue to be merged through the relevant subsystem trees (keeping the code and the bindings together)? It will take some time, and possibly a Kernel Summit discussion, to figure out a proper mechanism for the sustainable maintenance of device tree bindings.
Some other changes are in the works. The kernel currently contains hundreds of .dts files providing complete device trees for specific systems; there are also many .dtsi files describing subsystems that can be included into a complete device tree. In the short term, there are plans to design a schema that can be used to formally describe device tree bindings; the device tree compiler utility (dtc) will then be able to verify that a given device tree file adheres to the schema. In the longer term, those device tree files are likely to move out of the kernel entirely (though the binding documentation for specific devices will almost certainly remain).
All told, the difficulties with device trees do not appear to be anything other than normal growing pains. A facility that was once only used for a handful of PowerPC machines (in the Linux context, anyway) is rapidly expanding to cover a sprawling architecture that is in wide use. Some challenges are to be expected in a situation like that. With luck and a fair amount of work, a better set of processes and guidelines for device tree bindings will result from the discussion — eventually.
Device tree troubles
Posted Jul 25, 2013 14:30 UTC (Thu) by etienne (guest, #25256)
Just nobody took the time to write that auto-detection, or had all the NDAs necessary.
Shall the device tree describe the macro-components or only the different chips on the board (like how much DDR is on the card, or at which I2C address a given subsystem sits)?
Is the device tree a way to replace the Linux command line?
When the device tree is "glued" to the kernel to ease the boot process (instead of a fixed ROM-like interface), would it not be better to use a linker command file to define all those constants?
Device tree troubles
Posted Jul 25, 2013 15:27 UTC (Thu) by dougg (guest, #1894)
Well yes, DT is a replacement for the kernel command line, which obviously can't handle the amount of data we are talking about here. And having a specific DT blob boot the same SoC on another OS is a nice idea; I'd put that in the same category as nuclear fusion for large-scale electricity generation :-)
Device tree troubles
Posted Jul 26, 2013 9:45 UTC (Fri) by etienne (guest, #25256)
I do think that every subsystem shall be known by the kernel, so that at boot the kernel turns OFF everything it does not need.
Else you get into a situation where software version N+1 manages a new subsystem, and if you reboot the card (hot boot) into version N the "new subsystem" is still powered ON and clocked - unlike a cold boot into version N.
What I really want is an ELF boot-loader (which can load multiple sections at different addresses) and a C description of all these subsystems, so that the memory-mapped address ranges are initialised in this order:
- values for all I/O ports (in C, const port42_t port42 = (port42_t) {1, 0}; - not 10000's of #defines)
- direction for all I/O ports
- function of each package pin (which is currently managed by another set of software, not the device tree)
At that point of loading the Linux kernel, I know that any previous crash which may have left an I/O port short-circuited is now fixed.
Then I want the ELF file to contain pre-linked code & data for the fast internal memory, so that is the next section of the ELF file. With the device tree, one can only describe the internal memory, not its content, so all the loading/linking has to be done manually.
Then the ELF boot-loader can load the kernel code & data, and the initial ram disk if it is not directly mappable from FLASH.
Tl,dr: use fewer tools (GCC and LD) to do more things, and do checks at link time, not at execution time like the device tree.
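As a rough sketch of what this proposal might look like in practice (the type, section name, register layout, and address are all invented for illustration), a port description could be a C constant placed in its own ELF section, with a board-specific linker script assigning that section the device's register address:

    /*
     * Hypothetical sketch of the idea above; no real SoC is described.
     */
    #include <stdint.h>

    typedef struct {
        uint32_t value;      /* initial level of each pin       */
        uint32_t direction;  /* 1 = output, 0 = input, per pin  */
    } port42_t;

    /*
     * A board-specific linker script would place this section at the
     * port's register address, for example:
     *
     *     .ioports.port42 0x45000000 : { KEEP(*(.ioports.port42)) }
     *
     * An ELF boot-loader that honours section load addresses then writes
     * the constant straight into the hardware before starting the kernel;
     * a missing or misplaced description becomes a link-time error.
     */
    __attribute__((section(".ioports.port42"), used))
    static const port42_t port42 = { .value = 1, .direction = 0 };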
Device tree troubles
Posted Jul 27, 2013 18:28 UTC (Sat) by mathstuf (subscriber, #69389)
That's not very UNIX-like ;) . Personally, I think that if device trees could be used as a base for other kernels, we'd be better off[1], and making the format either code or an ELF binary doesn't help that.
[1] For the BSDs, if/when a replacement for Linux comes along written in INTERCAL, etc.
Device tree troubles
Posted Jul 29, 2013 10:17 UTC (Mon) by etienne (guest, #25256)
LD is very Unix-like: it does one thing (turning symbols into real addresses) and does it right.
You are saying that some symbols are different and shall be described by a complex device tree system?
The DT is so complex that even the boot-loader (U-Boot) does not use it to find/initialise the serial port and provide a command-line interface, because it has to be small enough to fit in the (256 Kb or less) internal memory...
At some point, all the hardware addresses and I/O configuration have to be defined for the system to work; I am saying that there is no advantage to initialising those at boot (where no debugging is possible) compared to initialising them in the build Makefile, or at installation time by a final linker command (on a partially linked ELF file).
Device tree troubles
Posted Jul 29, 2013 10:45 UTC (Mon) by andresfreund (subscriber, #69562)
Except that the latter makes it pretty much fundamentally impossible to get to the point where you can boot one medium on differing devices.
Device tree troubles
Posted Jul 29, 2013 11:12 UTC (Mon) by etienne (guest, #25256)
Assuming you are talking about an SD-card (no CDROM/DVD), you probably already have an installation script to either select which DT to use (if the DT is on the SD-card) or how to access it (if the DT is in some FLASH on the board, at which address it is and how the FLASH is configured).
Instead of having tools to upgrade/downgrade/display/edit the DT, it would be simpler to just do a final link in the install script.
Device tree troubles
Posted Jul 29, 2013 11:58 UTC (Mon) by andresfreund (subscriber, #69562)
SD-card, internal storage, USB, whatever. I am pretty happy that CDs are dying a rather rapid death these days.
> Instead of having tools to upgrade/downgrade/display/edit the DT, it would be simpler to just do a final link in the install script.
I don't think that's really realistic. You'd need to get to the point of running a pretty much full-blown OS to actually run a full ELF linker. Those aren't simple beasts anymore. Also, linking a kernel image, even one trimmed for an embedded architecture, actually takes a good amount of resources.
Device tree troubles
Posted Jul 29, 2013 13:31 UTC (Mon) by etienne (guest, #25256)
Maybe I am underestimating the difference between linking and relocating.
We already have boot-loaders which are able to relocate a few sections at run time on ia32, i.e. running "ld --emit-relocs" and moving those sections where appropriate.
Even on Windows you could have installation software which relocates a few hundred sections to configure an SD-card (including its own U-Boot) to the right values for the targeted ARM hardware.
Device tree troubles
Posted Jul 31, 2013 6:52 UTC (Wed) by alison (subscriber, #63752)
I'm reminded of (IIRC) the answer Rex Dieter gave when asked why KDE bothered to have a Windows version. (Paraphrasing) "Because requiring developers to create code that will run on Windows is the best guarantee of good level separation and clean coding practices."
Just so with device tree: if the bindings have names specific to Linux, then there will be a temptation to take the extra step of choosing bindings that are specific to a particular kernel release, and then we'll never have a stable ABI. While I argued with the stable-ABI idea in the cited discussion thread, the discussion is more about how to practically achieve the goals and when, not what they are. Forcing an OS-agnostic interface on the device tree will force better coding practices, as Dieter expressed.
Device tree troubles
Posted Jul 25, 2013 22:06 UTC (Thu) by dlang (guest, #313)
The particular I/O pins can be used for many things, and if you send the wrong signals to the wrong pins you can seriously corrupt what's connected there (or in some cases, you can cause actual physical damage, since some of those pins may be connected to devices that affect the real world)
That's why the kernel needs to get a description of the system from something other than the hardware itself.
Device tree troubles
Posted Jul 26, 2013 9:00 UTC (Fri) by etienne (guest, #25256)
Obviously the kernel has to know the hardware, but the question is:
Shall this description be compiled in (#ifdef's), linked in (an extra file on the "ld" command line with lines like "uart1_base = 0x45000000;\n"), or glued in as a device tree file which - if it describes a system-on-chip properly - can easily run to more than a few thousand lines?
Shall we pre-process the device tree description with CPP to add "#include", to protect against multiple inclusion, and have "#ifdef" everywhere?
And all those functions to get a device tree value in the kernel, where you know that the address of that subsystem (for instance a UART) is obviously constant - they do not really simplify the driver source; they have to handle errors like "device tree not loaded" or "element not found" at run time (not at compile time or link time).
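To make the trade-off in this subthread concrete, here is a minimal sketch (the symbol name and compatible string are invented; the of_ calls are the kernel's real device tree helpers): the first variant takes the UART's address from a symbol defined at link time, the second asks the device tree at run time and must cope with the lookup failing.

    #include <linux/ioport.h>
    #include <linux/of.h>
    #include <linux/of_address.h>
    #include <linux/types.h>

    /*
     * Variant 1: link-time constant.  A board-specific file passed to ld
     * would contain "uart1_base = 0x45000000;".  A missing or wrong value
     * is a link error, but the resulting image is tied to one board.
     * (The symbol name is hypothetical.)
     */
    extern char uart1_base[];   /* value supplied by the linker */

    static phys_addr_t uart_phys_from_linker(void)
    {
        return (phys_addr_t)(unsigned long)uart1_base;
    }

    /*
     * Variant 2: run-time device tree lookup.  The same kernel image can
     * boot any board whose DT describes the UART, but the code has to
     * handle the node or its "reg" property being absent.  (The
     * "acme,uart" compatible string is invented for this example.)
     */
    static phys_addr_t uart_phys_from_dt(void)
    {
        struct device_node *np;
        struct resource res;
        int ret;

        np = of_find_compatible_node(NULL, NULL, "acme,uart");
        if (!np)
            return 0;                               /* node not present */

        ret = of_address_to_resource(np, 0, &res);  /* first "reg" entry */
        of_node_put(np);
        return ret ? 0 : res.start;
    }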
Device tree troubles
Posted Jul 26, 2013 13:33 UTC (Fri) by dlang (guest, #313)
And this is the root of the misunderstanding.
On ARM devices, the address of the UART is NOT consistent; there are many possible places for it to be, and the same physical pins that are used for one UART may be used for something else instead.
So the kernel needs to be told what device is hooked up to those pins.
This isn't a matter of it being something that's defined for a particular SoC release; it can depend on how the board that it's plugged into is wired.
And in addition to this, you can have another SoC model that has most of the same features as the one I talked about above, but which has the different components hooked up on the chip differently, so the functions that share particular pins are going to be different.
The alternative to DT isn't that the kernel will know what's there, or that the kernel can probe and figure out what's there. The alternative is that you will have to have a file included into the kernel build that tells the kernel about the system. At which point that kernel build cannot be used for any other system.
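Concretely, the kind of build-time file dlang is describing looked roughly like this on pre-device-tree ARM platforms; the device name, address, and interrupt number below are invented for illustration:

    /*
     * Sketch of a build-time hardware description compiled into one
     * specific kernel image; names and numbers are hypothetical.
     */
    #include <linux/init.h>
    #include <linux/ioport.h>
    #include <linux/kernel.h>
    #include <linux/platform_device.h>

    static struct resource acme_uart_resources[] = {
        DEFINE_RES_MEM(0x45000000, 0x1000),   /* register window */
        DEFINE_RES_IRQ(42),                   /* interrupt line  */
    };

    static struct platform_device acme_uart_device = {
        .name          = "acme-uart",
        .id            = 0,
        .resource      = acme_uart_resources,
        .num_resources = ARRAY_SIZE(acme_uart_resources),
    };

    /*
     * A real platform would usually call this from its machine-init
     * callback; either way the description lives in this one kernel
     * build, so a different board needs a different file like this.
     */
    static int __init acme_board_init(void)
    {
        return platform_device_register(&acme_uart_device);
    }
    device_initcall(acme_board_init);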
Device tree troubles
Posted Jul 26, 2013 14:31 UTC (Fri) by etienne (guest, #25256)
> And this is the root of the misunderstanding.
I did not express myself precisely. It is constant at run time (a C const), and will not change between power-cycles.
That is why it can be defined by the linker.
If you want a common file between a few boards, that common file shall be a partial link, where a few symbols are still undefined - the final link specialises that Linux kernel for that system-on-chip on that board.
At least at link time, errors can be produced if you define the console to be a UART which does not have its pins configured for UART but for I/O port.
Trying to define pin functions in the pre-boot environment is asking for problems, because for instance you still have not defined which pins carry the UART console - and cannot check anything beforehand (no error log can be printed).
Ever had a Device Tree that is incomplete, the wrong version, not at the right place in the FLASH, not present because someone erased that FLASH partition, truncated because the partition/sector size is too small, corrupted, or simply the wrong DT file copied to FLASH?
How do you recover that board? DHCP/BOOTP (present inside the system-on-chip internal ROM) only transfers a single file, the Linux kernel - no DT!
From what I can see, most people attach the DT to the kernel one way or another (i.e. the DT is not constant for a board), because each new kernel version defines more DT fields or fixes DT bugs (the DT would be too complex if it were complete, so people just define what they use) - and so the DT complicates the system (and slows it down) instead of simplifying it.
Device tree troubles
Posted Jul 26, 2013 20:16 UTC (Fri) by dlang (guest, #313)
This is wrong, and this point is being made on the kernel list.
DT is supposed to describe the hardware, period. It's supposed to do so not just for Linux, but for other OSs as well.
If the hardware doesn't change, then the DT should not change. If it does change, either it's a bug in the DT being fixed, or it's an API change (which should not be happening).
By the way, where do you expect the data to come from to get linked into your binary?
Device tree troubles
Posted Jul 29, 2013 10:55 UTC (Mon) by etienne (guest, #25256)
Sorry, no more reading LKML, not enough time.
> This is wrong, and this point is being made on the kernel list
So you ask me to write those 10000 or more lines to describe that system-on-chip, with no real way to test them?
Have you seen the complexity of those systems-on-chip lately? They have a defined area of silicon and a package pin count; they fit in everything they can find.
They do not even describe everything completely in their voluminous docs; they add references to sub-parts: lately I wanted to check whether there is a DMA working in virtual addresses provided by the ARM subsystem (there are other DMAs using physical addresses in the system) - the only way to know is to ask the processor for the content of a register (the docs only say where to read...).
When that is done, I also have to define the hardware PCB with a few thousand lines of DT describing how the system could be used, and then define how I actually use it?
I am already happy when the hardware works the way it is supposed to; I cannot claim what happens if you use it another way - and anyway you wouldn't have such a PCB to play with...
> If the hardware doesn't change, then the DT should not change. If it does, it should be a bug in the DT being fixed
Sorry, but firmware in an FPGA changes, loaded at run time; the DT description is variable depending on the filesystem content...
> By the way, where do you expect the data to come from to get linked into your binary?
You write it from the same docs that you would write your DT from.
Device tree troubles
Posted Jul 29, 2013 16:56 UTC (Mon) by dlang (guest, #313)
> You write it from the same docs that you would write your DT from.
I meant: where does the system trying to boot find the data to link into your binary? You seem to be saying that the DT is no good because there's no place to store it, so where do you store this other data that needs to be linked?
Device tree troubles
Posted Jul 29, 2013 19:12 UTC (Mon) by etienne (guest, #25256)
That is, combine the last two files at install-to-SD-card time, when you are running Linux, Windows or OS X to write that SD-card, and when you can display error messages and check stuff.
In the PC world it would be at grub-install-kernel time.
And what I mean is that all this complex DT machinery is just doing what the LD linker does, but a lot more slowly, with a lot more code, and with errors delayed to execution time, when it would be better to just use LD.
TL;dr: no DT partition or file; the generic kernel is a partial link, specialised at installation time.
Device tree troubles
Posted Jul 31, 2013 6:45 UTC (Wed) by alison (subscriber, #63752)
> kernel build that tells the kernel about the system. At which point that kernel build cannot be used for any other system.
I know, let's call that a "board file"! Oh, wait . . .
Device tree troubles
Posted Sep 10, 2013 6:36 UTC (Tue) by glaesera (guest, #91429)
PC developers are not spoiled, but the ARM manufacturers are. They sold and still sell very large quantities, but at some point the market will be saturated.
ARM is going to catch up with the PC architecture, but it will most probably not be able to completely rule out x86(-64).
It is going to be as easy to set up a Linux installation on any ARM device as it used to be on PCs, but the big hype will be mostly over by then.