NVIDIA and nouveau
The release of source code for NVIDIA graphics hardware was perhaps something of a surprise; at least at a quick glance, it seems like that could lead to an in-tree, officially supported driver. For many years, though, the nouveau project has been working on an upstream driver for NVIDIA hardware, so an obvious question is what happens with nouveau in light of the NVIDIA announcement. Kernel graphics maintainer Dave Airlie gave a talk at the 2022 Linux Plumbers Conference (LPC) to help shed some light on that subject.
NVIDIA
He began by giving a brief history of NVIDIA hardware, with a timeline that can be seen in his slides. The timeline was in part "cobbled together from Wikipedia" and is not completely accurate, he said, but shows just "how far back NVIDIA hardware stretches". While the timeline starts in 1999, things started getting interesting in 2006 with the NV50, he said. It introduced per-context virtual memory addresses; that feature represented a major turning point for graphics hardware.
There is roughly a two-year cadence to the NVIDIA releases starting in 2010 with "Fermi" (GF1xx). Vulkan support was added in the "Kepler" (GK1xx) hardware in 2012. In 2014, "Maxwell" came in two versions (GM1xx and GM2xx); the latter, also known as "Maxwell 2", introduced signed firmware. That pace more or less continued with "Pascal" (GP1xx) in 2016, "Volta" (GV1xx) in 2017, "Turing" (TU1xx) in 2018, and "Ampere" (GA1xx) in 2020. Turing brought support for the GPU system processor (GSP); he explained the importance of that feature a bit later in the talk.
Starting with Maxwell 2, NVIDIA decided that firmware for its devices could not simply be loaded unsigned, for security and other reasons. So firmware needed to be signed by NVIDIA and loaded into the multiple processors on the device. This made life hard for the nouveau project because it required complicated boot sequences for poking multiple firmware images into the device in a specific order that was "very hard to get right".
NVIDIA and nouveau had worked out an arrangement where NVIDIA would provide signed firmware, but it was still difficult to get any of the hardware working. Even when all of the right things were done at boot time, the devices came up in their base configuration. The devices were powered-on and functioning, but "you can't make it reclock, you can't make it go faster". Manually choosing a performance level for NVIDIA devices is known as "reclocking". There was also no power-management functionality available to the driver. This was a watershed moment for nouveau, Airlie said, because it did not make sense to put a lot of effort into a driver for graphics hardware running in its slowest possible mode, while not being battery-friendly either.
The GSP is a RISC-V-based processor that was added to the GPU for the Turing and later hardware. The GPU already had "six or seven little processors on it", but the GSP is meant to be "the one to rule them all". The firmware file for the GSP is around 30 or 40MB; most of the earlier firmware blobs were on the order of 256KB, so the GSP is a substantial increase in size. But it is a single firmware image for the device that initializes the rest of the processors. Effectively, NVIDIA moved much of its proprietary kernel driver into the GSP.
That all happened around the same time as the announcement of the open-source NVIDIA kernel drivers, he said. Those are based on a fork of the NVIDIA proprietary driver that only interfaces directly to the GSP; it turns out that there is nothing all that interesting in the API between the kernel and the GSP, so it could be released as open source. Since NVIDIA has customers who are interested in open-source drivers, it makes sense for the company to do so. However, the drivers do not look or act like the existing kernel graphics drivers so they are not able to go into the upstream kernel, Airlie said.
nouveau
That is the current state in the NVIDIA world, which made for a good lead-in to talk about nouveau. That project started around 2007 to reverse-engineer NVIDIA GPUs in order to create Linux drivers. It supports hardware from NV04 (1999) through Ampere "in various states of disrepair".
But the project has stagnated somewhat recently due to various factors. One big problem that a community open-source graphics project faces is that once someone gets good at working on it, that becomes known, and they get hired away to work on some other graphics hardware. There is really only one full-time nouveau developer, Ben Skeggs at Red Hat, working on the project.
Also, once the signed firmware came about, with its lack of reclocking and power-management features, it was disheartening for the project; there was no way that the open-source driver was ever going to be able to compete with the proprietary one. It was hard to justify putting a lot of effort into nouveau. Beyond that, Skeggs spent a lot of time just trying to get the firmware provided by NVIDIA to load and run the hardware.
For the most part, the nouveau kernel driver is just for hardware enablement at this point. The firmware that NVIDIA provides is not the same as what is used by the proprietary driver, so it is not well-tested. Only NVIDIA can really debug problems with that firmware, so there have to be multiple round-trips with NVIDIA engineers. More recently, though, the project has been adding GSP support because that provides a high-level interface to things like reclocking, so the hope is that the nouveau kernel driver can use the standard NVIDIA GSP firmware and drive the hardware that way; "we will see".
OpenGL and Vulkan
There is a nouveau OpenGL driver in Mesa. He believes it has passed the OpenGL 4.5 conformance tests, but has never been submitted for certification. Up until recently, it had "horribly broken multithreading context support" so it worked for older single-threaded games and the like but not for programs like Firefox or modern games; that has been fixed recently, though. The driver has not seen a lot of optimization work, however, due to the lack of reclocking support for the hardware.
A Vulkan driver for nouveau was recently started by Jason Ekstrand, with help from Karol Herbst and Airlie. At the time of the talk, that was a bit of news, but things have progressed since that time. The driver is targeting Vulkan 1.0 for hardware from Kepler up through Ampere and is passing lots of the conformance tests at this point. But in order to finish the driver, and make it work the way they want it to, there is a need to add new user-space APIs to the kernel.
There are three features needed to get Vulkan really working, he said. The first is to split the physical memory allocations (for buffer objects) from the GPU virtual memory allocations. In nouveau, that's all done in one step, which is fine for OpenGL but does not work for Vulkan where more control over the mappings is required.
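As a rough illustration of that split, the following Python sketch models buffer objects that own only backing memory and a separate address space that decides where (and how often) they are mapped. The names BufferObject, AddressSpace, bind(), and unbind() are hypothetical and are not taken from nouveau or any kernel interface; the 4KB page size is likewise an assumption.

```python
from dataclasses import dataclass, field

PAGE = 4096  # assumed page size, for illustration only


@dataclass
class BufferObject:
    """Physical backing only; it carries no GPU virtual address."""
    handle: int
    size: int


@dataclass
class AddressSpace:
    """A per-context GPU virtual address space controlled by user space."""
    mappings: dict = field(default_factory=dict)  # va -> (handle, offset, length)

    def bind(self, va, bo, offset, length):
        """Map part of a buffer object at a caller-chosen virtual address."""
        if va % PAGE or offset % PAGE or length % PAGE:
            raise ValueError("va, offset, and length must be page-aligned")
        if offset + length > bo.size:
            raise ValueError("range exceeds the buffer object")
        for start, (_, _, mapped_len) in self.mappings.items():
            if va < start + mapped_len and start < va + length:
                raise ValueError("virtual range overlaps an existing mapping")
        self.mappings[va] = (bo.handle, offset, length)

    def unbind(self, va):
        """Drop a mapping; the backing memory is left untouched."""
        del self.mappings[va]


# One allocation, two independent mappings: the whole buffer at one
# address and a single page of it aliased at another.
bo = BufferObject(handle=1, size=4 * PAGE)
vm = AddressSpace()
vm.bind(0x10000, bo, 0, 4 * PAGE)
vm.bind(0x80000, bo, 2 * PAGE, PAGE)
```

With allocation and mapping done in one step, the aliasing and partial mappings shown at the end would not be expressible; that is the kind of control that Vulkan expects the driver to hand to user space.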
The second is that synchronization objects and ways to handle and work with them need to be added so that the scheduler can wait for existing GPU work to complete before sending new tasks. It is the way to do proper interleaving of GPU work, Airlie said. The final piece is a virtual-memory-handling interface that is called VM_BIND; it is something that is being looked at for the Intel driver and the amdgpu driver already has many pieces of it. It is an API both for virtual memory and for command submission that is also geared toward the needs of Vulkan.
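The role of synchronization objects can be shown with a toy model. In this sketch, each job lists the objects it waits on and the objects it signals, and a simple scheduler only runs a job once everything it waits on has been signaled. SyncObj, Job, and run() are hypothetical names chosen for the example; the real kernel interfaces are considerably more involved.

```python
from collections import deque


class SyncObj:
    """A one-shot synchronization object: unsignaled until work completes."""
    def __init__(self):
        self.signaled = False


class Job:
    """A unit of GPU work with explicit wait and signal lists."""
    def __init__(self, name, waits=(), signals=()):
        self.name = name
        self.waits = list(waits)
        self.signals = list(signals)


def run(jobs):
    """Run jobs once their dependencies are met; return the execution order."""
    pending = deque(jobs)
    order = []
    stalled = 0  # consecutive jobs deferred without any progress
    while pending:
        job = pending.popleft()
        if all(s.signaled for s in job.waits):
            for s in job.signals:
                s.signaled = True  # "completion" of the job signals its objects
            order.append(job.name)
            stalled = 0
        else:
            pending.append(job)  # not ready yet; try again later
            stalled += 1
            if stalled >= len(pending):
                raise RuntimeError("no runnable job: unsatisfiable dependency")
    return order


upload_done = SyncObj()
render_done = SyncObj()
jobs = [
    Job("present", waits=[render_done]),
    Job("render", waits=[upload_done], signals=[render_done]),
    Job("upload", signals=[upload_done]),
]
order = run(jobs)  # ['upload', 'render', 'present']
```

Even though the jobs are submitted in the "wrong" order, the dependencies expressed through the synchronization objects determine when each one actually runs.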
Those are all non-trivial projects, he said. Once the GSP support is working and reclocking can be done, these are the next steps for nouveau, but they are going to take some time. The Vulkan driver developers have already started looking at that effort, but there are somewhat circular dependencies that make it difficult to see how to do the work incrementally. It will be a lot of code to review, so getting it into the upstream kernel in a piece-wise fashion will be challenging.
Future
There are some upcoming problems that have not yet been faced, he said. A 30 or 40MB firmware image is rather large; normally, those are put into the initramfs. But putting multiple initramfs images into the boot partition may overrun the space available. The problem gets worse because there may be a need to ship multiple NVIDIA firmware images due to a lack of a stable firmware ABI. The nouveau project will have to pick and choose which firmware versions to support, but each will need to be available; he has wondered if there might be a way to delay firmware loading until after the real root filesystem is mounted, but has not really worked that out yet.
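Some back-of-the-envelope arithmetic shows how quickly that adds up. In the sketch below, only the 40MB firmware size comes from the talk; the number of installed kernels, the size of the rest of the initramfs, and the 512MB boot partition are assumptions made purely for illustration, and the kernel images themselves are ignored.

```python
def boot_usage_mb(kernels, base_initramfs_mb, firmware_mb, firmware_versions):
    """Space used by one initramfs per installed kernel, each carrying
    every supported firmware version."""
    return kernels * (base_initramfs_mb + firmware_mb * firmware_versions)


BOOT_PARTITION_MB = 512  # assumed partition size

# Three installed kernels, 60MB of other initramfs content (both assumed).
one_version = boot_usage_mb(3, 60, 40, 1)     # 3 * (60 + 40)  = 300
three_versions = boot_usage_mb(3, 60, 40, 3)  # 3 * (60 + 120) = 540

fits_one = one_version <= BOOT_PARTITION_MB      # True
fits_three = three_versions <= BOOT_PARTITION_MB  # False
```

Under those assumptions, a single firmware version fits comfortably, but carrying three versions in each initramfs would already exceed the partition.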
In the long run, it may not really make sense to pound the NVIDIA firmware API into the nouveau driver, which has its own ideas about how everything works. A new driver that leaves behind the existing nouveau legacy and only talks to the GSP using NVIDIA's API may be the right path instead. In addition, the ability to reclock the hardware and accelerate the GPU may allow creating a cross-platform compute stack, to replace the vendor-specific solutions (e.g. CUDA) that exist today. All of those solutions are on their own island, lacking any real developer community, but maybe that could be changed; "we've done it for Vulkan, we've done it for OpenGL, I don't see why we can't do it", though it will take a lot of time—and likely a lot of money.
An audience member asked about Vulkan Compute as a possibility, but Airlie said it was not geared toward the same kinds of problems as CUDA and others. It is better than OpenGL Compute, but is still a long way from what the real compute stacks provide. Ekstrand echoed that, noting that while there is desire to see Vulkan Compute handle more of the "scientific" computing use cases, it will never be a full-stack solution; at most Vulkan can provide the run-time piece, he said.
There was some discussion of the problem with the size of the GSP firmware and initramfs that Airlie had described, including several suggestions of ways to approach the problem. The YouTube video of the talk is available for those who are interested in that discussion or more of the details elsewhere in the talk.
[I would like to thank LWN subscribers for supporting my travel to Dublin for Linux Plumbers Conference.]
Index entries for this article | |
---|---|
Conference | Linux Plumbers Conference/2022 |
Posted Oct 6, 2022 2:58 UTC (Thu)
by dowdle (subscriber, #659)
[Link]
Posted Oct 6, 2022 8:31 UTC (Thu)
by marcH (subscriber, #57642)
[Link] (1 responses)
Discussed in 2016:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/354089/3#message-6133ed6092de59e5a21cb25df03bdf1278a724d1
Posted Oct 10, 2022 9:05 UTC (Mon)
by tiwai (subscriber, #39450)
[Link]
I guess the easiest solution for now is to re-use the existing UMH of the firmware loader; basically you just need to enable CONFIG_FW_LOADER_USER_HELPER (but turn off CONFIG_FW_LOADER_USER_HELPER_FALLBACK), and modify / create a new helper to call request_firmware*() with FW_OPT_USERHELPER flag to allow the fallback via sysfs interface if the target firmware isn't found in initrd.
Posted Oct 6, 2022 10:47 UTC (Thu)
by gb (subscriber, #58328)
[Link] (4 responses)
Posted Oct 6, 2022 11:30 UTC (Thu)
by eru (subscriber, #2753)
[Link] (3 responses)
Posted Oct 6, 2022 13:23 UTC (Thu)
by flussence (guest, #85566)
[Link] (2 responses)
I hope they don't end up part of the standard kernel firmware tarball. That's bloated enough as it is.
Posted Oct 7, 2022 1:17 UTC (Fri)
by gb (subscriber, #58328)
[Link] (1 responses)
Posted Oct 7, 2022 18:24 UTC (Fri)
by zdzichu (subscriber, #17118)
[Link]
Posted Oct 6, 2022 14:23 UTC (Thu)
by mcon147 (subscriber, #56569)
[Link] (11 responses)
Posted Oct 6, 2022 16:38 UTC (Thu)
by JoeBuck (subscriber, #2330)
[Link] (9 responses)
It would be possible, but I don't see why they'd want to do that. The firmware is a highly complex piece of software; it will have bugs; after a very short time no one should be running the firmware that was originally shipped with the device.
Please pardon me if I'm interpreting your question wrong, but if the idea here is to enable the FSF fiction that if we don't give the user any way to modify proprietary firmware, even to fix severe bugs, and ship devices that put the firmware in ROM or effectively-ROM (nonvolatile memory that the OS provides no way to write to), we can claim to be running an entirely free system, that idea isn't worth promoting.
Posted Oct 6, 2022 18:36 UTC (Thu)
by iabervon (subscriber, #722)
[Link]
Posted Oct 6, 2022 19:11 UTC (Thu)
by ncm (guest, #165)
[Link] (5 responses)
Posted Oct 11, 2022 6:58 UTC (Tue)
by marcH (subscriber, #57642)
[Link] (4 responses)
Exactly this. BTW this was discussed a lot in the Q&A session at the end of the presentation, the URL is above.
IMHO the key idea is to stop considering the GPU (and some others) as some "ancillary" device that should be fully initialized as early and quickly as possible. The CPU and GPU should instead be treated more like _peers_ in the "Distributed System on Chip", trying to boot at the same time with as few as possible early dependencies between each other.
There is clearly another, "full-blown" operating system in those 40 Megabytes; some Linux products are smaller than that!
So the GPU should have its own, basic "bootloader" that makes the screen just _usable_; the equivalent of UEFI on the main CPU. In fact you bet NVidia engineers have stuff like this internally _already_ because they need "bootloader" and minimal systems like this when they screw up the big image and it stops booting - exactly like when you fall back to UEFI and GRUB when you screw up the OS of the main CPU.
"NVidia must release option ROMs" was mentioned in the Q&A session.
Posted Oct 11, 2022 14:48 UTC (Tue)
by luto (subscriber, #39314)
[Link] (3 responses)
Posted Nov 6, 2022 4:02 UTC (Sun)
by marcH (subscriber, #57642)
[Link] (2 responses)
Posted Nov 6, 2022 10:22 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (1 responses)
Cheers,
Posted Nov 10, 2022 19:02 UTC (Thu)
by flussence (guest, #85566)
[Link]
(Suddenly the EFI Shell being designed the way it is makes a lot more sense to me…)
Posted Oct 7, 2022 6:37 UTC (Fri)
by himi (subscriber, #340)
[Link] (1 responses)
I suspect the point of the question was to allow bringing up the card with that (hopefully small) default firmware, and then later on load the big blob from storage somewhere other than the initramfs and complete the bring up. It seems like a sensible option, though I'm not sure NVidia would want to bother with the work required to support it.
Posted Oct 7, 2022 7:47 UTC (Fri)
by mjg59 (subscriber, #23239)
[Link]
Posted Oct 7, 2022 17:13 UTC (Fri)
by ju3Ceemi (subscriber, #102464)
[Link]
For CPU, you have multiple ways to push a firmware in it:
I cannot see why this is not the same for GPUs .. you can upgrade your "gpu bios" already
Posted Oct 6, 2022 16:49 UTC (Thu)
by rgb (subscriber, #57129)
[Link]
Posted Oct 7, 2022 16:29 UTC (Fri)
by wsy (subscriber, #121706)
[Link] (5 responses)
Posted Oct 7, 2022 22:36 UTC (Fri)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Posted Oct 8, 2022 18:47 UTC (Sat)
by wsy (subscriber, #121706)
[Link] (1 responses)
Posted Oct 9, 2022 11:37 UTC (Sun)
by mathstuf (subscriber, #69389)
[Link]
Posted Oct 11, 2022 10:37 UTC (Tue)
by xnox (guest, #63320)
[Link] (1 responses)
Posted Oct 11, 2022 14:17 UTC (Tue)
by wsy (subscriber, #121706)
[Link]
It is possible their GPU goes the same route in the future.
* https://www.servethehome.com/a-quick-look-at-logging-into...
Posted Oct 13, 2022 13:30 UTC (Thu)
by roblucid (guest, #48964)
[Link]
https://lkml.iu.edu/hypermail/linux/kernel/1609.0/01530.html
Then you can set up user-space as you like to pass the firmware at any feasible moment with a classic mechanism via sysfs. Systems without an extra setup would keep working as long as a target firmware is put in initrd, too (just like now).
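A user-space helper for that sysfs mechanism could look roughly like the following Python sketch. The kernel's firmware fallback interface exposes "loading" and "data" files for a pending request; writing 1 to "loading" starts the transfer, the image is written to "data", and writing 0 finishes it (-1 aborts). The function name feed_firmware(), the request directory, and the file name gsp.bin are assumptions for the example, and the demonstration at the end runs against a temporary directory rather than real sysfs.

```python
import tempfile
from pathlib import Path


def feed_firmware(request_dir, firmware_path, chunk=1 << 20):
    """Hand a firmware image to the kernel through the sysfs fallback files."""
    request_dir = Path(request_dir)
    loading = request_dir / "loading"
    data = request_dir / "data"

    loading.write_text("1")  # announce that a transfer is starting
    try:
        with open(firmware_path, "rb") as src, open(data, "wb") as dst:
            while True:
                block = src.read(chunk)
                if not block:
                    break
                dst.write(block)
    except OSError:
        loading.write_text("-1")  # abort the request on failure
        raise
    loading.write_text("0")  # transfer complete


# Demonstration against a fake request directory (not real sysfs).
with tempfile.TemporaryDirectory() as tmp:
    request = Path(tmp) / "request"
    request.mkdir()
    (request / "loading").write_text("0")
    (request / "data").write_bytes(b"")
    blob = Path(tmp) / "gsp.bin"  # hypothetical firmware file
    blob.write_bytes(b"\x00" * 3000)

    feed_firmware(request, blob, chunk=1024)
    demo_state = (request / "loading").read_text()
    demo_copied = (request / "data").read_bytes() == blob.read_bytes()
```

On a real system, the helper would be triggered by the firmware-request event and would read the image from the root filesystem, which is what allows the large blob to stay out of the initrd.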
So instead of providing N full signed blobs it could provide 1 Big blob and N-1 delta blobs, and load two files into video card, and hardware deal with delta, compression and encryption.
So it should be safe and small in size.
Wol
- either via some bios update, which will permanently load the firmware
- or at runtime, from Linux