The imminent stable-version apocalypse
For most of the existence of the kernel project, few developers within the project itself have maintained any given kernel release for more than a couple of years or so, and maintenance releases were relatively rare. There were some exceptions; the 2.4 release happened at the beginning of 2001, and Willy Tarreau finally stopped maintaining it more than eleven years later. Even then, the final version was 2.4.37, though one could perhaps call it 2.4.48 after the final set of eleven small "fixup" releases. Releases for kernels maintained for the long term were relatively few and far between.
In recent years, though, that situation has changed, with some older kernels receiving much more long-term-maintenance attention. Thus, February 3 saw the release of the 4.9.255 and 4.4.255 updates. Those kernels have received 18,765 and 16,986 patches, respectively, and there is no sign of things slowing down. The current posted plan is to maintain 4.9 through January 2023 and 4.4 through February 2022.
These kernel-release numbers are now a problem, as was pointed out by Jari Ruusu. There are a couple of macros defined within the kernel relating to version codes; these can be found in include/generated/uapi/linux/version.h in a built kernel:
#define LINUX_VERSION_CODE 330496
#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
The first macro, LINUX_VERSION_CODE, is calculated in the top-level makefile; it is the result of:
(5 << 16) + (11 << 8) + 0
That number (which is 0x50b00) identifies this as a 5.11-rc kernel; it is the same result one gets from KERNEL_VERSION(5,11,0).
One does not have to look long to see that neither of these macros is going to generate the expected result once the minor version ("c" in the KERNEL_VERSION() macro) exceeds 255. Running that macro on a 4.9.255 kernel yields 0x409ff, but on 4.9.256 it will instead return 0x40a00 — which looks like 4.10.0. That might just cause some confusion in the user community.
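The wrap is easy to reproduce in user space. What follows is a minimal sketch, not kernel code: the macro is copied from the header shown above, while print_decoded() and show_wrap() are names made up for this illustration, and the decoding assumes the eight-bit fields that existing users of the version code expect.

```c
#include <assert.h>
#include <stdio.h>

/* Same encoding as the macro in the generated version.h shown above. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical helper: decode a version code the way user space would,
 * assuming eight bits each for the patch level and the minor version.
 */
static void print_decoded(unsigned int code)
{
	printf("0x%05x decodes as %u.%u.%u\n", code,
	       code >> 16, (code >> 8) & 0xff, code & 0xff);
}

/* Step across the 255 boundary in the 4.9 series. */
static void show_wrap(void)
{
	print_decoded(KERNEL_VERSION(4, 9, 255));
	print_decoded(KERNEL_VERSION(4, 9, 256));

	/* The 256th stable update carries into the patch-level byte. */
	assert(KERNEL_VERSION(4, 9, 256) == KERNEL_VERSION(4, 10, 0));
}
```

Calling show_wrap() prints the two codes side by side; the second one is bit-for-bit identical to the code for 4.10.0, so no consumer of the number can tell the two releases apart.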
This problem does not come as a complete surprise to the stable-kernel maintainers; Sasha Levin posted this patch in mid-January in an attempt to fix it. It changes both LINUX_VERSION_CODE and KERNEL_VERSION() to use 16 bits for the minor version, thus eliminating the overflow. This patch got into linux-next, but seems unlikely to stay there; as Jiri Slaby noted, these macros are used by user space and constitute a part of the kernel's ABI. He added that both the GNU C Library and the GCC compiler (the BPF code in particular) use the kernel version code in its current form and would not handle a change well. There are also many other places in the kernel that exchange these version codes with user space; see this media ioctl() command, for example. Changing the kernel's idea of how KERNEL_VERSION() works will break programs compiled with the older macro, which is not something that is allowed.
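To see why widening the field is an ABI problem, consider a program built against the old macro that receives a version code produced with a wider layout. The sketch below is only an illustration: KERNEL_VERSION_WIDE() is an assumed layout with 16 bits for the minor version, not a copy of the posted patch, and old_binary_sees_at_least_5_12() is a made-up stand-in for a check compiled into an existing binary.

```c
#include <assert.h>

/* The encoding that existing binaries were compiled with. */
#define KERNEL_VERSION_OLD(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/*
 * Hypothetical widened layout with 16 bits for the minor version.
 * Assumed for illustration only; the actual patch may differ.
 */
#define KERNEL_VERSION_WIDE(a, b, c) (((a) << 24) + ((b) << 16) + (c))

/*
 * A check frozen into an old binary: "is this kernel at least 5.12?"
 * The threshold was computed with the old macro at build time.
 */
static int old_binary_sees_at_least_5_12(unsigned int code_from_kernel)
{
	return code_from_kernel >= KERNEL_VERSION_OLD(5, 12, 0);
}
```

Under these assumptions, a 5.11 kernel reporting KERNEL_VERSION_WIDE(5, 11, 0) would pass the old binary's 5.12 test, since the wider code is numerically far larger than anything the old layout can produce.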
So what is to be done? As of this writing that has not yet been worked out, but there are a couple of options on the table:
- Ruusu's note pointing out the problem suggested that stable releases could start incrementing the EXTRAVERSION field instead; this is the field that normally contains strings like -rc7 (for mainline test releases), or a Git commit ID. The minor version would presumably remain at 255. This would avoid breaking ABI, but would also make it harder for user-space code to distinguish between stable releases after 255. It might also create minor trouble for distributors who are using that field to identify their own builds.
- Stable maintainer Greg Kroah-Hartman suggested that he could "leave it alone and just see what happens". But, as Slaby pointed out, that will create the wrapping problem described above, which could confuse user space. If this is done, he said, it would be necessary to mask the minor version to eight bits, causing it to wrap back around to zero; whether that would cause confusion is another question. Version numbers are normally expected to increase monotonically.
The most likely outcome can be seen in the kernel's history, though. Once upon a time, mainline kernel releases had three significant numbers rather than two — 2.6.30, for example. In those days, the minor version field wasn't available for stable updates, so the EXTRAVERSION field was used instead. Looking at the 2.6.30.3 makefile, one sees:
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 30
EXTRAVERSION = .3
NAME = Man-Eating Seals of Antiquity
That solution worked for years, so there should be no real reason why it wouldn't work now as well. Most likely SUBLEVEL would remain stuck at 255, with EXTRAVERSION indicating the real release number.
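The two behaviors discussed above, wrapping the minor version to eight bits and holding it at 255, can be compared directly. This is a sketch under stated assumptions: VERSION_CODE_MASKED() and VERSION_CODE_CLAMPED() are hypothetical names invented here, and neither is claimed to match whatever form ends up in the kernel.

```c
#include <assert.h>

/* The unmodified encoding from the kernel headers. */
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical: mask the minor version to eight bits, so 256 wraps to 0. */
#define VERSION_CODE_MASKED(a, b, c) \
	(((a) << 16) + ((b) << 8) + ((c) & 0xff))

/* Hypothetical: hold the minor version at 255 once it gets there. */
#define VERSION_CODE_CLAMPED(a, b, c) \
	(((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))

static void compare_options(void)
{
	/* Masking: 4.9.256 sorts before 4.9.255, so codes are not monotonic. */
	assert(VERSION_CODE_MASKED(4, 9, 256) < VERSION_CODE_MASKED(4, 9, 255));

	/* Clamping: 4.9.256 cannot be told apart from 4.9.255... */
	assert(VERSION_CODE_CLAMPED(4, 9, 256) == KERNEL_VERSION(4, 9, 255));

	/* ...but it still sorts before 4.10.0, unlike the unmodified macro. */
	assert(VERSION_CODE_CLAMPED(4, 9, 256) < KERNEL_VERSION(4, 10, 0));
	assert(KERNEL_VERSION(4, 9, 256) == KERNEL_VERSION(4, 10, 0));
}
```

Clamping gives up the ability to tell late stable releases apart through the version code, which is where a separate field such as EXTRAVERSION would have to carry the real release number.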
It is evidently Leon Trotsky who once said that "old age is the most unexpected of all things that can happen to a man". Perhaps similar forces are at play here; running out of bits is the most unexpected of things that can happen to a kernel developer. This version-number overflow could have been foreseen some time ago, and the date of its occurrence forecast with reasonable certainty. But now some sort of solution has to be found before the next stable-kernel release can be made. Happily, the problem should be easier to resolve than that of old age.
Update: Kroah-Hartman appears to have chosen the "do nothing" option with the release of 4.9.256 and 4.4.256, both of which increment the version number but make no other change. "I'll try to hold off on doing a 'real' 4.9.y release for a week to give everyone a chance to test this out and get back to me. The pending patches in the 4.9.y queue are pretty serious, so I am loath to wait longer than that, consider yourself warned..."
Update 2: In the end, it appears that the clamping solution will be taken, with the minor number fixed at 255 going forward.
Posted Feb 5, 2021 15:29 UTC (Fri)
by leromarinvit (subscriber, #56850)
[Link] (2 responses)
As always, thanks for the excellent reporting, Jon!
Posted Feb 9, 2021 20:35 UTC (Tue)
by MarkFrankSharefkin (guest, #135746)
[Link] (1 responses)
Posted Feb 10, 2021 10:29 UTC (Wed)
by mbg (subscriber, #4940)
[Link]
Posted Feb 5, 2021 16:14 UTC (Fri)
by edeloget (subscriber, #88392)
[Link]
The main problem will be users of the KERNEL_VERSION() macro; tests such as
#if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,5,0))
will break on 4.4.256, but such a break will likely be noticed (because this kind of code is used to test whether a feature from kernel 4.5 can be used).
The other version of the test
#if (LINUX_VERSION_CODE < KERNEL_VERSION(4,5,0))
will be silently compiled out even though it should not, and it's very likely that something will break.
Posted Feb 5, 2021 16:44 UTC (Fri)
by willy (subscriber, #9762)
[Link] (3 responses)
Code which needs to care about versions after 255 can check LINUX_VERSION_SUBLEVEL directly.
Posted Feb 5, 2021 17:52 UTC (Fri)
by mchehab (subscriber, #41156)
[Link] (1 responses)
That seems to be the best thing to be done.
Without that, media applications will break, as several of them rely on the kernel version in order to enable some features:
drivers/media/cec/core/cec-api.c: caps.version = LINUX_VERSION_CODE;
Posted Feb 5, 2021 20:24 UTC (Fri)
by nix (subscriber, #2304)
[Link]
Posted Feb 5, 2021 19:43 UTC (Fri)
by tglx (subscriber, #31301)
[Link]
Yes. And the commit message of x.x.x.255 wants to be:
This kernel has been finally stapled to death. Move on.
Posted Feb 5, 2021 18:05 UTC (Fri)
by jemarch (subscriber, #116773)
[Link]
The BPF backend in GCC 10 accepts a -mkernel option where the user can specify a string identifying a kernel version, from 4.0 to 4.20 and from 5.0 to 5.2. This information is then used internally.
However, after getting feedback from the kernel community we decided that -mkernel wasn't useful in practice, and consequently GCC 11 ignores it.
An eventual change in the encoding of LINUX_VERSION_CODE wouldn't impact GCC at all.
Salud!
Posted Feb 5, 2021 18:47 UTC (Fri)
by flussence (guest, #85566)
[Link] (3 responses)
I think userspace code might have a few valid reasons to check patchlevel, but anything that cares about specific sublevels in what's supposed to be a stable kernel series is probably doing something horribly wrong - and breaking the former for the latter is the wrong tradeoff. Any program that needs to know about numbers above 255 is getting frequent enough updates that it can be taught to read them from something else.
Posted Feb 5, 2021 22:05 UTC (Fri)
by Nahor (subscriber, #51583)
[Link] (2 responses)
If there is a new kernel, it means something changed. Why would userspace not be interested? That change might mean that a workaround for some issue is no longer necessary, or it might mean an updated driver that doesn't need to be pulled from out-of-tree anymore, ...
Posted Feb 6, 2021 13:25 UTC (Sat)
by mathstuf (subscriber, #69389)
[Link]
Posted Feb 12, 2021 19:01 UTC (Fri)
by flussence (guest, #85566)
[Link]
On the other hand, it's entirely reasonable for code that hasn't been touched in *years* to have different paths for kernel 4.9.x and 4.10.0, and that's the sort of program that'll break when the number wraps around and may be hard or impossible to fix when it does.
Posted Feb 5, 2021 19:30 UTC (Fri)
by amarao (subscriber, #87073)
[Link] (8 responses)
p.s. how much would you pay to have 4 billion VLANs instead of 4k?
Posted Feb 5, 2021 19:43 UTC (Fri)
by edeloget (subscriber, #88392)
[Link] (1 responses)
Not much :) as I can use S-VLAN and C-VLAN, and that would open a fantastic world of "add as many VLANs as you want" at the price of adding yet another 4-byte L2 header. S-VLAN headers can be chained, so you can have:
DSTMAC[6] SRCMAC[6] SVLAN[4] SVLAN[4] CVLAN[4] ETHTYPE[2] ...
Which gives you 68 billion VLANs (and you can add even more SVLAN levels).
Granted, you may have to use recent network appliances, but Linux at least has supported 802.1ad for years.
And yet...
> Humanity is always in severe deficiency of integers.
I can't agree more :)
Posted Feb 6, 2021 18:32 UTC (Sat)
by champtar (subscriber, #128673)
[Link]
Posted Feb 5, 2021 22:27 UTC (Fri)
by Sesse (subscriber, #53779)
[Link]
Posted Feb 5, 2021 23:33 UTC (Fri)
by jengelh (subscriber, #33263)
[Link]
ICQ account numbers were assigned in monotonically increasing fashion, I believe.
(The OSM database, which I have taken care of for a handful of years, yields this statistic piece about phone numbers for my area code: /^[0-9]/: 4807 POIs, /^9/: 450, /^99/: 242, /^999/: 133.)
Posted Feb 7, 2021 1:29 UTC (Sun)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
/s
Posted Feb 8, 2021 10:23 UTC (Mon)
by khim (subscriber, #9252)
[Link] (1 responses)
It doesn't matter when IPv6 was developed. It only matters when the IPv4 pool was exhausted. That happened about one year ago. The fact that anyone started pushing IPv6 before that moment is a miracle in itself. This 15-year-old article explains that phenomenon well… and it looks as if the Linux kernel follows the same trajectory: people are only fixing things when they break. Not before.
Posted Feb 8, 2021 12:11 UTC (Mon)
by tzafrir (subscriber, #11501)
[Link]
Posted Feb 7, 2021 6:29 UTC (Sun)
by pr1268 (subscriber, #24648)
[Link]
This reminds me of how high the demand is for low-numbered "classic" Delaware license plates — the lower the number, the higher the auction price.
Posted Feb 5, 2021 19:30 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Posted Feb 5, 2021 23:10 UTC (Fri)
by Sesse (subscriber, #53779)
[Link] (1 responses)
*eyes some fixes for ATM drivers and m68k*
Posted Feb 6, 2021 1:01 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Feb 5, 2021 20:45 UTC (Fri)
by zyga (subscriber, #81533)
[Link]
Posted Feb 6, 2021 13:39 UTC (Sat)
by mss (subscriber, #138799)
[Link]
This split-version-personality would of course need to be introduced to the upstream, too, since things like VIDIOC_QUERYCAP need to be changed to use KERNEL_VERSION_CAPPED.
Posted Feb 7, 2021 11:12 UTC (Sun)
by meerdan (subscriber, #119439)
[Link] (1 responses)
To do nothing might work out for the moment, but it will come back and bite Greg in the future. Especially if he keeps doing this for new stable branches.
Posted Feb 7, 2021 13:17 UTC (Sun)
by gregkh (subscriber, #8)
[Link]
Posted Feb 18, 2021 1:06 UTC (Thu)
by opalmirror (subscriber, #23465)
[Link] (2 responses)
Whatever has happened before, will happen again.
So say we all.
Posted Feb 18, 2021 6:36 UTC (Thu)
by jem (subscriber, #24231)
[Link] (1 responses)
Somebody may have said "More than 4 GB will not be needed for a long time, and it is not economical to support it now" when 32-bit PCs emerged in the 1980s.
Posted Feb 18, 2021 22:32 UTC (Thu)
by opalmirror (subscriber, #23465)
[Link]
..
#endif
...
#endif
drivers/media/mc/mc-device.c: info->media_version = LINUX_VERSION_CODE;
drivers/media/v4l2-core/v4l2-ioctl.c: cap->version = LINUX_VERSION_CODE;
drivers/media/v4l2-core/v4l2-subdev.c: cap->version = LINUX_VERSION_CODE;
However, the phone numbers seem to be allocated as a radix tree here, so they are prone to be a lot more unbalanced than ICQ#, or kernel versions for that matter.
The same goes for short ICQ numbers.
For userspace-visible headers there would be #define KERNEL_VERSION KERNEL_VERSION_CAPPED, but the kernel internally would use KERNEL_VERSION_16.
Fair comment. I guess nobody really said that no one would ever need to use more than 4GB; it was genuinely just a technology goal point for the entire industry.