Better handling of integer wraparound in the kernel
Cook starts by clarifying the definitions of two related terms:
- Overflow happens when a signed or pointer value exceeds the range of the variable into which it is stored.
- Wraparound happens, instead, when an unsigned integer value exceeds the range that its underlying storage can represent.
This distinction is important. Both overflow and wraparound can create surprises for a developer who is not expecting that situation. But overflow is considered to be undefined behavior in C, while wraparound is defined. As a result, overflow brings the possibility of a different kind of surprise: since it is undefined behavior, compilers can feel free to delete code that handles overflow or apply other unwelcome optimizations. To avoid this outcome, the kernel is built with the -fno-strict-overflow option, which essentially turns (undefined) overflow conditions into (defined) wraparound conditions.
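To make the risk concrete, here is a small illustration (not taken from the article) of the kind of check that strict signed-overflow semantics can defeat; the function name is invented for the example. Building with -fno-strict-overflow (or the equivalent -fwrapv) keeps the test meaningful:

    /*
     * Hypothetical example: under C's default rules the compiler may
     * assume that "x + 1" never overflows, fold the comparison to 0,
     * and silently delete the check.  With -fno-strict-overflow the
     * addition wraps and the test does what the author intended.
     */
    static int is_int_max(int x)
    {
            return x + 1 < x;
    }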
So, in a strict sense, overflows do not happen in the kernel, but wraparounds do. If a wraparound is intended, as is often the case in the kernel (see ip_idents_reserve(), for example), then all is fine. If the developer is not expecting wraparound, though, the results will not be so good. As a result, there is value in using tooling to point out cases where wraparound may happen — but only the cases where it is not intended. A wraparound detector that creates a lot of false-positive noise will not be welcomed by developers.
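For context, intentional wraparound often looks something like this hypothetical free-running counter (a sketch with invented names, not the actual ip_idents_reserve() code), where wrapping back to zero is harmless and expected:

    /* Hypothetical sequence counter: the value is expected to wrap from
     * UINT_MAX back to 0, and nothing depends on it staying monotonic
     * forever, so a wraparound warning here would be pure noise. */
    static unsigned int sequence_counter;

    static unsigned int next_sequence(void)
    {
            return sequence_counter++;
    }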
In the past, the tooling, in the form of the undefined behavior sanitizer (UBSAN) and the GCC -fsanitize=undefined option, has indeed generated false-positive warnings. As a result, this checking was disabled for the 5.12 release in 2021 and has remained that way ever since. Cook is now trying to re-enable UBSAN's wraparound checking and make it useful; the result was an 82-part patch set making changes all over the kernel.
The key to making this checker useful is to prevent it from issuing warnings in cases where wraparound is intended. One way to do that is to explicitly annotate functions (generally of the small, inline variety) that are expected to perform operations that might wrap around and that handle that situation properly. The __signed_wrap and __unsigned_wrap annotations have been duly added for this purpose; they work by disabling the checking of potential wraparound conditions in the marked function.
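A rough sketch of how such annotations can be spelled, assuming the no_sanitize function attribute understood by the compilers' UBSAN implementations; the kernel's actual definitions may differ in detail, and hash_step() is an invented example:

    /* Suppress UBSAN wraparound instrumentation for one function. */
    #define __signed_wrap   __attribute__((no_sanitize("signed-integer-overflow")))
    #define __unsigned_wrap __attribute__((no_sanitize("unsigned-integer-overflow")))

    /* A helper that deliberately relies on unsigned wraparound. */
    static inline unsigned int __unsigned_wrap
    hash_step(unsigned int hash, unsigned int value)
    {
            return hash * 0x9e3779b9u + value;
    }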
The most common place where intentional wraparound is seen, though, is in code that is intended to avoid just that behavior. Consider this code in the implementation of the remap_file_pages() system call:
    /* Does pgoff wrap? */
    if (pgoff + (size >> PAGE_SHIFT) < pgoff)
            return ret;
Normally, the sum of two unsigned values will be greater than (or equal to) both of those values. Should the operation wrap around, though, the resulting value will be less than either of the addends. As a result, wraparound can be reliably detected with a test like the above. It is worth noting, though, that this test detects wraparound by causing it to happen; wraparound is an expected result that is properly handled.
To a naive wraparound detector, though, that code looks like just the sort of thing it is supposed to issue warnings about. The resulting noise makes such a detector useless in general, so something needs to be done. In this case, Cook adds a pair of macros to explicitly annotate this type of code:
    add_would_overflow(a, b)
    add_wrap(a, b)
The first returns a boolean value indicating whether the sum of the two addends would wrap around, while the second returns that sum, which may have wrapped around. These macros are built on the kernel's existing check_add_overflow() macro which, in turn, uses the compiler's __builtin_add_overflow() intrinsic function. Using these, the above remap_file_pages() test is rewritten as:
    /* Does pgoff wrap? */
    if (add_would_overflow(pgoff, (size >> PAGE_SHIFT)))
            return ret;
This code now clearly does not risk an unwanted wraparound, and so no longer triggers a warning. The patch set rewrites a large number of these tests throughout the kernel. Along the way, Cook also had to enhance check_add_overflow() to handle pointer arithmetic so that pointer additions can be easily checked as well.
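As an illustration of how macros like these can be layered on the compiler intrinsic, here is a hedged sketch using GNU C statement expressions; the real definitions in the kernel's <linux/overflow.h> and in Cook's series are more elaborate (type checking, pointer support, and so on):

    /* Would a + b wrap around?  The computed sum is discarded. */
    #define add_would_overflow(a, b) \
            ({ typeof(a) __sum; __builtin_add_overflow(a, b, &__sum); })

    /* Return a + b, allowing it to wrap, without tripping the sanitizer. */
    #define add_wrap(a, b) \
            ({ typeof(a) __sum; __builtin_add_overflow(a, b, &__sum); __sum; })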
With all of this work in place, it is possible to turn on wraparound checking in UBSAN again. Eventually, the warnings generated should be accurate enough that it can be used to detect code that is not written with wraparound in mind. First, though, this work has to find its way into the mainline. In the best of times, a series that changes over 100 files across the kernel tree is going to be challenging to merge, though Cook has gotten fairly good at that task.
A more difficult challenge may be the opposition expressed by Linus Torvalds. He complained that the changelogs do not properly describe the changes that are being made, that the new annotations cause the compiler to generate less-efficient code, and that the tooling should recognize wraparound tests in the above form without the need for explicit annotation: "if there's some unsigned wraparound checker that doesn't understand this traditional way of doing overflow checking, that piece of crap needs fixing". He added some conditions for merging these changes.
Cook answered that he would rewrite the changelogs, which was one of the things Torvalds demanded. Another one of those demands — "fix the so-called 'sanitizer'" — might prove to be a bit more challenging, since it will require work on the compiler side. The advantage of such a fix is clear; it would remove the need for hundreds of explicit annotations in the kernel. But that would come at the cost of delaying this work and dealing with the bugs that enter the kernel in the meantime.
The history of the hardening work in the kernel suggests that these little obstacles will indeed be overcome in time and that the kernel will eventually be free of wraparound bugs (or close to that goal, anyway). Of course, as Kent Overstreet took pains to point out, this work would not be necessary if the kernel would just take the minor step of switching to Rust. Cook answered that any such change is not happening soon, so he will continue his work of "removing as many C foot-guns as possible". As this work is wrapped up, the result should be a more stable and secure kernel for all of us.
Index entries for this article
Kernel: Security/Kernel hardening
Security: Linux kernel/Hardening
Posted Jan 26, 2024 17:48 UTC (Fri)
by adobriyan (subscriber, #30858)
[Link] (4 responses)
__builtin_add_overflow_p(a, b, (typeof(0 ? a : b))0)
?
This whole overflow.h file feels like a loss in readability. Before you could read
    s = kmalloc(sizeof(struct S) + n * sizeof(u32), GFP_KERNEL);
and immediately see that struct S is being allocated and has an arbitrary-length array at the end, without even seeing the definition. Now it is 3 identifiers, all looking equal at first glance.
And of course people find a way around SIZE_MAX:
net/wireless/scan.c:
    request = kzalloc(struct_size(request, channels, n_channels) +
                      sizeof(*request->scan_6ghz_params) * count +
                      sizeof(*request->ssids) * rdev_req->n_ssids,
                      GFP_KERNEL);
----------------------
What about "s = kmalloc([[sizeof(struct S) + n * sizeof(u32)]], GFP_KERNEL);"?
Everything inside [[]] is automatically checked for overflows.
Why I am not a programming language designer?
Posted Jan 27, 2024 16:17 UTC (Sat)
by donald.buczek (subscriber, #112892)
[Link]
In "overflow: Expand check_add_overflow() for pointer addition"? Where exactly?
> __builtin_add_overflow_p(a, b, (typeof(0 ? a : b))0)
clang does not seem to have that builtin.
Why do you use `typeof(0 ? a : b)` instead of `typeof(a + b)` ?
Posted Jan 31, 2024 14:41 UTC (Wed)
by error27 (subscriber, #8346)
[Link]
net/wireless/scan.c:882 cfg80211_scan_6ghz() warn: using integer overflow function 'size_add()' for math
I had 110 warnings on my test on linux-next last night. The problem I ran into is that people like the struct_size() macro and use it for readability even when they aren't concerned about integer overflows. So a lot of the warnings are false positives in the sense that the code math won't overflow. (It could also be that people didn't choose struct_size() for readability, but it was part of a mass conversion to struct_size()).
Posted Feb 11, 2024 18:18 UTC (Sun)
by Hi-Angel (guest, #110915)
[Link] (1 responses)
That's not really fair regarding C. C is an ancient language, which was designed when there was very little understanding of what's good or bad. For example, it is these days that we know that 90% of values inside a function are immutable, so everything should be constant by default with "mutability" being an opt-in. So now that we know that, it is implemented in Rust.
However, if you were a programmer working on this in ≈1989, you wouldn't know about any of this. I think it's fair to presume that C was designed to the best of the knowledge of its time. It's just we know more than the people living back then.
Posted Feb 11, 2024 18:35 UTC (Sun)
by farnz (subscriber, #17727)
[Link]
Go back a bit from 1989 - C's development starts in 1972, and the first K&R book is 1978. But, as with most languages, C was not cutting-edge for its day; there's nothing in C that wasn't well-understood for the decade or so before development started, and (as you'd expect) some of C's features are things that PL research knew to be bad by 1978.
This is not exactly surprising; Rust doesn't take much, if anything, from post 1990s research, despite being a 2015 language release whose development started in 2006. At least in part, this happens because most people have seen and agree on the status of older work; newer work relies on people first noticing it, and then pushing it to their development community.
Posted Jan 26, 2024 19:32 UTC (Fri)
by iabervon (subscriber, #722)
[Link] (1 responses)
The program would still have to duplicate the expression, but at least it would be easy to have a warning if they don't match.
Posted Jan 28, 2024 10:18 UTC (Sun)
by khim (subscriber, #9252)
[Link]
There's already a perfectly usable builtin that does exactly that. Adding new operator would be useful only if it would be shown that built-in is not enough.
Posted Jan 26, 2024 22:01 UTC (Fri)
by rywang014 (subscriber, #167182)
[Link] (3 responses)
Posted Jan 27, 2024 4:32 UTC (Sat)
by magfr (subscriber, #16052)
[Link] (2 responses)
I agree but if we somehow could start over fallthrough should really be a keyword and break before nonempty case should be implicit.
Posted Feb 7, 2024 8:19 UTC (Wed)
by milesrout (subscriber, #126894)
[Link]
Normal labels don't have implicit breaks before them. Why would case labels?
Posted Feb 8, 2024 19:51 UTC (Thu)
by mosfet_ (guest, #134203)
[Link]
    switch (X) {
    case 0:
            // implicit fallthrough
    case 1:
            something;
            // implicit break
    case 2:
            something;
            /* explicit */ fallthrough;
    case 3:
            something;
            /* explicit */ break;
    case 4:
            something;
    }
case 0 - implicit fallthrough can be assumed for multiple empty case statements
case 1 - should be a compiler error on ambiguous switch statement - should not assume break or fallthrough
case 2 - ok due to new explicit keyword fallthrough;
case 3 - ok due to explicit break;
case 4 - ok due to fallthrough and break having identical behavior in the final case statement
Posted Jan 27, 2024 0:21 UTC (Sat)
by ikm (guest, #493)
[Link] (11 responses)
I never understood why people keep doing those intentional overflows in such checks, when the check can simply be rewritten without any overflow at all, in which case any overflow semantics (be it UB or wrapping) simply would not matter.
In this case what we intend to check for is essentially this:
> if (pgoff + (size >> PAGE_SHIFT) > SIZE_MAX)
Of course, that would only have worked if the left-hand expression wouldn't overflow, but we can rewrite it by moving the `(size >> PAGE_SHIFT)` part of the left-hand expression to the right-hand expression, given that it's never larger than the right-hand expression, using the simple math rule `a + b = c <=> a = c - b`:
> if (pgoff > SIZE_MAX - (size >> PAGE_SHIFT))
This check will never overflow (nor underflow, since `(size >> PAGE_SHIFT)` is at most `SIZE_MAX`), will compile no matter the compiler options, and won't trigger any sanitizers.
There's a couple of things to mention:
* Sure, it might be harder to read for some people, but a comment could be left suggesting moving part of the rhs back to the lhs to understand the intention. And honestly, I think it is less cryptic than the original, at least from the mathematical point of view;
* Here we assume `pgoff` and `size` are `size_t`, an assumption not required in the original check. If we're unwilling to have such an assumption, we could define a macro to get the right maximum value depending on the expression, or even add an all-encompassing macro to check for an overflow when two expressions are being added. In any case, I think simply avoiding an overflow in the first place is a much-much cleaner way to go.
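For illustration, a hedged sketch of the kind of "all-encompassing" macro described above, for unsigned operands. The macro name is invented, and type_max() stands in for a helper that yields the maximum value of a type (the kernel's overflow code has such a macro, but this is not its implementation):

    /* True if a + b would exceed the range of their common (unsigned)
     * type.  No wrapping addition is ever performed. */
    #define add_exceeds_range(a, b) \
            ((a) > type_max(typeof((a) + (b))) - (b))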
Posted Jan 27, 2024 1:02 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (6 responses)
Posted Jan 28, 2024 10:24 UTC (Sun)
by khim (subscriber, #9252)
[Link] (5 responses)
> you can use preprocessor magic to figure out what's available and wrap it in a macro
Why would you need to do that? Minimum GCC version supported by kernel is GCC 5.1, all the required functions are already in that version.
Posted Jan 29, 2024 2:02 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (4 responses)
Posted Jan 29, 2024 9:18 UTC (Mon)
by khim (subscriber, #9252)
[Link] (3 responses)
They threatened to drop support for clang if clang wouldn't implement some obscure GCC feature and yet wouldn't use some easily implementable (and actually implemented) function? Hard to believe, sorry.
Posted Jan 29, 2024 20:13 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
And you see folks, this is why I'm so reluctant to comment on LWN (or any other technical forum). I throw in a parenthetical to cover a case I'm not sure about, and get grilled about it in the replies, even though it was utterly tangential to my actual point.
Posted Jan 30, 2024 8:49 UTC (Tue)
by sdalley (subscriber, #18550)
[Link]
Posted Jan 30, 2024 11:22 UTC (Tue)
by Wol (subscriber, #4433)
[Link]
Just do what I do - get to know people by their nym. Plenty of people here think I'm brilliant - plenty also think I'm an idiot! I've had a lot of people tell me I'm "the voice of reason". I chose Wol for other reasons, but as the Owl who's not as clever as he thinks he is, he seems rather appropriate. Personally, I think I'm a damn good high-level analyst, I tend to struggle with the details but I can spot design flaws a mile off!
And then - it had to be Khim, didn't it ... personally, I respect him, our views are very similar about the *societal* effects around us. Him more than me, but I think we both get a bit dogmatic - and then sparks fly - Sorry Jon!
But no, this is a society like anywhere else. Some people keep their heads down, some are flamboyant, some just like to sit in the corner dispensing wisdom. That last sounds like you. Just try to learn who is not worth talking to, and don't respond to them. And then they might come up with an absolute gem ...
Cheers,
Posted Jan 27, 2024 5:36 UTC (Sat)
by adobriyan (subscriber, #30858)
[Link] (3 responses)
Using SIZE_MAX is slightly less efficient (unless the compiler recognizes this idiom too and moves operands back :-).
vsnprintf() does the following:
    char* str = buf;
    char* end = buf + len;
    if (end < buf) { end = -1; len = end - buf; }
and it makes sense because you have to calculate "end" anyway.
For unsigned types, overflowing can't break the computer, so it is simpler and faster in the assembly to just do it and compare the results.
Posted Jan 28, 2024 10:57 UTC (Sun)
by donald.buczek (subscriber, #112892)
[Link] (2 responses)
It does. At least, gcc and clang produce the same code for `a + b < a` and `a > UINT_MAX - b` starting with `-O1`. On x86_64 they just add a and b and look for the carry in both cases.
It can be seen as a disadvantage of the second variant that it only works for unsigned types. It doesn't work analogously for signed types, because `INT_MAX - b` itself overflows for negative `b` (e.g. `b == INT_MIN`).
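A small standalone comparison of the two idioms (a sketch, assuming unsigned int operands; the function names are invented). Recent GCC and Clang typically compile both functions to the same add-and-test-carry sequence on x86-64 at -O1 and above:

    #include <limits.h>
    #include <stdbool.h>

    /* Detect wraparound by letting it happen... */
    bool wraps_by_adding(unsigned int a, unsigned int b)
    {
            return a + b < a;
    }

    /* ...or by rearranging the comparison so that nothing ever wraps. */
    bool wraps_by_bounding(unsigned int a, unsigned int b)
    {
            return a > UINT_MAX - b;
    }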
Posted Jan 29, 2024 2:09 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
But don't actually do that. Just use the intrinsic.
Posted Jan 29, 2024 6:29 UTC (Mon)
by donald.buczek (subscriber, #112892)
[Link]
Correct. Thanks for pointing that out!
> But don't actually do that. Just use the intrinsic.
Agreed.
Posted Jan 27, 2024 2:31 UTC (Sat)
by tialaramex (subscriber, #21167)
[Link] (31 responses)
Posted Jan 27, 2024 4:00 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Nowadays we have the "abstract machine" and such affordances, so it's no longer seen as surprising that + might compile into something other than an add instruction. So you could also say this is K&R's fault for designing C too close to the metal. But hey, hindsight is 20/20.
Posted Jan 27, 2024 5:49 UTC (Sat)
by willy (subscriber, #9762)
[Link]
So I'd blame the CDC-6600 although you can blame the PDP-1 or LINC if you're determined to make DEC the bad guy.
Posted Jan 27, 2024 5:45 UTC (Sat)
by adobriyan (subscriber, #30858)
[Link] (12 responses)
Wrapping signed integers is a funny thing by itself.
I'd argue that Algebraic Types Party doesn't do _that_ better because overflowing/saturating is property of an operation not a type.
In a perfect world Rust would invent new syntax for "unsafe" additions/multiplications. Cryptographers would use it and everyone else would use regular + which traps.
Posted Jan 27, 2024 8:30 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (11 responses)
Rust already did that. They're simple method calls on the primitive integer types (e.g. wrapping_add, saturating_sub, checked_div, etc.). Wrapping<T> etc. types are for cases where you want to override what + does (because writing wrapping_add all the time is obnoxious). If you want to manually spell out the operation every time, you can do that.
Posted Jan 27, 2024 9:10 UTC (Sat)
by adobriyan (subscriber, #30858)
[Link] (10 responses)
I'm thinking more about "a [+] b" for unsafe addition, etc.
Posted Jan 27, 2024 15:36 UTC (Sat)
by Paf (subscriber, #91811)
[Link]
Posted Jan 27, 2024 22:50 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (8 responses)
* checked_add(): Returns an Option<T> which is None if overflow would happen. Nicely compatible with the question mark operator, match, if let, etc. constructs.
* overflowing_add(): Returns the result of wrapping_add() as well as a bool indicating whether we overflowed.
* saturating_add(): Returns (the Rust equivalent of) INT_MAX when we overflow.
* unchecked_add(): Overflow is undefined behavior. Can only be called in an unsafe block (or unsafe function).
* wrapping_add(): Wraps around using two's complement (signed) or modular arithmetic (unsigned).
* carrying_add(): A helper method for implementing bignum addition. There's also borrowing_sub() for subtraction. Probably not useful for general-purpose addition.
You could, hypothetically, have a menagerie of different symbols for all of these different use cases, but it's going to look like Perl or APL very quickly. You could, I suppose, pick one of them as the "best" overflow mode and just give that a symbol, but I guarantee you it won't be unchecked_add() as your comment seems to suggest (they are never adding a binary operator that can only be used in unsafe blocks). The more flexible option is picking which mode you "usually" want to use, and using the Wrapping<T> etc. types for that.
One thing that does bother me is the fact that overflowing_add() returns a tuple instead of some kind of ADT wrapper. That would force you to explicitly destructure it and handle both the wrapping and non-wrapping cases. With a tuple, you can forget to check the bool in some code paths. It's not really a disaster because you probably won't have overly elaborate code surrounding the callsite, but it's still an opportunity for things to go wrong.
Posted Jan 28, 2024 8:13 UTC (Sun)
by epa (subscriber, #39769)
[Link] (6 responses)
Posted Jan 28, 2024 10:35 UTC (Sun)
by khim (subscriber, #9252)
[Link] (5 responses)
> It would need bounded integer types like those in Ada.
Useless, pointless and not at all helpful in real code. That's what Wuffs does. Turning that into general-purpose language with pervasive dependent typing is surprisingly hard. Maybe someone would manage to do that 10 or 20 years down the road. No, you can not. The only thing you may achieve with templates are some statically-defined checks and these are not much useful for real programs with dynamically allocated externally-specified objects. And kernel is full of these. You may also reject C++ entirely and built entirely separate language where every type would be your own template type and everything is done using these, but at this point you are firmly in the “pseudo-language written in C++ templates” and it's better to just create a separate language than pretending you are still dealing with C++.
Posted Jan 28, 2024 11:26 UTC (Sun)
by epa (subscriber, #39769)
[Link] (4 responses)
Useful in practice? Perhaps not much. My concern was more about eliminating logic errors (and making explicit the places overflow can occur), rather than memory safety. The common style nowadays is to avoid arbitrary limits. Users would not be happy if grep had a maximum line length of 2000 characters. But in the real world you can usually put a bound on things: no aeroplane carries more than ten thousand passengers and no person is aged over two hundred. Ada does not have bounded integers purely out of design by committee. In the end your integer types will have a bound, like it or not, and forcing the programmer to consider it explicitly can be helpful.
So yes, I do agree with your comment. You can’t realistically retrofit bounded integers into C++ given the amount of existing code; and with everything dynamically allocated (rather than static fixed size buffers) they are not that useful for memory safety. Carrying a full proof of allowed bounds alongside each variable requires more cleverness than compilers have (although even a system that required numerous “trust me” annotations might have its uses). I was blue-skying about my ideal language rather than proposing a practical change to an existing language or codebase.
Posted Jan 28, 2024 12:01 UTC (Sun)
by khim (subscriber, #9252)
[Link] (3 responses)
> But in the real world you can usually put a bound on things
No, you couldn't. Even if a limit sounds sane today it would, most definitely, be too small tomorrow. Maybe embedded may work with precomputed limits, but every time I saw TeX there was always hugeTeX for people who need more or something. Beyond certain program complexity using arbitrary limits is not useful and below that limit using them is not too helpful. That's why I said such types are useless and pointless: where they are useful (small programs, limited scale) they are pointless, where they could be beneficial (large programs, complicated logic, data comes from outside) they are not useful. And then someone tries to use your program for a cruise ship or tries to enter data about a three-hundred-year-old juridical person and everything explodes. Thanks, but no, thanks. It does have them because ALGOL and Pascal have them. But they are not much useful in any of these languages. Perhaps they were useful when they were invented and when programs were written once, run once and then discarded. But today we are not writing programs in such fashion and arbitrary static limits cause more harm than good. And I was thinking about practical applicability instead. I'm pretty sure with enough automation proof-carrying code may be pretty usable, but for now we are stuck with Rust and that was the right choice: lifetime tracking causes significantly more grief than out-of-bounds access. Perhaps after Rust would become the “new C” we may start thinking about dependent types. They need to have lifetime tracking as their basis, anyway, or else they wouldn't be much useful in practice, thus Rust made the right choice.
Posted Jan 28, 2024 12:15 UTC (Sun)
by epa (subscriber, #39769)
[Link]
Airline passenger management is a kind of middle ground between these two. It's not in itself safety-critical and doesn't need to run on tiny systems. But then, it does need to interact with parts of the real world that have fixed limits. Perhaps the passenger number is at most three digits long, or the dot-matrix printer only has 80 columns, or there is a database system with fixed storage size. In those cases I would prefer to be explicit about the bounds of a number rather than write code that purports to work with any number but in fact does have an upper bound somewhere, just nobody knows quite what it is.
Posted Jan 28, 2024 18:00 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (1 responses)
> No, you couldn't. Even if limit sounds sane today it would be, most definitely, be too small tomorrow.
Stop being an arrogant idiot!
Okay, I can't think of an example, and I guess you haven't even bothered to look, but I'm sure other people will be able to find examples where a certain positive big number indicates an error. Certainly I'm sure there are examples where the mere EXISTENCE of a negative value is an error (in other words 0 is an absolute lower bound).
(Actually, as a chemist, I've just thought of a whole bunch of examples. s is either 1 or 2 (can be empty aka 0). Likewise, p is 1 to 6. d is 1 to 10. f is 1 to 14. ANY OTHER VALUE IS AN ERROR.) To have the compiler make sure I can't screw up would be a wise choice.
And please, WHY ON EARTH would you want to store an entry about a company into a genealogical database? While it's possible it'll change (extremely unlikely), any value for age outside 0 - 126 is pretty much impossible. In fact, surely you DO want to limit that value, precisely in order to trigger an error if someone is stupid enough to try to enter a judicial person into the database!
Cheers,
Posted Jan 28, 2024 22:23 UTC (Sun)
by corbet (editor, #1)
[Link]
Ok let's stop this here please. Seriously.
Enough
Posted Jan 28, 2024 10:53 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link]
(I'm sure you know this but) note that saturation occurs at _both_ ends of the range of an integer type, if you (-100i8).saturating_add(-100) that's the 8 bit signed integer -128 aka i8::MIN not i8::MAX
Posted Jan 27, 2024 21:14 UTC (Sat)
by donald.buczek (subscriber, #112892)
[Link] (10 responses)
But they magically have different behavior depending on whether you compile in debug or release mode, because the "debug" and "release" profiles define `overflow-checks` differently. This is surprising, too, and I'm not sure if that was a good design choice.
Posted Jan 28, 2024 10:35 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link] (5 responses)
Posted Feb 7, 2024 8:24 UTC (Wed)
by milesrout (subscriber, #126894)
[Link] (4 responses)
"You can only shoot yourself in the foot if you hold it wrong"
"Only incorrect programs have this problem" is the exact issue Rust is meant to prevent. What's the point of Rust if it doesn't prevent one of the most common sources of vulnerabilities?
Posted Feb 7, 2024 10:48 UTC (Wed)
by atnot (subscriber, #124910)
[Link]
> What's the point of Rust if it doesn't prevent one of the most common sources of vulnerabilities?
The main reason overflows lead to vulnerabilities is by breaking bounds checks. Since bounds checks are still applied by the compiler in safe code after whatever math you did, there's not really any way to turn it into a memory vulnerability. Unless you're manually writing bounds checks in unsafe Rust, which is honestly pretty rare.
Posted Feb 7, 2024 10:56 UTC (Wed)
by farnz (subscriber, #17727)
[Link] (2 responses)
The different fully-defined behaviours are, in practice, not a common source of vulnerabilities; by default, in production, you get wrapping (2s complement wrapping if signed), in development you get panics.
This differs from the issue in C, where overflow is a common source of vulnerabilities since it's not defined for signed integers, and thus the compiler is allowed to surprise a developer.
Posted Feb 7, 2024 11:36 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (1 responses)
Okay, okay, that's being optimistic, but that is the likely course of events ...
Cheers,
Posted Feb 7, 2024 12:07 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
That's one component of it; the other is that if it does wrap, you're not going to be surprised by the resulting behaviour, whereas in C, you can get weirdness where overflowing signed arithmetic does unexpected things. Taking the following code as an example:
    int a = 5;
    int b = INT_MIN;
    int c = b - a;

    if (c < b) {
            printf("c less than b\n");
    }
    if (b < c) {
            printf("b less than c\n");
    }
    if (c == 0) {
            printf("c is zero\n");
    }
    if (b == c) {
            printf("b equals c\n");
    }
In standard C semantics, because arithmetic overflow is undefined, it is legal for all four comparisons to evaluate to true, and thus for all four printfs to print. In Rust semantics, because of the wraparound rule, c will be equal to (the equivalent of) INT_MAX - 4, and thus only the b < c condition is true.
This, in turn, means that you're less likely to be surprised by the result in Rust, since it's at least internally consistent, even if it's not what you intended. And thus, if you use the result of the arithmetic operation to do something like indexing an array, instead of the compiler being able to go "well, clearly c should be zero here, so I can elide the bounds check", the compiler does the bounds check and panics for a different reason (index out of bounds). You've now got an "impossible" error, and can debug it.
Note that this relies on Rust's memory safety guarantees around indexing; if you used pointer arithmetic with c instead, you could get an out-of-bounds read, and hence have trouble. The only thing that helps here is that pointer dereferencing is unsafe, and hence if you get unexpected SIGSEGVs or similar, that's one of the first places you'll look.
Posted Jan 29, 2024 2:18 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
Are there situations where data integrity might dictate failing rather than producing an incorrect value? Of course, but those are properly handled with checked_add() and friends, not with crashing the entire service. If there is any input that can cause your service to crash, it is an outage waiting to happen.
Posted Jan 29, 2024 7:23 UTC (Mon)
by donald.buczek (subscriber, #112892)
[Link] (1 responses)
Considering your example, isn't there a concern that your internal or external user might have other requirements and would prefer no data over wrong data? Overlooking unexpected overflows might result in serious incidents like Cloudbleed [1]. As a customer, I'd rather see a temporary service disruption than having to learn that my private data has been silently exposed and spilled into search engine caches.
[1]: https://blog.cloudflare.com/quantifying-the-impact-of-clo...
Posted Jan 29, 2024 9:25 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link]
This is fine if the program is running on the user's computer. If it is not, then a crash will affect everyone whose requests are being processed by that machine, and in the case of cascade failure, fallout will be wider still. There may be situations where failing rather than producing wrong data is preferred, but it is the responsibility of the programmer to understand that requirement and use checked_add etc. instead of regular arithmetic operators. This is no different to any other functional requirement of a software system - if you write the code wrong, it will not work.
> Overlooking unexpected overflows might result in serious incidents like Cloudbleed [1].
Security is hard. I'm not going to solve the general problem of malicious inputs in an LWN comment, but in general, defensive programming is necessary (not sufficient) to deal with problems of this nature. Most languages do not provide integers that always crash on overflow, so handling overflow correctly is something that serious programmers will find themselves having to deal with on a regular basis regardless. Rust provides the necessary methods for doing this front and center, which is more than you can say for most languages (C is only standardizing the equivalent functions in C23!). If you want stronger guarantees than that, I would suggest rewriting the parser (and perhaps a small part of the parser-adjacent application logic) in wuffs.
Besides, if you allow malicious inputs to cause a crash, you make yourself far more vulnerable to denial of service attacks, which are also a security concern (albeit a less serious one).
Posted Jan 29, 2024 11:33 UTC (Mon)
by danielthompson (subscriber, #97243)
[Link]
I think here you are describing a choice of *defaults* for overflow-checks rather than a design choice.
In other words, a developer who doesn't care can leave the defaults as they are. If a developer did, for example, want their panic handler run for any overflow, then they can change the defaults for the release build. Cargo allows them to change it both for their own code and, with a wildcard override, for any crates they depend on!
Posted Feb 5, 2024 16:34 UTC (Mon)
by plugwash (subscriber, #29694)
[Link] (1 responses)
Posted Feb 5, 2024 17:00 UTC (Mon)
by farnz (subscriber, #17727)
[Link]
Well, configurable different behaviour with different defaults in debug and release mode - you can panic on overflow in release, and you can wrap on overflow in debug. Plus there's the types like Wrapping which fully define it as wrapping in both modes, plus functions like checked_add if you want to change behaviour on overflow.
That said, this is a potential footgun if you're unaware that overflow is potentially problematic; the reason the default is panic in debug builds is to increase the chances of you noticing that you depend on overflow behaviour, and ensuring that it doesn't happen.
Posted Feb 13, 2024 19:54 UTC (Tue)
by DanilaBerezin (guest, #168271)
[Link] (2 responses)
Posted Feb 13, 2024 20:11 UTC (Tue)
by mb (subscriber, #50428)
[Link] (1 responses)
Posted Feb 13, 2024 22:18 UTC (Tue)
by andresfreund (subscriber, #69562)
[Link]
Or at the very least they could have provided a sane way to check if overflow occurs. Introducing that decades after making signed overflow UB is insane. A correct implementation of checking whether the widest integer type overflows is quite painful, particularly for multiplication.
Here's postgres' fallback implementation for checking if signed 64bit multiplication overflows:
/*
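The PostgreSQL code itself is cut off above; purely as an illustration of why such fallbacks are painful, here is a hedged sketch of the classic division-based pattern they tend to rely on when __builtin_mul_overflow() is unavailable (the function name is invented and this is not the Postgres implementation):

    #include <stdbool.h>
    #include <stdint.h>

    /* Return true if a * b would overflow int64_t; otherwise store the
     * product in *result.  Each branch is arranged so that none of the
     * divisions can themselves overflow or divide by zero. */
    static bool mul_s64_overflows(int64_t a, int64_t b, int64_t *result)
    {
            if (a > 0) {
                    if (b > 0) {
                            if (a > INT64_MAX / b)
                                    return true;
                    } else {
                            if (b < INT64_MIN / a)
                                    return true;
                    }
            } else if (a < 0) {
                    if (b > 0) {
                            if (a < INT64_MIN / b)
                                    return true;
                    } else {
                            if (b < INT64_MAX / a)
                                    return true;
                    }
            }
            *result = a * b;
            return false;
    }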
Posted Jan 27, 2024 15:32 UTC (Sat)
by donald.buczek (subscriber, #112892)
[Link] (3 responses)
Posted Jan 27, 2024 16:47 UTC (Sat)
by ballombe (subscriber, #9523)
[Link] (2 responses)
Posted Jan 27, 2024 21:07 UTC (Sat)
by donald.buczek (subscriber, #112892)
[Link] (1 responses)
Sorry, I'm not sure I understand that question. gcc has understood `__builtin_add_overflow` since 5.1 (2015) and clang since 3.8 (2016). Or is this a counter-argument, because using the builtin would make the code too dependent on the compiler version and variant? True, but the proposed macros use the builtin, too. And for the purpose of supporting other compilers, it would be enough to use one level of macro with no change of function, or even to #define __builtin_add_overflow only for those compilers which don't provide it.
Posted Jan 27, 2024 23:30 UTC (Sat)
by ballombe (subscriber, #9523)
[Link]
Thanks. Most gcc builtins are CPU-specific. It seems these ones are an exception!
Posted Jan 27, 2024 15:39 UTC (Sat)
by Paf (subscriber, #91811)
[Link] (30 responses)
Posted Jan 27, 2024 16:05 UTC (Sat)
by smcv (subscriber, #53363)
[Link] (29 responses)
I believe so. All modern architectures are two's-complement (-1 is all-bits-1), but Standard C wanted to accommodate sign-and-magnitude (-x has all the same bits as +x, except for flipping the sign bit), and unfortunately chose to make signed overflow be undefined behaviour (compiler may assume that the whole program is meaningless if signed overflow would occur) rather than merely implementation-defined (compiler does something vaguely reasonable with it, in practice delegating to whatever way it works in hardware).
Posted Jan 27, 2024 17:28 UTC (Sat)
by marcH (subscriber, #57642)
[Link] (28 responses)
https://blog.llvm.org/2011/05/what-every-c-programmer-sho...
> If arithmetic on an 'int' type (for example) overflows, the result is undefined. One example is that "INT_MAX+1" is not guaranteed to be INT_MIN. This behavior enables certain classes of optimizations that are important for some code. For example, ...
Posted Jan 28, 2024 1:53 UTC (Sun)
by Paf (subscriber, #91811)
[Link] (13 responses)
Posted Feb 2, 2024 16:34 UTC (Fri)
by anton (subscriber, #25547)
[Link] (12 responses)
Now why did they make it undefined rather than implementation-defined or whatever other options they had for not standardizing the behaviour (some people put a lot of weight on whether it's "undefined" or something else)? Looking at the shifts where some cases are undefined and some are implementation-defined, it seems to me that they chose undefined for cases where a compiler for one target could trap. Declaring the behaviour to be undefined allowed the time-travel clause to trigger, so the program could be terminated right away rather than first having to finish delayed work (in particular, writing out file buffers).
Posted Feb 2, 2024 16:56 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (10 responses)
But why? A compiler trapping is surely just as implementation-defined as anything else, surely?
Even if they'd gone through UB and said "here are a bunch of flags to turn it into implementation-defined", that would have stopped a lot of this nonsense.
Of course, as others have pointed out, the more C moves away from the metal and towards the "C virtual machine", the worse this problem gets. But even acknowledging this difference and defining the behaviour of the virtual machine, and then allowing the user to choose between virtual machine or hardware at least gives the programmer the opportunity to reason about his program.
Cheers,
Posted Feb 2, 2024 17:56 UTC (Fri)
by anton (subscriber, #25547)
[Link] (9 responses)
Posted Feb 2, 2024 18:44 UTC (Fri)
by Wol (subscriber, #4433)
[Link] (8 responses)
If the implementation says "terminate with prejudice" then why would it flush buffers? At the end of the day, all the compiler writers need do is describe what happens.
Cheers,
Posted Feb 3, 2024 8:36 UTC (Sat)
by anton (subscriber, #25547)
[Link] (7 responses)
If that clause actually exists in a C standard, the question is why they added it. In the years preceding C89, the undefined behaviour shenanigans that C compiler writers would start with a decade or so later were non-existent practice (while standards usually standardize common practice), so I wondered where such a clause would come from. My theory was that it was there to explain that a program terminated by trapping does not have all the output that, e.g., fwrite() calls output prior to termination; however, looking in the C standards, the buffering is there explicitly, so no time travel is needed to explain that behaviour, falsifying my theory.
OTOH, if no time travel permission exists, C compilers must not assume that programs will only perform defined behaviour in the future (and base their reasoning on that assumption). At most they can assume that only defined behaviour has happened earlier in program execution, and a number of compiler shenanigans would be non-standard (the blog post linked to above gives examples).
Posted Feb 3, 2024 13:35 UTC (Sat)
by excors (subscriber, #95769)
[Link] (6 responses)
> A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
As far as I can see, there's no equivalent in any C specification, but that seems to be because C is unfortunately vague about the concept of conformance.
C99 defines "strictly conforming program" (which "shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior"), and "conforming implementation" (which "shall accept any strictly conforming program"), and "conforming program" (which "is one that is acceptable to a conforming implementation"). So if your program's output depends even on implementation-defined behaviour like INT_MAX > 32767 (i.e. almost all programs), it's not a strictly conforming program, and whether it is a conforming program is entirely dependent on whether someone has written a compiler that considers your program "acceptable" (whatever that means); C99 declines to be responsible for defining that.
And then it specifies "the results of executing conforming C programs" in terms of the observable behaviour (file outputs and stdio) of the abstract machine. I believe the results of executing non-conforming programs (i.e. whatever the compiler arbitrarily considers unacceptable) are not defined at all, so a conforming implementation can do whatever it wants (including time travel).
That also means "strictly conforming program" is a circular definition - that's defined in terms of the program's output, which is only defined if it's a conforming program, which depends on whether it's accepted by an implementation that accepts strictly conforming programs. So C99 doesn't make much sense here. (At least that's my current interpretation; the specification seems quite unclear.)
I think C++98 is much clearer by replacing the concept of "conforming program" with "well-formed program", which is defined properly within the specification and can be determined at compile time (it doesn't depend on input or output or the implementation's whims), and is separate from the concept of undefined behaviour (which can be input-dependent). That's why it added the explicit statement about well-formed programs containing undefined behaviour - that distinction did not exist in C. So I don't think it's a deliberate change in language design, it's simply that the C specification was poorly written (by modern standards) and relied on everyone guessing what it really meant and ignoring what it actually said, and the C++ specification editors did a better job.
Posted Feb 3, 2024 16:13 UTC (Sat)
by anton (subscriber, #25547)
[Link] (5 responses)
So compiler maintainers are ruining the C experience because of a C++ standard? Ouch (but I have been suspecting so for a long time, only not in connection with time travel).
According to your interpretation, everything in the C standard depends on the conformance definitions which you claim are circular and therefore does not make sense. Of course any claim that it allows time travel does not make any sense, either.
IANACLL (C language lawyer), but my interpretation up to now has been that the rest of the standard defines how an implementation has to execute a program; this does not include a time travel provision. The conformance levels are just there to classify a program or an implementation:
Posted Feb 5, 2024 10:08 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (4 responses)
Note that in the case of MSVC, Microsoft used to claim only C++ standard conformance, and claimed in the product briefing that the compiler's C support was defined in terms of the C++ support, rather than being defined by the C standard. I would not be surprised if the same was true of other compilers.
Posted Feb 5, 2024 12:15 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (3 responses)
Posted Feb 5, 2024 14:01 UTC (Mon)
by farnz (subscriber, #17727)
[Link] (2 responses)
Good question; I've not paid attention for a long time; I know of many C compilers, but not many C++ compilers, and most of the C compilers can best be described as "barely maintained" (one I'm aware of - an expensive one - is talking about having full C90 support "soon", and mostly relies on the fact that it's a qualified compiler for the purposes of various certifications to keep going).
Posted Feb 6, 2024 4:29 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Feb 6, 2024 11:03 UTC (Tue)
by farnz (subscriber, #17727)
[Link]
AFAICT, most of them don't care about the C standard at all; they have their own dialect of C that's not the same as standard C, and a niche where it's more important that they either comply with other standards (qualified compilers etc), or a niche where they are deliberately non-standard to support legacy code bases that depend on historic behaviour that conflicts with the C standard.
For example, one vendor I'm aware of has two C compilers they sell; one is LLVM-based, and does standard C; the other is their legacy compiler, which is guaranteed to interpret certain constructs the way their compilers in the 1980s did, even though that interpretation conflicts with the C standard. But this is valuable if you have legacy code that "knows" that floating point to integer casts are bitwise, while implicit conversions are value based (for example - not the actual case, but it's that sort of weirdness).
Posted Feb 2, 2024 18:23 UTC (Fri)
by farnz (subscriber, #17727)
[Link]
IIRC, the three layers of underspecified in the C standard are:
The issue with making this unspecified or implementation defined is that implementations must give the same result for the same operation with the same input data if it's unspecified or implementation defined (and in the latter case, it must match the documentation). If you make it undefined, then this issue goes away.
Plus, there's a social perspective; when this behaviour was made undefined, the C standards committee still thought that compiler writers would mostly settle on some "sensible" behaviours for much of this stuff, and then we'd be able to do a new standard that didn't change anything concrete, just said that things that used to be undefined are now unspecified or implementation defined (or even fully defined). With hindsight, I suspect the committee would not have made it undefined behaviour if they'd known how this would turn out over the next 3 decades.
Posted Jan 28, 2024 8:40 UTC (Sun)
by epa (subscriber, #39769)
[Link] (13 responses)
And this finally explains why common C programming style is to use signed integer types, even for quantities that will never be negative in practice. Naively it makes sense to use an unsigned int both as the loop counter and as the upper bound. But if you do so, it is allowed to wrap around to zero, so a loop exit condition would never be hit if the upper bound happens to be the largest possible unsigned value. And the compiler has to allow for that.
(I’m sure it’s not just for-loops and there are other important optimizations, just that this is one of the easier cases to explain.)
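A minimal sketch of the situation being described, assuming 32-bit int and unsigned int (the function names are invented for illustration):

    /* If n == UINT_MAX, the unsigned loop never terminates: i wraps from
     * UINT_MAX back to 0 and the exit condition is never reached, and the
     * compiler must allow for that possibility.  With the signed index,
     * overflow is undefined, so the compiler may assume the loop
     * terminates and optimize accordingly. */
    void fill_unsigned(char *buf, unsigned int n)
    {
            for (unsigned int i = 0; i <= n; i++)
                    buf[i] = 0;
    }

    void fill_signed(char *buf, int n)
    {
            for (int i = 0; i <= n; i++)
                    buf[i] = 0;
    }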
But these arcane optimization side effects have very little to do with the choice between a signed quantity and an unsigned quantity. That ought to be orthogonal: so you’d have signed integers with defined wraparound semantics, signed with “undefined” overflow, and similarly two kinds of unsigned integer.
Posted Jan 28, 2024 10:46 UTC (Sun)
by khim (subscriber, #9252)
[Link]
Nah, that's simply because C grew out of B and the latter only had one datatype: word. When B evolved into C, loops started using int. Nope. Most of the time there is no difference, but without various hacks on all levels... On base RV64G they are faster, but who uses these?
Posted Jan 28, 2024 11:32 UTC (Sun)
by excors (subscriber, #95769)
[Link] (1 responses)
I suspect that's almost never the real motivation. Early versions of C didn't even have unsigned types - the only types were int and char, and later float and double, so the only one suitable for a loop counter was int. The `int i; for (i=0; i<n; i++) ...` idiom appears to have already been established back then (https://www.bell-labs.com/usr/dmr/www/cman.pdf), and was widely used in the K&R book (after unsigned had been added to the language). K&R seems to use int as the default for pretty much everything, and only uses unsigned where it really needs all 16/32 bits (e.g. hash values) - and you'd almost never need all those bits for a loop counter because you wouldn't have enough RAM for such a huge array.
Also, `int` is much shorter and easier to type than `unsigned`. So I think people have continued using int for 50+ years because it's idiomatic and easy and there's been little motivation to change, not because they really care about modern compiler micro-optimisations.
Meanwhile in C++ it's idiomatic to use `for (size_t i = 0; ...)` (which is unsigned). As far as I can tell, that's largely because of STL: its designer preferred size_t as it means the library doesn't have to worry about being passed negative values for array sizes etc (C++ is much more interested in type-system-enforced correctness than C is), and it can always represent the largest possible object in memory (which by that time was approaching the 32-bit limit). STL was widely used, so everyone else used size_t for consistency, and to avoid implicit type conversions that could trigger overflows or compiler warnings.
In C++ it's also idiomatic to use `++i` instead of `i++` in loops, because `i` might be an STL iterator that has to do more work to return a copy of the old value in post-increment (at least in old compilers that optimise poorly), and there's no downside to preferring pre-increment, so pre-increment was widely recommended for performance. But I've never heard anybody make a general recommendation of using ssize_t instead of size_t to let the compiler optimise non-overflowing bounds checks - nobody cares about performance _that_ much, except when they've got a specific benchmark to profile and optimise.
Posted Jan 28, 2024 12:10 UTC (Sun)
by wtarreau (subscriber, #51152)
[Link]
Absolutely! I've observed that it has been the main reason I've continued to use it a lot, and for this precise reason I added "uint", "uchar", "ulong", "ullong" etc to my programs to reduce the cost of switching to unsigned. And it works.
That's exactly why anything based on annotations or complex type definitions has no chance to work, too much pain for the developer, and makes the code totally unreadable. I already *introduced* bugs in my code because of GCC warnings that I was trying to shut up in good faith resulting in a lot more complex construct that had bugs. Data-related bugs are one thing, but when it becomes so complicated to tell the compiler there's no issue that you end up with control bugs, that's another story and a much more serious one.
I think that some tooling is needed, and the kernel provides a lot already. For example, min_t(), max_t() are convenient to use and perform lots of checks. Anything that saves the developer from introducing casts is good, and overflow checks are good, if they're easy to use and type (i.e. not __builtin_add_overflow_foo_bar()).
Readability is super critical to produce safe code and to spot bugs in others' code during reviews. Long lines and ultra-long identifiers severely affect this and can easily result in more bugs.
Posted Jan 28, 2024 21:14 UTC (Sun)
by willy (subscriber, #9762)
[Link] (8 responses)
A fun interview question is to ask the candidate to write a program which prints out every number between 0 and UINT_MAX inclusive, using only 32 bit arithmetic.
The object, of course, is not to see whether they can, but what mistakes they make along the way, and how they handle Socratic questions about the flaws. Some candidates are quite overconfident that they know what they're doing.
I've seen some very innovative correct answers over the years too. And that's also cool because you can discuss the pitfalls and advantages of their approach over a more conventional approach.
Posted Jan 28, 2024 22:23 UTC (Sun)
by Wol (subscriber, #4433)
[Link]
There's all sorts of questions you can ask like that :-)
> The object, of course, is not to see whether they can, but what mistakes they make along the way, and how they handle Socratic questions about the flaws. Some candidates are quite overconfident that they know what they're doing.
When I've done that, it's surprising the number of people who clearly didn't hear when you say you want to see how they handle it, and that you don't expect the answer to be correct (or even compile!).
(The problem with that is managers. I've been in interviews where they assumed that because I knew dialect A, I could write a working program in dialect B having met it for the first time in the interview!)
Cheers,
Posted Jan 29, 2024 0:54 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link] (2 responses)
Perhaps you meant specifically integers? Computers much prefer integers, but now it's a boring problem. These days I mostly write Rust, so I'd want to write
    let mut number: u32 = 0;
    loop {
        println!("{number}");
        if number == u32::MAX { break; }
        number += 1;
    }
Posted Jan 29, 2024 8:43 UTC (Mon)
by gspr (guest, #91542)
[Link] (1 responses)
As a mathematician, I'd find this response silly. If you want to be pedantic: "number" does not imply "real number" any more than it implies "integer". In fact, I'd argue that without any further context, then the fact that the name UINT_MAX seems to carry the implication of being an integer is enough for a reasonable person to at least assume that the question is about integers.
Your answer would come off to me as quite obtuse. Sure, nothing wrong with clarifying along the lines of "you mean integers, right?" But assuming reals and stating that the problem cannot be solved is just silly. And problematic because you, too, made a whole bunch of assumptions about the (somewhat imprecisely specified) task.
Posted Jan 29, 2024 19:11 UTC (Mon)
by tialaramex (subscriber, #21167)
[Link]
Posted Jan 29, 2024 2:35 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
puts("0");
No arithmetic other than 32-bit is done here*, but I'm not sure this is the answer you had in mind.
* Assuming sizeof(unsigned int) == 4 as on the vast majority of real systems. You can write uint32_t, but I would have to look up how to pass that to printf portably. It's probably some ugly preprocessor macro.
Posted Jan 29, 2024 2:53 UTC (Mon)
by willy (subscriber, #9762)
[Link]
I'm not looking for bug-free code in a whiteboard interview. I'm looking for a programmer who I can work with, and that means giving and receiving feedback.
Posted Jan 29, 2024 20:24 UTC (Mon)
by jwilk (subscriber, #63328)
[Link]
printf("%" PRIu32 "\n", i); // macro defined in <inttypes.h>
Posted Jan 29, 2024 3:47 UTC (Mon)
by wtarreau (subscriber, #51152)
[Link]
BTW regarding your challenge, for me that's a typical use case for do ... while, since you can only test for the end after the operation, e.g.:
    i = 0;
    do {
            printf("%u\n", i);
    } while (++i);
Posted Feb 2, 2024 17:48 UTC (Fri)
by anton (subscriber, #25547)
[Link]
Wang et al. found that in only one loop in one of the SPEC Cint 2006 benchmarks there was a measurable difference between undefined and wraparound behaviour, due to sign-extending the int in the wrapping case (condensed Godbolt example). You can avoid this problem (on AMD64) by using size_t (as suggested by Wang et al.) or any other unsigned type for the loop index (but RV64G may need zero-extensions if the index type is smaller than an address), or by using an address-sized type (e.g., long, unsigned long on Linux targets).
Posted Jan 31, 2024 12:20 UTC (Wed)
by paulj (subscriber, #341)
[Link] (3 responses)
if (x + y < x)
Why not make use of some basic arithmetical properties to get a construction that is guaranteed not to overflow?:
if (x > max_size_of(y,x) - y)
max_size_of() can be engineered to produce the right answer according to type promotion of x+y. Or just use APPROPRIATE_SIZE_MAX if it's obvious.
?
Posted Feb 1, 2024 10:06 UTC (Thu)
by matthias (subscriber, #94967)
[Link] (2 responses)
So you are subtracting a possibly negative number from the maximal number that can be represented? How is this guaranteed to not overflow?
Posted Feb 1, 2024 10:18 UTC (Thu)
by paulj (subscriber, #341)
[Link] (1 responses)
I see another subthread with same question, and apparently it will even compile to same assembly.
Posted Feb 1, 2024 10:46 UTC (Thu)
by matthias (subscriber, #94967)
[Link]
And how is the negative case for signed trivial to check. With signed you have three cases, adding two positive numbers, adding two negative numbers and adding numbers with opposite sign. The last one cannot overflow, but you have to check for the two other cases. And I do not see how negative numbers are any better than positive numbers.
I am happy to mostly writing rust. "x.checked_add(y)" is so much more natural. Or with overflow handling:
let Some(z) = x.checked_add(y) else {
Posted Feb 2, 2024 18:10 UTC (Fri)
by anton (subscriber, #25547)
[Link]
There are the following overflow conditions:
There are the following common behaviours on an overflow:
Posted Mar 3, 2024 13:44 UTC (Sun)
by mmjo (guest, #55791)
[Link]
switch (X) {
case 0:
// implicit fallthrough
case 1:
something;
// implicit break
case 2:
something;
/* explicit */ fallthrough;
case 3:
something;
/* explicit */ break;
case 4;
something;
}
Better handling of integer wraparound in the kernel
Better handling of integer wraparound in the kernel
case 1 - should be a compiler error on ambiguous switch statement - should not assume break or fallthrough
case 2 - ok due to new explicit keyword fallthrough;
case 3 - ok due to explicit break;
case 4 - ok due to fallthrough and break having identical behavior in the final case statement
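For reference, today's C already gets partway there: with -Wimplicit-fallthrough enabled, intentional fall-through has to be annotated, either with C23's [[fallthrough]] or with the kernel's fallthrough pseudo-keyword from <linux/compiler_attributes.h>. A sketch (do_a()/do_b() are placeholders):

switch (x) {
case 0:
        do_a();
        fallthrough;    /* the kernel macro expands to __attribute__((__fallthrough__)) */
case 1:
        do_b();
        break;
default:
        break;
}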
Better handling of integer wraparound in the kernel
> you can use preprocessor magic to figure out what's available and wrap it in a macro
Better handling of integer wraparound in the kernel
Not just what one has to say, but how one chooses to say it
Better handling of integer wraparound in the kernel
Wol
Better handling of integer wraparound in the kernel
char *end = buf + len;
if (end < buf) { end = (char *)-1; len = end - buf; }
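Since pointer wraparound is itself undefined behaviour, a variant that does the wrap test on an unsigned integer type may be preferable. A sketch, assuming flat Linux-style addressing (the helper name is invented):

#include <stddef.h>
#include <stdint.h>

/* Clamp len so that computing buf + len cannot wrap; the test is done
 * on uintptr_t, where wraparound is well defined. */
static inline char *clamped_end(char *buf, size_t *len)
{
        if ((uintptr_t)buf + *len < (uintptr_t)buf)
                *len = UINTPTR_MAX - (uintptr_t)buf;
        return buf + *len;
}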
Better handling of integer wraparound in the kernel
* checked_add(): Returns an Option<T> which is None if overflow would happen. Nicely compatible with the question mark operator, match, if let, etc. constructs.
* overflowing_add(): Returns the result of wrapping_add() as well as a bool indicating whether we overflowed.
* saturating_add(): Clamps to the type's MAX (or MIN, for overflow in the negative direction) instead of wrapping.
* unchecked_add(): Overflow is undefined behavior. Can only be called in an unsafe block (or unsafe function).
* wrapping_add(): Wraps around using two's complement (signed) or modular arithmetic (unsigned).
Better handling of integer wraparound in the kernel
> It would need bounded integer types like those in Ada.
Better handling of integer wraparound in the kernel
> But in the real world you can usually put a bound on things
Better handling of integer wraparound in the kernel
TeX
There was always hugeTeX for people who need more or something.
Better handling of integer wraparound in the kernel
Wol
Ok let's stop this here please. Seriously.
Enough
Better handling of integer wraparound in the kernel
Wol
Better handling of integer wraparound in the kernel
#include <limits.h>
#include <stdio.h>

int main(void)
{
        int a = 5;
        int b = INT_MIN;
        int c = b - a;    /* signed overflow: undefined in standard C, wraps to INT_MAX - 4 with -fwrapv */

        if (c < b) {
                printf("c less than b\n");
        }
        if (b < c) {
                printf("b less than c\n");
        }
        if (c == 0) {
                printf("c is zero\n");
        }
        if (b == c) {
                printf("b equals c\n");
        }
        return 0;
}
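If the goal is to catch the overflow rather than reason about the wrapped result, the subtraction can be checked up front. A sketch using GCC/Clang's __builtin_sub_overflow(), the intrinsic behind the kernel's check_sub_overflow():

#include <limits.h>
#include <stdio.h>

int main(void)
{
        int a = 5, b = INT_MIN, c;

        if (__builtin_sub_overflow(b, a, &c))
                printf("b - a overflows int\n");
        else
                printf("c = %d\n", c);
        return 0;
}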
Better handling of integer wraparound in the kernel
>> they're signed
>
> But they magically have different behavior depending on whether you compile
> in debug or release mode, because the "debug" and "release" profiles define
> `overflow-checks` differently. This is surprising, too, and I'm not sure if that
> was a good design choice.
Better handling of integer wraparound in the kernel
That would make a huge difference.
Better handling of integer wraparound in the kernel
> That would make a huge difference.
/*
 * Overflow can only happen if at least one value is outside the range
* sqrt(min)..sqrt(max) so check that first as the division can be quite a
* bit more expensive than the multiplication.
*
* Multiplying by 0 or 1 can't overflow of course and checking for 0
* separately avoids any risk of dividing by 0. Be careful about dividing
* INT_MIN by -1 also, note reversing the a and b to ensure we're always
* dividing it by a positive value.
*
*/
if ((a > PG_INT32_MAX || a < PG_INT32_MIN ||
b > PG_INT32_MAX || b < PG_INT32_MIN) &&
a != 0 && a != 1 && b != 0 && b != 1 &&
((a > 0 && b > 0 && a > PG_INT64_MAX / b) ||
(a > 0 && b < 0 && b < PG_INT64_MIN / a) ||
(a < 0 && b > 0 && a < PG_INT64_MIN / b) ||
(a < 0 && b < 0 && a < PG_INT64_MAX / b)))
{
*result = 0x5EED; /* to avoid spurious warnings */
return true;
}
*result = a * b;
return false;
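With compiler support, that whole dance collapses to a single intrinsic. A sketch of the builtin path (the wrapper name is invented; __builtin_mul_overflow is the same intrinsic the kernel's check_mul_overflow() is built on):

#include <stdbool.h>
#include <stdint.h>

/* Returns true on overflow, otherwise stores a * b in *result. */
static bool mul_s64_overflow(int64_t a, int64_t b, int64_t *result)
{
        return __builtin_mul_overflow(a, b, result);
}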
Better handling of integer wraparound in the kernel
The likely explanation for non-standardization by the C89 committee is that compilers for different targets behaved differently (and they wanted to support sign-magnitude, ones'-complement, and two's-complement targets, even though by 1989 every machine that was not a follow-up to an earlier machine had been two's-complement for more than two decades).
Better handling of integer wraparound in the kernel
Wol
As mentioned, my guess is that "undefined behaviour" does not require writing out write buffers, while an implementation-defined behaviour of "... terminates the program" without any further provisions would require the program to flush out any buffered writes. Maybe one could specify the implementation-defined behaviour in a way that avoids this requirement, but I guess that the C89 committee just wanted to spare the implementations from having to do such specifications.
Better handling of integer wraparound in the kernel
Wol
They need to describe what happens relative to what the standard describes. It has been claimed that the C standard contains a time-travel clause. However, looking in the C90 document, in n2794 (final C99 draft), and n1570 (final C11 draft), I don't find this clause, so maybe it is just a myth.
Better handling of integer wraparound in the kernel
That clause exists in C++11 (quoted in the linked article) and identically in C++98.
Thank you. Even looking at the blog entry again, I do not see a reference to C++11 (maybe a JavaScript-only reference?).
Better handling of integer wraparound in the kernel
So compiler maintainers are ruining the C experience because of a C++ standard? Ouch (but I have been suspecting so for a long time, only not in connection with time travel).
Better handling of integer wraparound in the kernel
> And this finally explains why common C programming style is to use signed integer types, even for quantities that will never be negative in practice.
Better handling of integer wraparound in the kernel
int for indexes, and then CPUs (sic!) grew various hacks to make these efficient (AMD and Intel have macro-op fusion, RISC-V has special instructions to cope, etc.). size_t would have been faster, of course.
Better handling of integer wraparound in the kernel
Wol
Better handling of integer wraparound in the kernel
for (unsigned int i = 1; i != 0; i++) {
        printf("%u\n", i);
}
Better handling of integer wraparound in the kernel
The stuff about this particular for loop is symptomatic of this blog series. It gives a (non-idiomatic) example fragment, and claims that allowing the compiler to assume that this loop is not infinite "allows a broad range of loop optimizations to kick in", but does not say what they are or how little benefit they provide. So I filled in some missing parts, tried out the result on Godbolt, and found that gcc-13.2 -O3 delivers the same code with and without -fwrapv.
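For anyone wanting to repeat that kind of experiment, the setup is simple; the loop and file name below are only assumed stand-ins, not the fragment from the blog series:

/* Compile twice and diff the generated assembly:
 *   gcc -O3 -S wrap.c -o wrap-undef.s
 *   gcc -O3 -fwrapv -S wrap.c -o wrap-fwrapv.s
 */
long sum_to(int n)
{
        long s = 0;
        for (int i = 1; i <= n; i++)    /* signed index: overflow is UB without -fwrapv */
                s += i;
        return s;
}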
Better handling of integer wraparound in the kernel
<it wrapped>
<it wrapped>
Better handling of integer wraparound in the kernel
>
> if (x > max_size_of(y,x) - y)
Better handling of integer wraparound in the kernel
< handle overflow >
}
This article uses an unfortunate mixture of Cook's (itself unfortunate) terminology and something else (maybe something like the terminology below). Here's an attempt at something better:
Better handling of integer wraparound in the kernel
With this terminology, the usage of "wraparound" for signed integers or pointers makes more sense than with Cook's terminology as presented in the article.
Better handling of integer wraparound in the kernel