Insulating layer?

Posted Oct 13, 2024 19:50 UTC (Sun) by mathstuf (subscriber, #69389)
In reply to: Insulating layer? by mb
Parent article: On Rust in enterprise kernels

I think touching (reading or writing) padding is always UB (or, at best, indeterminate) because a containing `enum` may use the padding to store the discriminant.

Insulating layer?

Posted Oct 13, 2024 22:33 UTC (Sun) by khim (subscriber, #9252)

It's indeterminate, but not UB. Normally, reading an uninitialized value in Rust is UB, but reading padding is specifically excluded. As the Rust Reference puts it: “Uninitialized memory is also implicitly invalid for any type that has a restricted set of valid values. In other words, the only cases in which reading uninitialized memory is permitted are inside unions and in ‘padding’ (the gaps between the fields/elements of a type).”

It's just too easy to write code that reads padding, for one reason or another, thus keeping it UB was considered to be too dangerous.

But while reading padding is not UB, the result is still some random value, and not zero.
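To make “padding” concrete, here is a minimal Rust sketch showing where it sits in a `#[repr(C)]` struct. The `Padded` struct is hypothetical, and the numbers assume a typical target where `u32` is 4-byte aligned. It only inspects the layout and deliberately never reinterprets the padding bytes as integers: copying a struct copies its padding, but reading those bytes as `u8` values would itself be UB.

```rust
use std::mem::{align_of, size_of};

// Hypothetical C-layout struct: `a` takes 1 byte, then the compiler inserts
// padding so that `b` starts at an offset that is a multiple of align_of::<u32>().
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

fn main() {
    // On a typical target: 1 (a) + 3 (padding) + 4 (b) = 8 bytes, align 4.
    println!("size = {}, align = {}", size_of::<Padded>(), align_of::<Padded>());

    let p = Padded { a: 1, b: 2 };
    // Moving `p` copies its bytes, padding included; that is allowed.
    // Reinterpreting the padding bytes as u8 values would not be.
    let q = p;
    println!("a = {}, b = {}", q.a, q.b);
}
```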

Insulating layer?

Posted Oct 14, 2024 7:54 UTC (Mon) by Wol (subscriber, #4433)

> It's just too easy to write code that reads padding, for one reason or another, thus keeping it UB was considered to be too dangerous.

Imho, that's the wrong logic entirely. There is no reason it should be UB, therefore it shouldn't be.

Accessing the contents pointed to by some random address should be UB - you don't know what's there. It might be i/o, it might be some other program's data, it might not even exist (if the MMU hasn't mapped it). You can't reason about it, therefore it's UB.

But padding, uninitialised variables, etc etc are perfectly valid to dereference. You can reason about it, you're going to get random garbage back. So that shouldn't be undefined. But you can't reason about the consequences - you might want to assign it to an enum, do whatever with it that has preconditions you have no clue that this random garbage complies with. Therefore such access should be unsafe.

Principle of least surprise - can you reason about it? If so you should be able to do it.

Cheers,
Wol

Insulating layer?

Posted Oct 14, 2024 13:08 UTC (Mon) by khim (subscriber, #9252)

> Imho, that's the wrong logic entirely. There is no reason it should be UB, therefore it shouldn't be.

Thanks for showing us, yet again, why C/C++ couldn't be salvaged. Note how you are entirely ignoring everything except your own opinion (based on how hardware works).

Just because there is no reason for it to be UB from your POV doesn't mean there is no reason for it to be UB from someone else's POV. And, indeed, that infamous be || !be is very much UB in both Rust and C.

Yes, it may not make much sense to the “we code for the hardware” guys, and it may be somewhat problematic, but in Rust it's not easy to hit that corner case (just look at how many contortions I had to go through to finally shoot myself in the foot!), and it helps compiler developers, so it was decided that accessing a “normal” uninitialized variable is UB, even if some folks think it shouldn't be.

> But padding, uninitialised variables, etc etc are perfectly valid to dereference.

Nope. That's not how either C or Rust works. And in Rust, reading uninitialised variables is UB while reading padding is not, because that's where the best compromise for all parties involved was found.

> Therefore such access should be unsafe.

Just how unsafe? What about that pesky be || !be? Should it be guaranteed to return true, or is it permitted to return false or crash? Note that both C and Rust allow all three possibilities (rustc tries to help the programmer and make it crash when it's easily detectable, but such behavior is very much not guaranteed).

> Principle of least surprise - can you reason about it? If so you should be able to do it.

Except that has been tried for many decades and simply doesn't work. The answer to “can you reason about it?” very much depends on the person you are asking. But to write something in any language, at least two people have to give the same answer to that question: the guy who writes the compiler (or interpreter) and the guy who uses said compiler.

A simple application of the principle of least surprise only works reliably when the sole user of the language is also its developer – and in that case it's very much not needed.

Insulating layer?

Posted Oct 14, 2024 15:02 UTC (Mon) by Wol (subscriber, #4433)

> > But padding, uninitialised variables, etc etc are perfectly valid to dereference.

> Nope. That's not how either C or Rust works. And in Rust, reading uninitialised variables is UB while reading padding is not, because that's where the best compromise for all parties involved was found.

Are you saying that the memory location is not allocated until the variable is written? Because if the memory location of the variable is only created by writing to it, then fair enough. That however seems inefficient and stupid because you're using indirection, when you expect the compiler to allocate a location.

But (and yes maybe this is "code to the hardware" - but I would have thought it was "code to the compiler") I'm not aware of any compilers that wait until a variable is written before allocating the space. When I call a function, isn't it the norm for the COMPILER to allocate a stack frame for all the local variables? (yes I know you can delay declaration, which would delay allocation of stack space.) Which means that the COMPILER allocates a memory location before the variable is accessed for the first time. Which means I may be getting complete garbage if I do a "read before write", but fair enough.

Okay, I don't know how modern gcc/clang etc work, but I'm merely describing how EVERY compiler I've ever had to understand deeply (not many) works.

So no, this is NOT "code to the hardware". It's "code to the mental model of the compiler", and if you insist it's code to the hardware, you need to EXPLAIN WHY.

Or are you completely misunderstanding me and thinking I mean "dereference a pointer to memory" - which is not what I said and not what I meant!!! OF COURSE that's UB if you are unfortunate/stupid/incompetent enough to do that - taking the contents of the address of the pointer is perfectly okay - the compiler SHOULD have made sure that the address of the pointer points to valid memory. USING that contents to access memory is UB because using garbage as pointer to memory is obviously a stupid idea.

Cheers,
Wol

Insulating layer?

Posted Oct 14, 2024 15:21 UTC (Mon) by mathstuf (subscriber, #69389)

IIUC, the way this is thought about is that there are additional values for a memory location beyond the 256 encoded directly by its bits. LLVM's model has at least `undef` and `poison`. The only way to clear them is to write to the location. So the memory is allocated, but may (logically) hold a language-unrepresentable value. Note that some hardware does have things like this: CHERI actually has 129 bits per pointer, the last one not being addressable through the pointer value but instead managed with dedicated instructions (probably?) to indicate "is a valid pointer". So while in C (and Rust) one could write the equivalent of `T* ptr = (T*)some_u128;`, `ptr` might not actually be usable as a pointer (though when such casts obey the *language* rules, the compiler should insert instructions that set that flag bit to the right state).

Insulating layer?

Posted Oct 14, 2024 15:40 UTC (Mon) by Wol (subscriber, #4433)

But why does that make a difference to your ability to reason about it? It's basically the same problem I have in databases, where Pick has the empty string and SQL has NULL, and you're forever cursing the designer's decision to use those to mean about four different things.

Going back to the "to be or not to be", if uninitialised variables are defined as containing the value "undef", about which you cannot reason, then "be || !be" would refuse to compile. But it wouldn't be UB, it would be an illegal operation on an undefined variable.

If, however, the language allows you to operate on undefined variables inside an unsafe block, then to_be_or_not_to_be() would be an unsafe function, only callable from other unsafe functions, unless it actively asserted that it itself would not return a value of "undef".

(Like SQL allows logical operations on NULL, where any NULL in the expression means the result is NULL.)

And if you have a "convert random garbage to boolean" function, that can even handle undef and poison, then to_be_or_not_to_be() would just be like any normal function - guaranteed to return a boolean, just not necessarily a predictable one.

Cheers,
Wol

Insulating layer?

Posted Oct 14, 2024 17:28 UTC (Mon) by mathstuf (subscriber, #69389)

> But why does that make a difference to your ability to reason about it?

There are codepaths which make it impossible to know if something is actually initialized. Diagnostics to that effect are madness. Rust can avoid a lot of these issues because tuples are a language-level thing and functions wanting to return multiple values don't have to fight over which one "wins" the output while the rest live as output parameters. C++ at least has tuples, but they are…cumbersome to say the least.

bool be;
int status = do_init(&be);
// Is `be` initialized or not? `do_init` is a function call whose implementation is not visible at compile or link time.

Given that reading it, if it is actually uninitialized, is UB, the optimizer is allowed to assume that it is initialized. It may choose to emit a diagnostic that it is doing so, but it doesn't have to (and doing so could produce all kinds of unwanted noise). Since the valid representations are "0" or "1", `be || !be` can be as-if implemented as `be == 1 || be == 0`, which leads to UB effects of "impossible branches" being taken when `be` ends up holding a bit-value of 2.
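A sketch of the point about Rust returning multiple values directly: a hypothetical Rust counterpart of `do_init` that hands back the value and the status together (here as a `Result`; a tuple such as `(i32, Option<bool>)` would also work), so there is never a `be` that may or may not have been written. The `succeed` parameter and the `-1` error code are illustrative assumptions.

```rust
// Hypothetical Rust counterpart of the C `do_init`: no out-parameter,
// the bool only exists if initialization succeeded.
fn do_init(succeed: bool) -> Result<bool, i32> {
    if succeed {
        Ok(true)
    } else {
        Err(-1) // hypothetical error code
    }
}

fn main() {
    match do_init(true) {
        // `be` is bound only on the success path, so it is always initialized.
        Ok(be) => println!("be || !be = {}", be || !be),
        Err(status) => println!("init failed with status {status}"),
    }
}
```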

Insulating layer?

Posted Oct 14, 2024 18:01 UTC (Mon) by smurf (subscriber, #17840)

> There are codepaths which make it impossible to know if something is actually initialized

… in C++. There is no such thing in Rust — unless you're using "unsafe", that is. If you do, it's your responsibility to hide 100% of that unsafe-ness from your caller, at as low a level as possible, where reasoning about the state of a variable is still easy enough to do something about it.
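A minimal sketch of what safe Rust does instead, using a hypothetical Rust version of `to_be_or_not_to_be()`: a binding may be declared without a value, but the compiler's definite-initialization check accepts a read only when every path leading to it has assigned the binding.

```rust
// Hypothetical safe-Rust version of the function discussed in this thread.
fn to_be_or_not_to_be(coin: u8) -> bool {
    let be: bool; // declared, not yet initialized
    if coin % 2 == 0 {
        be = true;
    } else {
        be = false;
    }
    // Accepted only because both branches assign `be`. Deleting either
    // assignment makes this a compile error (E0381), not UB.
    be || !be
}

fn main() {
    for coin in 0..4 {
        println!("{coin}: {}", to_be_or_not_to_be(coin));
    }
}
```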

Insulating layer?

Posted Oct 14, 2024 19:08 UTC (Mon) by Wol (subscriber, #4433)

Exactly my point. to_be_or_not_to_be() knows it's undefined. So either it returns an unsafe boolean, or it has to fix the problem itself. If it knows it's returning undef in a "pure" boolean, it has to be a compiler error.

Cheers,
Wol

Insulating layer?

Posted Oct 14, 2024 19:58 UTC (Mon) by smurf (subscriber, #17840)

to_be_or_not_to_be() knows it's undefined because its undefined-ness is expressed on exactly one line of C[++].

That argument no longer holds when it's spread over multiple functions, or even compilation units.

Insulating layer?

Posted Oct 14, 2024 19:17 UTC (Mon) by Wol (subscriber, #4433)

> int status = do_init(&be);

What does do_init() do? If it merely returns whether or not "be" is initialised, then how can the optimiser assume that it is initialised? That's a massive logic bug in the compiler!

If, on the other hand, it forces "be" to be a valid value, then of course the compiler can assume it's initialised. But that would be obvious from the type system, no?

Cheers,
Wol

Insulating layer?

Posted Oct 14, 2024 19:42 UTC (Mon) by khim (subscriber, #9252)

> If it merely returns whether or not "be" is initialised, then how can the optimiser assume that it is initialised?

That's easy: because a correct program is not supposed to read an uninitialized variable, the compiler can conclude that on every branch where it's read, it was successfully initialized. It's then the developer's responsibility to fix the code to ensure that.

> But that would be obvious from the type system, no?

Nope. When you call read(2), nothing in the type system distinguishes return values that are greater than zero from those that are less than zero.
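For contrast with read(2), a sketch using Rust's `std::io::Read::read`, which returns `io::Result<usize>`, so success and failure are different variants rather than different ranges of one integer. The `read_some` helper is hypothetical, and `&[u8]` is used as the reader only to keep the example self-contained.

```rust
use std::io::{self, Read};

// Hypothetical helper: reads at most `buf.len()` bytes from `src`.
// `&[u8]` implements `Read`, so no files or sockets are needed.
fn read_some(mut src: &[u8], buf: &mut [u8]) -> io::Result<usize> {
    // The byte count exists only on the Ok path; failures arrive as io::Error.
    src.read(buf)
}

fn main() {
    let mut buf = [0u8; 3];
    match read_some(b"hello", &mut buf) {
        Ok(n) => println!("read {n} bytes: {:?}", &buf[..n]),
        Err(e) => println!("read failed: {e}"),
    }
}
```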

Insulating layer?

Posted Oct 14, 2024 20:19 UTC (Mon) by Wol (subscriber, #4433)

> > If it merely returns whether or not "be" is initialised, then how can the optimiser assume that it is initialised?

> That's easy: because a correct program is not supposed to read an uninitialized variable, the compiler can conclude that on every branch where it's read, it was successfully initialized. It's then the developer's responsibility to fix the code to ensure that.

Circular reasoning !!! Actually, completely screwy reasoning. If do_init() does not alter the value of "be", then the compiler cannot assume that the value of "be" has changed!

Let's rephrase that - "Because a SAFE *function* is not supposed to read an uninitialised variable".

to_be_or_not_to_be() knows that "be" can be 'undef'. Therefore it either (a) can apply boolean logic to 'undef' and return a true boolean, or (b) it has to return an "unsafe boolean", or (c) it's a compiler error. Whichever route is chosen is irrelevant, the fact is it's just *logic*, not hardware, and it's enforceable by the compiler. In fact, as I understand Rust, normal compiler behaviour is to give the programmer the ability to choose whichever route he wants!

All that matters is that the body of to_be_or_not_to_be() is marked as unsafe code, and the return value is either a safe "true boolean", or an unsafe boolean that could be 'undef'. At which point the calling program can take the appropriate action because IT KNOWS.

Cheers,
Wol

Insulating layer?

Posted Oct 14, 2024 20:31 UTC (Mon) by khim (subscriber, #9252)

> In fact, as I understand Rust, normal compiler behaviour is to give the programmer the ability to choose whichever route he wants!

Nope. Normal compiler behavior is still the same as in C: the language user has to ensure the program doesn't have any UB.

The big difference is that for the normal, safe subset of Rust this is ensured by the compiler. But for unsafe Rust it's still the responsibility of the developer to ensure that the program doesn't violate any rules WRT UB.

> to_be_or_not_to_be() knows that "be" can be 'undef'.

Nope. It can't be undef. In Rust, MaybeUninit<bool> can be undef, but a regular bool has to be either true or false. Going from MaybeUninit<bool> to bool when it's undef (and not true or false) is instant UB.

> At which point the calling program can take the appropriate action because IT KNOWS.

How does it know? You can't look at a MaybeUninit<bool> and ask it whether it's initialized or not. It's still very much the responsibility of the Rust user to ensure that the program doesn't try to convert a MaybeUninit<bool> which contains undef into a normal bool.
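A sketch of that explicit transition, assuming a hypothetical C-style initializer (written in Rust here so the example is self-contained) that reports through a status code whether it wrote its out-parameter. Since the program cannot ask the `MaybeUninit<bool>` itself, the wrapper treats the status code as its evidence and only calls `assume_init` on the success path.

```rust
use std::mem::MaybeUninit;

// Hypothetical C-style initializer: writes through `out` and returns 0 only
// when it actually did so; on failure it leaves `out` untouched.
fn c_style_do_init(out: &mut MaybeUninit<bool>, succeed: bool) -> i32 {
    if succeed {
        out.write(true);
        0
    } else {
        -1
    }
}

// Safe wrapper: the only place that turns MaybeUninit<bool> into bool.
fn checked_do_init(succeed: bool) -> Option<bool> {
    let mut slot = MaybeUninit::<bool>::uninit();
    if c_style_do_init(&mut slot, succeed) == 0 {
        // SAFETY: status 0 means `c_style_do_init` wrote a valid bool.
        Some(unsafe { slot.assume_init() })
    } else {
        // Calling `slot.assume_init()` here would be instant UB.
        None
    }
}

fn main() {
    println!("{:?}", checked_do_init(true));
    println!("{:?}", checked_do_init(false));
}
```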

Insulating layer?

Posted Oct 14, 2024 20:41 UTC (Mon) by daroc (editor, #160859)

I think that using three exclamation marks in a row might be a sign that this back-and-forth is not going anywhere in particular. This is a worthy discussion topic, but I'm not sure the last few comments have added anything new.

Insulating layer?

Posted Oct 14, 2024 23:43 UTC (Mon) by atnot (subscriber, #124910)

> This is a worthy discussion topic, but I'm not sure the last few comments have added anything new.

I agree it may have been a worthy topic once upon a time. But when the same two people (khim and Wol) have the same near-identical, drawn-out discussions for dozens of messages a week in every amenable thread, to the point of drowning out most other discussion on the site (at least without the filter), and make zero progress on their positions over a span of at least 2 years, perhaps some more action is necessary.

Insulating layer?

Posted Oct 15, 2024 10:13 UTC (Tue) by Wol (subscriber, #4433)

Apologies. I try not to respond to khim any more.

Unfortunately, I suspect the language barrier doesn't help. I think sometimes we end up arguing FOR the same thing, but because we don't understand what the other one is saying we end up arguing PAST each other.

Cheers,
Wol

Insulating layer?

Posted Oct 15, 2024 11:15 UTC (Tue) by khim (subscriber, #9252)

> Unfortunately, I suspect the language barrier doesn't help.

It could be a language barrier, but I have seen similar discussions go in circles endlessly between pairs of native speakers, too, so I suspect the problem is deeper.

My feeling is that it's related to the difference between how mathematicians and laymen apply logic.

> I think sometimes we end up arguing FOR the same thing, but because we don't understand what the other one is saying we end up arguing PAST each other.

No, that's not the issue. The big problem with compiler development (and language development) lies in the fact that a compiler can't answer any interesting questions about the semantics of your program.

And that's why we go in circles. Wol's arguments usually take this form:

1. A compiler can do “the right thing” or “the wrong thing”.
2. “The wrong thing” is, well… wrong, it's bad, thus compilers have to do “the right thing”.

And my answer takes this form:

1. Sure, we may imagine compilers that do “the right thing” or “the wrong thing”.
2. Except that usually “the wrong thing” is possible to implement while “the right thing” is impossible to implement.
3. We are stuck with “the wrong thing”, thus talking about “the right thing” is pointless.
4. We may try to mitigate the consequences of doing “the wrong thing” using some alternate approaches.

And Wol ignores the most important step of my answer, #2, and goes back to “compilers should do “the right thing”… even if I have no idea how they can do that”.

I have no idea why it is so hard for laymen to accept that compilers are not omnipotent, and compiler developers are not omnipotent either, that there are things compilers can simply never do… but that's the core issue: “we code for the hardware” guys somehow have a mental model of a compiler which is simultaneously very simple (as the discussion about how modern compilers introduce shadow variables for each assignment shows) and infinitely powerful (as discussions about what compilers have to detect and report show).

I have no idea how a compiler can be simultaneously primitive and yet infinitely powerful; that's definitely a barrier right there, but it's not a “native speaker” vs “non-native speaker” barrier.

And I continue the discussion to see if that barrier could be broken, somehow – because in that case C/C++ would have a chance! If that barrier could be broken then there's a way to reform the C/C++ “we code for the hardware” community!

But so far it looks hopeless. Safe languages will still arrive; they will just have to arrive the sad way, one funeral at a time. And that will be the end of C/C++, because the “one funeral at a time” approach favors transition to a new language.

Insulating layer?

Posted Oct 15, 2024 16:18 UTC (Tue) by atnot (subscriber, #124910)

I think this is a very tone-deaf way to respond to someone agreeing to finally stop participating in a discussion that has been unproductive for years. Even if the powers that be continue to tolerate it, perhaps it's time to practice the (admittedly difficult) skill of letting someone be "wrong on the internet", for the sake of the rest of us.

Insulating layer?

Posted Oct 15, 2024 16:42 UTC (Tue) by intelfx (subscriber, #130118)

One has to have a non-trivial amount of gall to unapologetically continue to paint oneself as a "mathematician" and attack one's opponent as a "layman"... immediately after being asked to stop not just the attacks, but the entire argument.

Insulating layer?

Posted Oct 15, 2024 21:58 UTC (Tue) by Wol (subscriber, #4433)

And to say I'm not a mathematician ... well, I don't necessarily understand symbolic logic, and I don't have a COMPUTER degree, but I've got an Honours in Science (Maths, Physics and Chemistry). And I've got a Masters, too (in technology, though I'm much less impressed with the value of that degree than the plain Bachelors). And I've got a very good intuition for "this argument sounds screwy" ...

And when I try to argue LOGICally, if people argue straight past my logic and don't try to show me what's wrong with it, I end up concluding they have to be attacking the messenger, because they are making no attempt to attack the message.

Khim is saying Science advances one funeral at a time, but it'll be his funeral not mine - if you look back over what I've written it's clear I've learnt stuff. Don't always remember it! But as I've said every now and then in the past, you need to try to understand what the other person is arguing. I get the impression khim finds that hard.

Cheers,
Wol

Insulating layer?

Posted Oct 15, 2024 22:13 UTC (Tue) by daroc (editor, #160859)

Alright, I think that's a fairly clear personal attack. Probably I should have said something after khim's message. Sorry, Wol, for not stepping in at that point. But in either case, this topic should end here.

Insulating layer?

Posted Oct 14, 2024 19:45 UTC (Mon) by dezgeg (subscriber, #92243)

See https://devblogs.microsoft.com/oldnewthing/20040119-00 for an example of a real architecture where attempting to use an uninitialized register variable might cause an exception.

Insulating layer?

Posted Oct 14, 2024 20:32 UTC (Mon) by Wol (subscriber, #4433)

Interesting.

But I'm not arguing that attempting to dereference garbage is okay. That post explicitly says the programmer chose to return garbage. No problem there. The problem comes when you attempt to USE the garbage as if it was valid. Rust would - I assume - have marked the return value as "could be garbage", and when the caller attempted to dereference it without checking, Rust would have barfed with "compile error - you can't unconditionally dereference possible garbage".

The point is, the programmer can reason about it because Rust would force them to track the fact that the return value could be garbage.

Cheers,
Wol

Insulating layer?

Posted Oct 14, 2024 20:43 UTC (Mon) by khim (subscriber, #9252)

> Rust would - I assume - have marked the return value as "could be garbage"

Nope. Rust doesn't do that. A Rust developer may use MaybeUninit<bool> to signal to the compiler that a value may be uninitialized. The Rust developer (and not the compiler!) then decides when to go from MaybeUninit<bool> to bool (which tells the compiler that at this point the value is initialized).

If I lie to the compiler at that point (like I did), that's instant UB.

IOW: Rust does the exact same thing C/C++ does, but mitigates the issue by making the transition from the “could be uninitialized” type to the “I believe it's initialized now” type explicit.

> Rust would have barfed with "compile error - you can't unconditionally dereference possible garbage".

This can't be a compiler error, but, sure enough, if you violate these rules and the compiler can recognize it, there will be a warning. It's a warning, not an error, because the compiler may recognize this situation but is not obliged to; it's something it does on a “best effort” basis.

Insulating layer?

Posted Oct 14, 2024 22:03 UTC (Mon) by dezgeg (subscriber, #92243)

The linked Itanium example doesn't dereference uninitialized pointers anywhere - it demonstrates that just attempting to store an uninitialized register value might fault. In essence, this could blow up:

int global;
void f() {
    int uninitialized;
    global = uninitialized;
}

I.e., a direct example of an architecture where what you wrote ("But padding, uninitialised variables, etc etc are perfectly valid to dereference. You can reason about it, you're going to get random garbage back.") doesn't apply.

Insulating layer?

Posted Oct 14, 2024 20:12 UTC (Mon) by khim (subscriber, #9252)

> Are you saying that the memory location is not allocated until the variable is written?

Worse. For decades now, most compilers have created a new location for every store to a variable. GCC started doing it around 20 years ago; LLVM has done it from day one.

> That however seems inefficient and stupid because you're using indirection, when you expect the compiler to allocate a location.

Why would you use indirection? Tree SSA doesn't need that: you just create many copies of the variable (one for each store to it).
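A hand-written illustration of that idea (not actual GCC or LLVM output, whose IR syntax differs): a function with one mutable `x` stored to three times computes the same thing as a version in which every store produces a new, never-modified value, which is roughly how an SSA-based optimizer sees it. Both function names are hypothetical.

```rust
// Source form: one mutable variable, stored to three times.
fn source_form(a: i32) -> i32 {
    let mut x = a;
    x = x + 1;
    x = x * 2;
    x
}

// Roughly the SSA view: each store yields a new, immutable value
// (x1, x2, x3). None of them needs a dedicated stack slot; the optimizer
// is free to keep them in registers or fold them away entirely.
fn ssa_form(a: i32) -> i32 {
    let x1 = a;
    let x2 = x1 + 1;
    let x3 = x2 * 2;
    x3
}

fn main() {
    for a in [0, 3, -7] {
        assert_eq!(source_form(a), ssa_form(a));
        println!("{a} -> {}", ssa_form(a));
    }
}
```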

> When I call a function, isn't it the norm for the COMPILER to allocate a stack frame for all the local variables? (yes I know you can delay declaration, which would delay allocation of stack space.)

No. Consider the following example:

struct II {
    int x;
    int y;
};

int foo(struct II);

struct LL {
    long x;
    long y;
};

int bar(struct LL ll) {
    struct II ii = {
        .x = ll.x,
        .y = ll.y
    };
    return foo(ii);
}

Here the local variable ii is never allocated at all.

> It's "code to the mental model of the compiler", and if you insist it's code to the hardware, you need to EXPLAIN WHY.

It's “code to the hardware” because your “mental model of the compiler” can't explain how your program works without invoking the hardware that will execute it. And it's only needed when you ignore the rules of the language as described in the specification, invent some new, entirely different specification of what your program is doing, and then assert that your POV is not just valid today, but will remain the same 10, 20, 50 years down the road. That's… quite a presupposition.

> USING that contents to access memory is UB because using garbage as pointer to memory is obviously a stupid idea.

No, that's not about pointers, but about simple variables. Accessing them when they are not initialized is UB because a normal program shouldn't do that, and assuming that a correct program doesn't access uninitialized variables is beneficial for the compiler. Heck, there's a whole post dedicated to that issue on Ralf's blog.

Insulating layer?

Posted Oct 14, 2024 21:02 UTC (Mon) by Wol (subscriber, #4433)

> > Are you saying that the memory location is not allocated until the variable is written?

> Worse. For decades already most compilers create new location for every store to a variable. GCC started doing it around 20 years ago, LLVM did from the day one.

Ummmm ... my first reaction from your description was "Copy on write ??? What ... ???", but it's not even that!

The other thing to bear in mind is that this is a quirk of *optimising* compilers, and it completely breaks the mental model of "x = x = x", so it breaks the concept of "least surprise". And I don't buy that's "coding to the hardware". If I have a variable "x", I don't expect the compiler to create lots of "x"s behind my back!!! And while we might be dying off, there's probably plenty of people, like me! for whom this was Ph.D. research for our POST-decessors.

But. SERIOUSLY. What's wrong with either (a) pandering to the principle of least surprise and saying that dereferencing an uninitialised variable returns "rand()" (even if it's a different rand every time :-), or (b) saying that it returns 0x0000...? The former makes intuitive sense, the latter would actually be useful, and if it's a compiler directive it's down to the programmer! which fits the Rust ethos.

If the hardware is my brain, after all, dereferencing an uninitialised variable would be a runtime error, not an opportunity for optimisation and code elimination ...

Cheers,
Wol
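For comparison, a sketch of how Rust lets the programmer state the “returns 0x0000” choice explicitly rather than having the compiler pick it: either initialize the variable, or ask for zeroed storage with `MaybeUninit::zeroed()`. The `zero_initialized` helper is hypothetical. The “rand()” choice, reading whatever happens to be there, has no sound equivalent: `MaybeUninit::<u32>::uninit().assume_init()` is UB even for integers.

```rust
use std::mem::MaybeUninit;

// Hypothetical helper returning two explicitly zero-initialized values.
fn zero_initialized() -> (u32, u32) {
    // Option 1: just say what you mean.
    let explicit: u32 = 0;
    // Option 2: request zeroed bytes. `assume_init` is sound here only
    // because the all-zero bit pattern is a valid u32; it would be UB for
    // a type such as `&u32`, where zero is not a valid value.
    let zeroed: u32 = unsafe { MaybeUninit::<u32>::zeroed().assume_init() };
    (explicit, zeroed)
}

fn main() {
    let (explicit, zeroed) = zero_initialized();
    println!("explicit = {explicit}, zeroed = {zeroed}");
}
```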

Insulating layer?

Posted Oct 15, 2024 6:41 UTC (Tue) by khim (subscriber, #9252)

> And I don't buy that's "coding to the hardware".

What do you call it, then?

> If I have a variable "x", I don't expect the compiler to create lots of "x"s behind my back!!!

Just as you wouldn't expect a CPU to create a hundred accumulators in place of the one that the machine code uses? CPUs have been doing that for about 30 years.

Just as you wouldn't expect one piece of memory to contain different values simultaneously? CPUs started doing that even earlier.

I'm not telling you that to shame you for not knowing; I'm explaining why assuming that the compiler (or hardware) will work in a certain “undocumented yet obvious” way doesn't lead to something you can trust.

There are many things that are implemented in both hardware and compilers via an as-if rule… and you don't even need to know about them if you use the language in accordance with the language specification. The same goes for hardware.

That's why I really like The Tower of Weakenings that Rust seems to embrace. With it, normal developers have 90% of code that's “safe” and need not know anything about how hardware and compilers actually work. But in the unsafe world there are also gradations. In the Linux kernel there is a certain tiny amount of code (related to RCU, e.g.) that touches everything simultaneously: the real hardware, compiler internals, and so on. But if said code is only 0.01% of the whole and the majority of developers work with the other 99.99%, then the precise rules used in that tiny part can be ignored by most developers.

> And while we might be dying off, there's probably plenty of people, like me!

Weren't we just talking about how major scientific breakthroughs arrive one funeral at a time? Looks like that's how safety in low-level languages will arrive, too.

And that's why C/C++ wouldn't get it, in practice: language can be changed to retrofit safety into it, but who would use it in that fashion and why?

Insulating layer?

Posted Oct 15, 2024 10:09 UTC (Tue) by Wol (subscriber, #4433)

> > And I don't buy that's "coding to the hardware".

> How do you call it, then?

Brain-dead logic.

> And that's why C/C++ wouldn't get it, in practice: language can be changed to retrofit safety into it, but who would use it in that fashion and why?

So why am I left with the impression that it's PERFECTLY ACCEPTABLE for Rust to detect UB and not do anything about it? I take - was it smurf's - comment that "we can't promise to warn on UB because we can't guarantee to detect it", but if we know it's UB?

If a programmer accesses an uninitialised variable there's a whole bunch of reasons why they might have done it. The C(++) response is to go "I don't believe you meant that" and go and do a load of stuff the programmer didn't expect.

My understanding of the ethos of Rust is that if the compiler doesn't understand what you mean it's either unsafe, or an error. You seem to be advocating that Rust behave like C(++) and just ignore it. Sorry if I've got that wrong.

Imho (in this particular case) Rust should come back at the programmer (like any sensible human being) and ask "What do you mean?". Imho there are three simple scenarios: (1) you expect the data to come from somewhere the compiler doesn't know about, (2) you forgot to explicitly request that all variables be zeroed on declaration / read-before-write, or (3) you're expecting random garbage. (If there are more, just add them to the list.)

So no you don't just dump stuff into UB, you should force the programmer to explain what they mean. And if they don't, it won't compile.

Cheers,
Wol

Insulating layer?

Posted Oct 15, 2024 10:33 UTC (Tue) by farnz (subscriber, #17727)

> So why am I left with the impression that it's PERFECTLY ACCEPTABLE for Rust to detect UB and not do anything about it? I take - was it smurf's - comment that "we can't promise to warn on UB because we can't guarantee to detect it", but if we know it's UB?

Rust simply doesn't have UB outside of unsafe. There's no detection of UB involved at all - all the behaviour of safe Rust is supposed to be fully defined (and if it's not, that's a bug). For example, in the case of the Rust equivalent of `bool be; int status = use_be(&be);`, the defined behaviour is for the program to fail to compile because `be` is possibly used before it is initialized.
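A minimal sketch of that Rust equivalent, with a hypothetical `use_be` that writes through a `&mut bool` and returns a status code. Taking `&mut be` of a declared-but-unassigned `be` is rejected at compile time (error E0381), so the caller has to choose a starting value before the call.

```rust
// Hypothetical Rust counterpart of the C `use_be(&be)`: writes through the
// reference and returns a status code.
fn use_be(out: &mut bool) -> i32 {
    *out = true;
    0
}

fn main() {
    // `let mut be: bool; let status = use_be(&mut be);` does not compile:
    // rustc rejects borrowing a possibly-uninitialized binding (E0381).
    let mut be = false; // so the caller must pick a starting value
    let status = use_be(&mut be);
    println!("status = {status}, be = {be}");
}
```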

Insulating layer?

Posted Oct 15, 2024 21:40 UTC (Tue) by Wol (subscriber, #4433) [Link] (2 responses)

So accessing the contents of be before you've initialised it is not UB, it's a fatal error. Thanks.

I don't necessarily think it's the best definition, but it IS defined and it IS in character for Rust, which UB would not be.

Cheers,
Wol

Insulating layer?

Posted Oct 16, 2024 9:46 UTC (Wed) by laarmen (subscriber, #63948) [Link] (1 responses)

Rust makes it *hard* for you to read uninitialized variables, but not impossible:

let b: bool = unsafe { MaybeUninit::uninit().assume_init() }; // undefined behavior! ⚠️

I lifted this from the MaybeUninit doc, including the comment. That will compile, but *is* UB.

Now, I'm of the opinion that it is perfectly reasonable for Rust to declare this UB, as the alternative makes a lot of assumptions about the underlying implementation, all for a use case that seems dubious to me.

Insulating layer?

Posted Oct 18, 2024 14:27 UTC (Fri) by taladar (subscriber, #68407) [Link]

Unsafe blocks in Rust basically mean that you get to use unsafe operations but you are also responsible for upholding safety guarantees in your code inside the block. Methods like assume_init() are meant to be used after you have verified that the value is initialized, otherwise your code is unsound.
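A sound counterpart to the doc example quoted above might look roughly like this: initialize through `MaybeUninit::write` first, and only then call `assume_init`. The function name `make_flags` is made up for illustration:

```rust
use std::mem::MaybeUninit;

// Hypothetical helper: builds a value through MaybeUninit the sound way.
fn make_flags() -> [bool; 4] {
    let mut slot: MaybeUninit<[bool; 4]> = MaybeUninit::uninit();
    // Fully initialize the value before anything reads it.
    slot.write([true, false, true, false]);
    // SAFETY: `slot` was completely initialized by `write` above,
    // so asserting initialization here upholds the contract.
    unsafe { slot.assume_init() }
}

fn main() {
    println!("{:?}", make_flags());
}
```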

Insulating layer?

Posted Oct 15, 2024 10:42 UTC (Tue) by khim (subscriber, #9252) [Link]

> You seem to be advocating that Rust behave like C(++) and just ignore it. Sorry if I've got that wrong.

I'm not “advocating” anything, I'm just explaining how things work. Not how they “should work”, but how they inevitably have to work (and thus how they actually work). If, out of ten choices that one may like or dislike, only one is actually implementable, then you get that one whether you like it or not.

> My understanding of the ethos of Rust is that if the compiler doesn't understand what you mean it's either unsafe, or an error.

We are talking about full Rust here, not just “safe” Rust. UB is UB, whether it's in safe code or unsafe code. And yes, you can trigger UB in safe Rust, and it would lead to exactly the same outcome as in unsafe Rust.

> If a programmer accesses an uninitialised variable there's a whole bunch of reasons why they might have done it.

If a programmer accesses an uninitialized variable without using a special construct that is allowed to touch undef, then it's a bug. Period, end of story. If a program includes such an access, then it has to be fixed; there is no other sensible choice.

The only difference between safe Rust and unsafe Rust is whose responsibility it is to fix such a bug. If it's in “safe” Rust, then it's a bug in the compiler (currently there are around 100 such bugs) and the compiler developers have to fix it; if it's in unsafe Rust, then the developer has to fix it.

The compiler may include warnings for [potential] bugs in unsafe Rust, but ultimately it's the developer's responsibility to fix them.

> Imho (in this particular case) Rust should come back at the programmer (like any sensible human being)

Impossible. Compilers are mindless (they literally have no mind and couldn't have one) and not sensible (they don't have “common sense”, and attempts to add it inevitably lead to even worse outcomes). That's something “we code for the hardware” people simply refuse to accept, for some unfathomable reason.

> (1) you expect the data to come from somewhere the compiler doesn't know about

In that case you have to use a volatile read or a volatile write.
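A minimal sketch using `std::ptr::write_volatile` and `std::ptr::read_volatile`. An ordinary local `u32` stands in for a memory-mapped register here (an assumption purely so the example runs anywhere); with a real device the pointer would come from the platform. `poke_and_peek` is a hypothetical helper name:

```rust
use std::ptr;

// Hypothetical helper: writes `value` to the "register", then reads it back.
fn poke_and_peek(reg: &mut u32, value: u32) -> u32 {
    let p: *mut u32 = reg;
    // SAFETY: `p` comes from a live `&mut u32`, so it is valid and aligned.
    unsafe {
        ptr::write_volatile(p, value); // must be performed, cannot be elided
        ptr::read_volatile(p) // must actually re-read memory
    }
}

fn main() {
    // Stand-in for a device register; real MMIO addresses are out of scope.
    let mut fake_register: u32 = 0;
    println!("{:#x}", poke_and_peek(&mut fake_register, 0xDEAD_BEEF));
}
```

Volatile accesses tell the compiler that every read and write must actually be performed and cannot be merged or dropped; they are not a synchronization mechanism between threads (that's what atomics are for).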

> (2) you forgot to explicitly request all variables are zeroed on declaration / read-before-write

This is a bug and it should be fixed. If you managed to do that in normal, “safe” Rust, then it's a bug in the compiler and it has to be fixed in the compiler; if you did that in unsafe Rust, then it's a bug in your code and you have to fix it.
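The fix for case (2) is to state the initial value explicitly. A sketch of two ways to get an all-zero `[u32; 8]` (the function names are hypothetical); the second is only sound because all-zero bytes are a valid value of that type, which would not be true for, say, a reference:

```rust
use std::mem::MaybeUninit;

// Preferred: say what the initial value is, no unsafe needed.
fn zeroed_safe() -> [u32; 8] {
    [0; 8]
}

// Raw "all bits zero" for when you really need it.
fn zeroed_raw() -> [u32; 8] {
    // SAFETY: an all-zero bit pattern is a valid [u32; 8].
    // (It would be UB for types like &T, which must never be null.)
    unsafe { MaybeUninit::<[u32; 8]>::zeroed().assume_init() }
}

fn main() {
    println!("{:?} {:?}", zeroed_safe(), zeroed_raw());
}
```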

> (3) you're expecting random garbage.

Currently that's also a bug, although there are discussions about adding such a capability to the language (to permit tricks like the one used in “Using Uninitialized Memory for Fun and Profit”). For now, the only way Rust offers for such access is asm!.
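For readers who haven't seen that article: the trick is a sparse set whose `sparse` array never needs initializing, because membership is cross-checked against the `dense` array. The sketch below is a rough port to safe Rust under that assumption; since safe Rust cannot read garbage, it zero-fills `sparse`, which keeps the cross-check logic but gives up the cheap construction that motivates the trick in C:

```rust
// Sparse set over 0..capacity. In C, `sparse` may hold garbage;
// safe Rust can't express that, so this version zero-fills it.
struct SparseSet {
    dense: Vec<usize>,  // members in insertion order; len() is the set size
    sparse: Vec<usize>, // sparse[v] = index of v in `dense`, if v is a member
}

impl SparseSet {
    fn new(capacity: usize) -> Self {
        SparseSet { dense: Vec::with_capacity(capacity), sparse: vec![0; capacity] }
    }

    // Panics if v >= capacity. Correct whatever `sparse[v]` holds:
    // a stale index is rejected unless `dense` points back at `v`.
    fn contains(&self, v: usize) -> bool {
        let i = self.sparse[v];
        i < self.dense.len() && self.dense[i] == v
    }

    fn insert(&mut self, v: usize) {
        if !self.contains(v) {
            self.sparse[v] = self.dense.len();
            self.dense.push(v);
        }
    }

    // Does not touch `sparse`; stale entries are harmless for the same reason.
    fn clear(&mut self) {
        self.dense.clear();
    }
}

fn main() {
    let mut s = SparseSet::new(16);
    s.insert(5);
    println!("{}", s.contains(5));
    s.clear();
    println!("{}", s.contains(5));
}
```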

> And if they don't, it won't compile.

Not possible, sorry. If you wrote the magic unsafe keyword, then it's your responsibility to deal with UB now.

The compiler may still detect and report suspicious findings, but it can't be sure that it has detected everything correctly, so such a thing can't be a compile-time error, only a compile-time warning.

Insulating layer?

Posted Oct 15, 2024 6:57 UTC (Tue) by smurf (subscriber, #17840) [Link]

> it completely breaks the mental model of "x = x = x",

No, it doesn't. If the compiler knows where your particular x lives at any given time, your mental model isn't violated. Your mental model of the apple you're going to have for tomorrow's breakfast doesn't change depending on which side of the table you put it on, does it?

Consider code like
a=fn1()
b=fn2(a)
c=fn3(a,b)
d=fn4(b,c)
return d

Now, why should the compiler allocate space for four variables when c can easily be stored in a's location? After fn3 runs, a isn't needed any more. d doesn't even need to be stored anywhere: simply clean up the stack and jump to fn4 directly, further bypassing any sort of human-understanding-centered model (and causing much fun when debugging optimized code).

Constructing a case where a ends up in multiple locations instead of none whatsoever is left as an exercise to the reader.

Insulating layer?

Posted Oct 14, 2024 15:22 UTC (Mon) by Wol (subscriber, #4433) [Link]

> Just because there are no reason for it to be UB from your POV doesn't mean that there are not reason for it to be UB from someone's else POV. And, indeed, that infamous be || !be is very much UB in both Rust and C.

If bool is defined as a value that can ONLY contain 1 or 0, and the location of bool contains something else (for example, because it's not just a single bit-field), then, as I pointed out, you are assigning invalid garbage to a field that cannot contain it. Either you have a coercion rule that guarantees that "be" will be understood as a boolean whatever it contains, or, yes, it should be a compile error or unsafe or whatever.

But imho looking at that simple example, the problem boils down to interpreting random garbage as a boolean. If the rules of the language let you do that, then I would expect it to return true - after all, the expression "(garbage) or not (garbage)" must evaluate to true if "garbage" can be interpreted as a boolean.

It's all down to the language you're using, and whether that language lets you treat any random junk as a boolean. If it does, then those functions make mathematical sense and should not be UB. If the language DOESN'T let you, then it should fail to compile with an "illegal coercion" error.
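In Rust terms, such a "coercion rule" has to be written out: there is no implicit integer-to-`bool` conversion, so you pick one explicitly. A sketch with two hypothetical helpers, one lenient (C-style, non-zero is true) and one strict (only 0 and 1 accepted):

```rust
// Lenient reading: any non-zero byte counts as true.
fn as_bool_lenient(byte: u8) -> bool {
    byte != 0
}

// Strict reading: only 0 and 1 are booleans; anything else is rejected.
fn as_bool_strict(byte: u8) -> Option<bool> {
    match byte {
        0 => Some(false),
        1 => Some(true),
        _ => None,
    }
}

fn main() {
    let garbage: u8 = 0x5A; // stand-in for whatever happened to be in memory
    let be = as_bool_lenient(garbage);
    // `be` is now a real bool, so `be || !be` holds.
    println!("{}", be || !be);
    println!("{:?}", as_bool_strict(garbage));
    // Not allowed: reinterpreting the byte directly, e.g.
    // `unsafe { std::mem::transmute::<u8, bool>(garbage) }` is UB for 0x5A.
}
```

Once you hold an actual `bool` from either conversion, `be || !be` is true for every possible input byte; the UB only arises when the raw byte is reinterpreted as a `bool` without such a check.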

And that has absolutely nothing to do with the hardware, and everything to do with mathematical logic. (Unless, of course, your DRAM behaves like a random number generator unless/until it's written to, such that (garbage) != (garbage) ...) And even there, I'd just say that you can guarantee the function will return a boolean, just not necessarily the boolean you expect ...

Cheers,
Wol