Zapping pointers out of thin air
Paul McKenney gave a presentation at Kangrejos this year that wasn't (directly) related to Rust. Instead, he spoke about the work he has been doing in concert with many other contributors on improving the handling of subtle concurrency problems in C++. Although he cautioned that his talk was only an overview, and not a substitute for reading the relevant papers, he hoped that the things the C++ community is working on would be of interest to the Rust developers present as well, and potentially inform future work on the language. McKenney's talk was, as is his style, full of subtle examples of weird multithreaded behavior. Interested readers may wish to refer to his slides in an attempt to follow along.
Lifetime-end pointer zapping
He began with lifetime-end pointer zapping, which is a somewhat obscure topic, so he gave an example of when it could be relevant. Suppose that a C++ programmer has written a stack with two operations: push and clear. They want this to be multithreaded, so the idea is that multiple threads can push to this stack, and multiple threads can clear it, but only one clearing thread will "win" and get all of the pushed items. The natural way to model this situation is with a linked list and atomic compare-and-swap (CAS) operations to swap the pointer to the head of the list.
So suppose that the top of the stack points to object A, and a thread is trying to push a second object, B. The thread puts a pointer to A inside B, but the top still points to A. Then, the thread is preempted, and a second thread clears the whole stack. It processes the entries, and frees A. Now A is on the free list, but B still points to it. Then the second thread allocates and pushes object C. The malloc() implementation gives C the same chunk of memory as A, so now B points to C — or so we might think. The first thread resumes, the compare-and-swap operation sees the same pointer (to A or C), and the push of B succeeds. See the linked slide for a graphical depiction of the process.
But all is not well. The pointer from B to C is an example of a "zombie pointer", McKenney said — it's a pointer that would be legal in assembly code, and the types match, but it's not allowed in C++. Pointers in C or C++ are not just the addresses of particular locations in memory, they also have provenance. And B's pointer to A is invalidated as soon as A is freed — even if C ends up being put back into the same location. So if the compiler can prove that A will be freed and B will hold onto a reference to it anyway, then it knows any attempts to read through that pointer are undefined behavior, and it is free to make optimizations that assume this won't happen. Plus, while the simple example of this stack is "ABA tolerant", not every data structure deals so well with problems like this.
Björn Baron agreed that this particular problem would occur in Rust as well. Even though Rust's provenance rules are not completely settled, it's still definitely undefined behavior to have a pointer to an allocation that has been freed. There has been some discussion about adding a compiler intrinsic to help annotate this case, he said. McKenney said that this matched his experience working on the problem — a lot of discussion for relatively little progress.
One problem is that compiler developers really like provenance, because it permits
otherwise impossible optimizations. Tracking provenance is complicated, though.
Provenance can be erased, such as by converting a pointer to an integer.
That isn't a complete solution to lifetime-end pointer zapping, however.
Last year Davis Herring proposed "angelic provenance
", McKenney said,
which requires the compiler to pick the provenances that make a program valid,
if there's more than one possible interpretation. This helps generally simplify
the reasoning around provenance, but does not
help with the example he gave.
One potential fix would be to make atomic<T*> values (C++'s atomic pointers) erase provenance, including pointers referenced by successful CAS operations. This proposal has gone to the C++ committee, and initial reviews have been positive. Benno Lossin said that since Rust is trying to develop its own provenance rules, it probably makes sense for the Rust developers to consider having similar APIs that remove provenance. Alice Ryhl mentioned that Miri (Rust's undefined behavior sanitizer) actually already found a problem similar to the one with the stack McKenney presented in the queue implementations in Rust's standard library earlier this year.
Out of thin air
Next, McKenney discussed out-of-thin-air (OOTA) cycles. He gave an example involving two processes and two variables (initially both zero). Suppose that one process reads the value 42 from X, using a relaxed read. Then it stores that value into Y. The other process does a relaxed read of Y — getting 42 — and then stores it into X, for the first process to read. If both variables were initially zero, where did the 42 come from? In C++ terminology, it came "out of thin air". The value does not have to be 42; a cycle like this could produce any value at all, which is a major problem for reasoning about programs, so we would like to be able to show that it does not happen. The "cycle" part of the name comes from the fact that if you draw a diagram of where the value came from, it makes a loop — X got it from Y, and Y got it from X.
OOTA cycles are intuitively impossible, and in fact do not actually happen on
real hardware —
but formalizing a description of why it does not happen turns out to be
surprisingly hard to fit into existing memory models. This is important, because
the memory model of a language is part of determining which programs are valid,
and therefore which optimizations are valid as well. The C language committee
eventually gave up, McKenney said, just defining the memory model and then
tacking on a note saying "and also don't do this.
" Java did the same
thing, when trying to formalize it around 2000, and eventually gave up.
Researchers from the University of Kent have a promising theoretical model that takes into account compiler optimizations, and could finally put the matter to rest. But their model is theoretical and not yet ready, he said. McKenney has a simpler explanation for why this isn't an issue in real life: reads take time, and instruction execution takes time. To form an OOTA cycle, he said, at least one of those steps must go backward in time in the real world. Either a read must somehow get a future value, or executing a read and a store must take negative time. The only way this can be broken is from bad hardware speculation (but bad hardware can do anything anyway), or from the compiler making it so that the operations take no time at all to execute (such as by optimizing them out).
The reason this is a poor match for existing memory models is because they don't have a notion of time — only a notion of causal relationships. And so expressing the reasons that an OOTA cycle is impossible in that timeless formal language is difficult.
Baron suggested that maybe the CPU could predict that there would be a store and
later check whether that guess was correct. He wasn't aware of any CPUs that
actually did that, but it could theoretically happen. McKenney was glad that
someone had brought that up unprompted — "and I haven't paid that man
"
he said with a grin.
McKenney explained that there are really two kinds of links between events: temporal and atemporal links. A store-to-load link is temporal; when a store is performed, it takes time for that store to propagate across a system. Time has to elapse for information to propagate in the real world. But store-to-store links are atemporal — one store can 'win' over another even if it happened earlier in time, because CPU caches don't keep a global clock. McKenney has done experiments on actual hardware showing that the winning store can happen more than a microsecond before the store that it overwrites. Finally, load-to-store links are also atemporal and tell you nothing about the ordering of the events, because you might get an old value.
So, on real computers, the only kind of events that you can rely on to establish the actual order in which things happened in the real world are store-to-load links, where one CPU stores a value, and another loads it. How is this related to speculation? Well, suppose that a CPU decided to speculate a value. It would, at some point, need to check whether that speculation was correct. In doing so, it would need to perform a load. If that load sees a different value, then it needs to throw away the speculation. But if it sees the speculated value, it still can't take a negative amount of time, since it did eventually need to do that load. This means that an OOTA cycle would still be impossible, even with speculation, because speculation cannot actually get rid of a load — only move work that depends on that load earlier in time.
This sounds comforting, but what's the takeaway for compiler developers? The fact that OOTA can't occur on real hardware for physics-based reasons means nothing to compiler developers, who are forced to rely on the memory models that the C language specification provides. Ultimately, McKenney had one takeaway for compiler people: don't invent non-volatile atomic accesses. It is already against the rules to invent accesses to volatile atomics, but it is somewhat less clear why inventing accesses to non-volatile atomics would cause a problem; refraining from doing so prevents OOTA cycles.
The reason is semantic dependencies. The C abstract machine may not have a real notion of time, but it does have a notion of data dependencies. This would be enough to prevent OOTA cycles, except that compilers are permitted to invent extraneous loads (such as to reduce register pressure). McKenney showed an example where adding a second load of a non-volatile atomic boolean enabled a sequence of optimizations that actually removed a data-dependency between two other variables. With the dependency removed, the compiler could then reorder the remaining statements to produce a value out of thin air.
So McKenney's final piece of advice on the topic of avoiding OOTA cycles is simply don't do that. Inventing, duplicating, or repurposing non-volatile atomic loads is a necessary step to cause OOTA cycles, so if the compiler doesn't do that, then it doesn't need to worry about this class of errors. Whether the C++ committee will accept McKenney's paper on the topic saying as much remains to be seen — but since this issue has vexed the C++ community for some time, it seems likely.
Posted Oct 15, 2024 16:20 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (4 responses)
Aria added an experimental (nigthly) set of Rust pointer APIs as an experiment to find out whether most (all?) Rust programmers could live with strict PNVI (Provenance NOT Via Integers) rules. Rust's Strict Provenance APIs are now getting a FCP (Final Comment Period) which is one of the last steps before the stabilization of the associated APIs. Aria's experiment also adds APIs for "Angelic determinism" (so-called PNVI-ae-ua: Provenance Not Via Integers, but with Address Exposure and User Disambiguation, the model currently proposed for the C programming language) but if you pick this you need to appropriately expose any pointers that you're expecting to be able to construct later this way.
In practice this latter API is equivalent to the existing (but to perhaps be deprecated one day) 'as' cast of an integer type to a pointer. The cases where PVI "should" work but PNVI-ae-ud could not are typically not actually working with the compiler you get today because LLVM never actually supported them.
At the very bottom where the Linux kernel hardware drivers live some amount of these "Angelic" APIs is inevitable. You literally can't express the idea "I know the physical address of this MMIO register" in the strict provenance model. However there's a reason to prefer strict provenance - a reason which doesn't apply as well to C++. Analysis both by a compiler and at runtime benefits from the PNVI semantics, in the weaker models (and especially PVI) it's not practical to do much analysis, for most non-trivial code the analysis results will just be a confused shrug. Maybe it's fine? In Rust it's practical to move the relatively modest amount of unsafe code that *could* use strict provenance to the strict API, and the part which is both unsafe *and* reliant on tricks not possible with PNVI to the exposure API, a tiny fraction which will include some Linux drivers but not most of the kernel or userspace software. In C++ if a window of time to attempt such a thing existed it was decades ago.
I hope the effect of this talk was not to derail stabilization of the provenance APIs in Rust, on the other hand, if the kernel people need something specific that's in neither strict nor exposed APIs (I don't think so) this is the last time to say so and not expect rotten fruit to be hurled by other Rust users.
Posted Oct 15, 2024 23:36 UTC (Tue)
by dcoutts (subscriber, #5387)
[Link] (3 responses)
As far as I've understood, the ABA problem can't occur in a GC'd language, because the memory location cannot get recycled by the storage manager since there's still a live reference to the old object. The same goes for the "zombie pointer" example:
> Then, the thread is preempted, and a second thread clears the whole stack. It processes the entries, and frees A. Now A is on the free list, but B still points to it.
In a GC'd language this doesn't happen. B still pointing to A keeps A live. Thus the memory location for A cannot yet be reused. And thus C will necessarily have a different address from A and so the CAS will do the right thing.
Obviously, one has to pay the GC performance tax. But having done so, it appears one can take advantage of lock free programming much more easily. Am I wrong to be so smug? Is this dangerous over-simplification?
Posted Oct 16, 2024 5:00 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
You're actually right. Garbage collection is great help for many lockless algorithms. There's a great article about it: https://bitbashing.io/gc-for-systems-programmers.html ("Garbage Collection for Systems Programmers")
Sidenote, not ALL garbage-collected languages are created equal. For example, Go uses "fat pointers" for interfaces, and they can't be copied atomically, so it can't guarantee this kind of safety.
Posted Oct 25, 2024 10:02 UTC (Fri)
by jch (guest, #51929)
[Link] (1 responses)
Couldn't you take a pointer to an interface, and manipulate that atomically instead of the interface itself?
Posted Oct 25, 2024 16:13 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Oct 15, 2024 16:25 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (25 responses)
> One potential fix would to make atomic<T*> values (C++'s atomic pointers) erase provenance, including pointers referenced by successful CAS operations.
Then you just call std::launder() every time you retrieve a pointer from an atomic<T*>, and if I am reading the standard correctly, this is perfectly legal, but might invoke implementation-defined behavior because the pointer is technically invalid and we are evaluating the pointer's value before it is passed to std::launder(). I would like to think that most compilers would be sane enough not to attempt to define and document some restriction on that (which they are required to do if they want to mess with it, since it is IB and not UB). On hardware such as CHERI, this whole idea is either unworkable (std::launder() cannot exist in full generality on that platform) or unnecessary (because compare-and-swap could hypothetically also compare the hardware's pointer metadata, and then you'd never end up in this position in the first place because the push of B would fail and have to be retried with the correct pointer to C).
I have no idea if there is an analogous function in C. Rust has expose_provenance, but it's still unstable and also does not work on CHERI.
Posted Oct 15, 2024 20:24 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (23 responses)
But thinking more broadly, something has to give in the realm of provenance. We cannot have all three of the following:
1. Pointers have indispensable provenance - that is, you can't/shouldn't ignore the provenance requirements (regardless of whether std::launder() is the correct incantation for doing so or not).
Assuming we don't want to lose (3), we can give up either of (1) or (2). (1) is already talked about both in the article and my initial comment, but the problem is that on CHERI, provenance really is indispensable at the hardware level, and there are no magic compiler incantations to change that, meaning we have no choice but to drop (2) instead of (1) when implementing C++ (or Rust, for that matter) on CHERI. So if you have source code that relies on such incantations, it will not work on CHERI, and might even fail to compile depending on how the CHERI implementation decides to deal with whatever part of the standard specifies that magic incantation.
I'm not sure how the standards committee should approach this if they want to allow programmers to write one codebase that works on both CHERI and regular hardware. You could say something like "either provenance does not exist and no std::launder-like incantation is required, or else it does exist and pointers with non-overlapping provenance must compare unequal in the first place." But then you lose a lot of compiler optimizations on standard (non-CHERI-like) hardware, so they probably don't want to do that either.
Another option would be to specify that == and similar operations have the effect of propagating provenance from one pointer to the next, but now you have two problems:
1. == is symmetric, so this expands provenance rather than replacing it. CHERI does not support a pointer that has more than one provenance at the hardware level, so the compiler has to somehow divine which direction the provenance should flow (or use "angelic provenance," but that is not an algorithm for what provenance-transfer instructions you should really emit on a CHERI-like platform, it's just a hack to say that non-CHERI-like platforms should DTRT).
Posted Oct 15, 2024 21:32 UTC (Tue)
by tialaramex (subscriber, #21167)
[Link] (15 responses)
LLVM has a pile of open tickets about this including from both Clang and Rust, read all of https://github.com/rust-lang/rust/issues/107975 to get some idea how insane the situation is, and if you want to weep read the related LLVM tickets too.
IIUC an actual CHERI hardware person chimed in on the Rust strict provenance stuff and said the whole point of CHERI is that the PNVI-ae-udi C or C++ style APIs are nonsense on CHERI and that's OK. If you can't live with strict provenance (and some of the kernel cannot) you can't have CHERI and that's too bad, you'd better be very, very careful instead. There is no way to "make it work" and attempts to do so were a disaster in the Morello hardware by their telling.
Posted Oct 15, 2024 22:29 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (14 responses)
Since I have already called out that I might be wrong about std::launder(), for the rest of this comment I am going to use the word "launder" to mean "whatever incantation causes the compiler to ignore provenance requirements for a given pointer, assuming that such an incantation exists or would be standardized." Under PVNI-plain, this incantation is a simple pointer-to-integer round trip, but I want to speak generally for all possible semantics and not just PVNI-ish semantics.
As for the LLVM thing, I think it depends on how this gets formalized. There are four-ish possible formalizations that I can think of:
0. Comparing pointers which have non-overlapping provenance is implementation-defined or UB, so LLVM's behavior is right (but maybe they still need to explicitly document it). In that case, we can't make any useful atomic<T*> data structure that works on CHERI (and possibly not even on conventional hardware) unless the compiler provides a language extension explicitly supporting it. Even laundering probably doesn't help in all cases because atomics do not (currently) provide an API to say "do a compare-exchange, but atomically launder your stored value first."
Posted Oct 15, 2024 22:58 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
-1. Pointers compare equal if and only if they point to the same address. Provenance has nothing to do with comparisons. atomic<T*> is completely impossible to use portably, as with (0) and (1).
Posted Oct 16, 2024 10:19 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (2 responses)
Basically for comparison purpose Rust wants ptr1 == ptr2 is the same as writing ptr1.addr() == ptr2.addr()
Today because of bugs LLVM will sometimes decide that ptr1.addr() != ptr2.addr() even though these two integers are literally the same. You can (with some clever footwork) make this evident in safe Rust, so then either the LLVM IR is broken (as is actually the case) or Rust's compiler is lowering the Rust representation to LLVM IR incorrectly (nope) because Rust defines the safe Rust as not having this issue, so it's definitely a bug somewhere.
Posted Oct 16, 2024 13:55 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (1 responses)
You do need to be careful. Hazard Pointers let you delay reclamation and avoid the aliasing problem here for example.
Thread 1 finds the front of the LIFO is A, it successfully points its HP at A, then it makes B, then it connects B to A, it's about to attempt the compare-exchange but then...
Thread 2 interrupts, it marks A for reclamation, but A cannot be reclaimed immediately because there's a Hazard Pointer pointing at it, so it's on the "reclaim later" pile. It makes C, and puts C at the front of the LIFO
Thread 1 resumes, it tries to atomically replace A with B at the front of the LIFO, this fails A.addr() != C.addr() and so thread 1 learns that C is now at the front. It releases the HP and points it successfully at C instead, it adjusts the address inside B to connect to C instead and this time a compare-exchange succeeds, swapping C for B.
Some time later, for any reason, reclamation is attempted for A and it now succeeds.
Posted Oct 16, 2024 14:17 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link]
In CHERI, this case works properly, despite A and C having distinct magic provenance bits, the checks all work and the correct pointer is stored in B, so long as the code was written very carefully. And I believe MIRI can check this is correct (in Rust) despite the fact most programmers do not have any CHERI test hardware.
Sorry I'll stop now, this was rattling in my head.
Posted Oct 16, 2024 7:07 UTC (Wed)
by comex (subscriber, #71521)
[Link] (4 responses)
https://github.com/rust-lang/unsafe-code-guidelines/issue...
But even if LLVM never optimized pointer comparisons, you would still be in trouble. The problem is alias analysis. The compiler is always tracking which pointers can legally alias which other pointers, and if it thinks pointer A can’t alias pointer B, then it can reorder accesses to A with respect to accesses to B. And it assumes that a freshly malloced pointer doesn’t alias with any other pointer not derived from it. Suppose pointer A is freshly malloced, and pointer B was previously freed, but you did a pointer comparison (which was not optimized) and it turns out that A and B have the same address. So you expect to be able to, say, write a value to pointer A and then read the same pointer back from pointer B. But oops, the compiler assumes that the write and the read don’t alias and switches their order!
This behavior can be justified by pointer zapping or by provenance. But if you wanted to get rid of the behavior instead of justifying it, then you would need the compiler to stop assuming that new allocations don’t alias. Either the compiler has to disable that optimization entirely, or it has to detect that you did the pointer comparison and conditionally disable it. But conditionally disabling it is a lot harder than it sounds, since e.g. the comparison might be hidden behind a function call, or it might even have been optimized away. And disabling it entirely… well, if it were just about heap allocations, you would get a small but measurable loss in optimization potential. But to be fully consistent you would probably need to also disable it for *stack* allocations, and that might be quite a bit worse.
Admittedly, it’s hard to be sure how bad the performance loss would be, because nobody has actually tried to write an optimizer that’s as aggressive as possible while still being formally sound under some sort of limited-provenance model.
The ‘fun’ thing is that this particular type of alias analysis issue essentially never occurs in real programs. And yet it’s extremely difficult to formally rule out without adding relatively-invasive provenance rules. And if you don’t formally rule it out, then it can probably be exploited from safe Rust code. And so we end up with the provenance rules.
Posted Oct 16, 2024 7:24 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link]
1. We can statically prove that the pointers have the same address and overlapping or identical provenances. Fold it to true.
(3) sounds weird, but it probably happens fairly often (if both pointers are known to be valid-or-else-UB-happened, if both pointers came from the same array as in a simple for loop, etc.). Meanwhile, if we're in case (4), then I find it hard to believe that we're really losing much performance from the optimization barriers anyway. And (4) is pessimistic anyway. If you can prove that one of the pointers is valid-or-else-UB-happened, you probably don't need to insert a provenance barrier for that pointer at all. The set of optimizations we're losing here looks really, really small to me, but maybe I just don't understand how compilers work.
On CHERI-with-rich-pointer-comparison, you can replace both (3) and (4) with "We can't statically prove either (1) or (2). Emit a comparison instruction," since in that case the hardware will already DTRT anyway. For that matter, you can *also* do that in any implementation which does not have provenance, since it is semantically equivalent to inserting provenance barriers "everywhere."
Posted Oct 17, 2024 13:52 UTC (Thu)
by epa (subscriber, #39769)
[Link] (2 responses)
Posted Oct 17, 2024 18:35 UTC (Thu)
by notriddle (subscriber, #130608)
[Link] (1 responses)
Isn't the real problem nowadays TLB and cache utilization? Because it seems like this would result in severe fragmentation.
Posted Oct 18, 2024 18:05 UTC (Fri)
by Wol (subscriber, #4433)
[Link]
I understand behind the scenes malloc allocates a big area, which it hands out in small bits. If that big area is an entry in the page table, and especially if the programmer can hint to malloc that "these are long lived, these are short lived, these are small, these are large", it's then easy to move/consolidate all this in ram (copy, update the page table, you're done), and malloc can just drop the arena and not re-use the arena's logical address when it's all freed.
Cheers,
Posted Oct 16, 2024 17:31 UTC (Wed)
by Tobu (subscriber, #24111)
[Link] (4 responses)
Posted Oct 16, 2024 18:40 UTC (Wed)
by tialaramex (subscriber, #21167)
[Link] (3 responses)
You can't prove me wrong because the only difference in (2) is that it's possible to get a different answer, but never required, so, I can tell you that you just got unlucky and that's why it seems as though I implement (1).
So the rational thing to do is implement (1) which is at least explainable.
Posted Oct 19, 2024 19:41 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (2 responses)
(1) provides neither of those guarantees, so it is not equivalent to (2) no matter how lucky or unlucky I might get.
Posted Oct 19, 2024 19:46 UTC (Sat)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Oct 20, 2024 8:43 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link]
If you want your numbering to exclusively refer to a specific thing you once wrote somewhere you're going to need much bigger numbers, I recommend 128-bit UUIDs for this work, however, since we're talking among people, I would suggest that naming things, though it has some undesirable quirks, is preferable, that's why the "worst case" provenance model offered in Rust and the default provenance proposed for C is referred to as "PNVI-ae-udi" rather than some hard to distinguish UUID, when I typo'd that name earlier everybody still knew what I meant.
As to the substance: Useless "optimisations" have to be painstakingly eliminated so there's no reason to ask for them knowing that you won't use them. Yes it is likely that LLVM will decide they can "optimise" ptrA == ptrB in some way that's not helpful, and if they want to do that Rust will just emit the IR for ptrA.addr() == ptrB.addr() instead so that it gets the required semantics. Today it doesn't matter because (as I wrote) there's an LLVM bug and as a result LLVM can believe that for two machine integers A, B A != B but A - B = 0 which is nonsense and the LLVM devs know it's nonsense but it's a consequence of this sort of "optimisation".
Posted Oct 16, 2024 17:03 UTC (Wed)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link] (6 responses)
So the exact same code works for traditional pointers and for CHERI-style ARM MTE pointers, no problem!
Posted Oct 16, 2024 17:34 UTC (Wed)
by jrtc27 (subscriber, #107748)
[Link] (5 responses)
Posted Oct 16, 2024 18:13 UTC (Wed)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link] (4 responses)
Please note that the LIFO push algorithm described in the paper has been in use since at least the 1980s, and quite likely the 1970s. Therefore, if CHERI fails to support it, I will consider that to be a bug in CHERI rather than a bug in LIFO push. ;-)
Posted Oct 17, 2024 6:39 UTC (Thu)
by jrtc27 (subscriber, #107748)
[Link] (3 responses)
Posted Oct 17, 2024 23:58 UTC (Thu)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link] (2 responses)
If you can say, what is the performance of the CAS(A)(L) instructions that handle both the address and the capability compared to those that handle only the address?
Posted Oct 18, 2024 0:41 UTC (Fri)
by jrtc27 (subscriber, #107748)
[Link] (1 responses)
Posted Oct 18, 2024 16:36 UTC (Fri)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link]
With that performance issue mostly out of the way, one big remaining issue is the memory occupancy. Interesting tradeoff, the overhead of double-size pointers vs. the overhead of killer static analysis. ;-)
Posted Oct 16, 2024 18:04 UTC (Wed)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link]
Posted Oct 15, 2024 18:37 UTC (Tue)
by iabervon (subscriber, #722)
[Link]
IIRC, this was at the level of whether the JVM needed some memory barriers that were relatively expensive, rather than in formalizing the Java language. I think it ended up with them not having a way to promise programmers that OOTA couldn't happen by formal specification, but the runtime has to prevent it anyway in order to provide security properties to the system owner.
Posted Oct 16, 2024 6:22 UTC (Wed)
by HIGHGuY (subscriber, #62277)
[Link] (4 responses)
If that compare and swap checked provenance too, then it would fail (because A!=C, even if addresses are the same) and the loop would retry.
Then it’s up to the compiler to prove that, under this model, the variant that doesn’t perform the additional loop (because pointers are values after all and most architectures don’t track provenance) is equivalent to the one that does. An architecture with CHERI for example, is then expected to perform that additional loop because the pointers values don’t compare equal.
Posted Oct 16, 2024 17:09 UTC (Wed)
by PaulMcKenney (✭ supporter ✭, #9624)
[Link] (3 responses)
In other words, things like ARM MTE can reduce the probability of ABA, but the cannot eliminate it.
Note that defining cmpxchg() to always account perfectly for provenance is a no-go because often the provenance is unknown. Yes, there are tools such as Miri that might be able to figure it out, but we might not want such tools to be on the normal edit-build-run cycle.
Posted Oct 17, 2024 13:57 UTC (Thu)
by DemiMarie (subscriber, #164188)
[Link] (2 responses)
Posted Oct 17, 2024 15:35 UTC (Thu)
by intelfx (subscriber, #130118)
[Link] (1 responses)
How is this supposed to be tracked (in a way that does not amount to a GC implementation, for obvious reasons)?
Posted Oct 18, 2024 22:01 UTC (Fri)
by DemiMarie (subscriber, #164188)
[Link]
Provenance
Provenance
Provenance
Provenance
Provenance
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
2. Pointer comparisons, including atomic compare-exchange, are agnostic to or ignorant of provenance (i.e. pointers with non-overlapping provenances may compare equal, and the compiler or architecture may then proceed to treat them differently anyway).
3. Pointer atomics are practically useful for implementing lock-free data structures.
2. == is transitive, so this also expands provenance non-locally (i.e. if we previously know that x == y, and we later discover that y == z, then we must expand provenance to x even though it may be in an entirely different part of the program). That makes escape analysis very hard.
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
1. Pointers with non-overlapping provenance may compare as equal or unequal. Such a comparison has no effect on provenance. This probably also makes atomics unworkable without laundering, but at least now we're not looking at nasal demons.
2. Pointers with non-overlapping provenance must compare as unequal, but the implementation may choose to arbitrarily launder any pointer at any time, in which case it may compare as equal to any other pointer to an object at the same address. In other words, the compiler may arbitrarily pretend that a pointer has been laundered wherever it so pleases, but it must be consistent. Once the implementation launders a pointer, that pointer must stay laundered for future operations (but not past operations - the implementation is explicitly *not* required to "go back in time" and launder any other pointers that were ever compared with a given pointer, so my earlier concern about the transitivity of == would not apply here). Believe it or not, I think this probably can be implemented on CHERI-with-honest-pointer-comparisons (as described above), simply by configuring the compiler to never optimize the result of a pointer comparison to anything other than unequal (if you can prove that the provenances do not match, then they compare unequal, otherwise let the hardware sort it out, meaning that it returns true on regular hardware and false on CHERI, and that's OK because both results will cause the program to behave acceptably on their respective platforms). This would possibly require some additional tracking on the part of the compiler to ensure that it is consistent about its reasoning with respect to provenance, but I don't see this as unduly burdensome. We're not asking the compiler to solve the halting problem or anything, just don't insert optimization X unless you have already applied optimization Y to the same code.
3. Pointer comparisons must fully respect provenance. Pointers must compare equal if and only if they have the same provenance (or at least overlapping, as in the case of array element pointers) and are numerically equal. This is the most correct behavior, but I suspect it defeats some optimizations and/or would require compilers to explicitly track provenance at runtime (on non-CHERI hardware), both of which are less than ideal.
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
2. We can statically prove that the pointers do not have the same address, or have non-overlapping provenances. Fold it to false.
3. We can statically prove that if the pointers have the same address, then they also have overlapping or identical provenances. Emit a comparison instruction.
4. We can't statically prove any of the above. Insert provenance barriers for both operands, which only apply prospectively to these exact variables (i.e. calling functions don't have to know we did this unless we return one of the pointers, in which case we were already in the business of more complicated escape analysis anyway), and then emit a comparison instruction.
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
Wol
See this comment by Ralf Jung working out the options: your -1 preferred, 1 in LLVM would have been worked around by casting the pointers to usize before doing the comparison. Final decision is that pointer comparison ignores provenance.
Pointer comparisons ignore provenance
There are two consistent options for pointer comparisons in Rust (given that they have to be safe):
Pointer comparisons ignore provenance
Pointer comparisons ignore provenance
Pointer comparisons ignore provenance
Pointer comparisons ignore provenance
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
C++ already gives us a tool for pointer zapping
OOTA was a real issue in Java around 2000
Provenance checks
Provenance checks
Provenance checks
Provenance checks
Capability GC