An introduction to lockless algorithms
Lockless algorithms are of interest for the Linux kernel when traditional locking primitives either cannot be used or are not fast enough. For this reason they come up every now and then on LWN; one of the most recent mentions, which prompted me to write this article series, was last July. Topics that arise even more frequently are read-copy-update (RCU — these articles from 2007 are still highly relevant), reference counting, and ways of wrapping lockless primitives into higher-level, more easily understood APIs. These articles will delve into the concepts behind lockless algorithms and how they are used in the kernel.
It has been said that "it takes a special kind of mind to really understand the memory model". Indeed, the Linux kernel memory model (and in particular Documentation/memory-barriers.txt) can be used to frighten small children, and the same is probably true of just the words "acquire" and "release".
At the same time, mechanisms like RCU and seqlocks are in such widespread use in the kernel that almost every developer will sooner or later encounter fundamentally lockless programming interfaces. For this reason, it is a good idea to equip yourself with at least a basic understanding of lockless primitives. Throughout this series I will describe what acquire and release semantics are really about, and present five relatively simple patterns that alone can cover most uses of the primitives.
Acquire, release, and "happens before"
In order to make the big step from the (relative) comfort of synchronization primitives to lockless programming, we shall first take a look at why locks work in the first place. Usually, this is taught in terms of mutual exclusion: locks prevent multiple threads of execution from reading or writing the same data concurrently. But what does "concurrently" really mean? And what happens when thread T is done with that data and thread U starts using it?
In order to answer these questions, we can turn to a theoretical framework that Leslie Lamport established in his 1978 paper "Time, Clocks, and the Ordering of Events in a Distributed System". According to the paper, the events in a distributed system can be ordered according to whether an event P happens before another event Q:
- The ordering of events is total within a single thread of execution. In layman's terms, for any two events from the same thread you can always say which came first and which came second.
- If two events do not happen within a single thread of execution, event P happens before event Q if event P is a message send and event Q is the corresponding message receive.
- The relation is transitive. Therefore, if event P happens before event Q and event Q happens before event R, then event P happens before R.
The "happens before" relation is a partial ordering: it is possible to have two events P and Q such that neither happens before the other. When this happens, the two events are concurrent. Remember how a lock prevents concurrent accesses to the same data structure? That is because, when you protect a data structure with a lock, all accesses to the data structure form a total ordering, as if they came from a single thread. Lamport's framework also provides a basic idea of what happens when a lock is handed off from one thread to another: some kind of "message passing" ensures that the unlock operation of thread T "happens before" the lock operation of thread U.
As it turns out, this is not just theory: in order to ensure that their caches are coherent, CPUs exchange messages over buses such as Intel's QPI or AMD's HyperTransport. However, this level of detail is definitely too much for our purposes. Instead, we will generalize the "happens before" definition to cover any kind of synchronization primitive.
Lamport's fundamental insight was that synchronization happens when two threads operate with symmetric operations on the same data structure. In our generalization, we will list pairs of operations (such as sending and receiving a message through the same queue) that synchronize one thread with another. Furthermore, we will classify the operations in each pair as either release or acquire; equivalently we can say that they "have release (or acquire) semantics".
Within a pair, the release operation synchronizes with the corresponding acquire operation. "Synchronizing" means that, whenever one thread performs a release operation and another thread performs the corresponding acquire operation, the "happens before" relation grows edges from the releasing thread to the acquiring thread. "Happens before" remains a partial order in the general case, but it now spans two threads, or more than two, thanks to transitivity. More formally:
- The ordering of operations is total within a single thread.
- If an operation P with release semantics synchronizes with an operation Q with acquire semantics, then operation P happens before operation Q, even if they happen in different threads.
- Just like before, the relation is transitive and defines a partial ordering of operations.
The old definition follows by giving release semantics to message send and acquire semantics to message receive. A message send synchronizes the sending thread with the thread (or threads) that receives the message. We can also rephrase our previous findings according to this new definition: for locks to work, unlocking must have release semantics and must synchronize with locking—which in turn has acquire semantics. No matter if the lock is contended or not, the resulting "happens before" edges ensure a smooth hand-off from one thread to the other.
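The unlock-release/lock-acquire pairing can be made concrete with a toy spinlock. The following is a hypothetical illustration built on C11 atomics, not the kernel's real spinlock; the type and function names are invented for this sketch:

```c
#include <stdatomic.h>

/* Toy test-and-set spinlock; an illustrative sketch only. */
typedef struct {
    atomic_flag flag;
} toy_spinlock;

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void toy_spin_lock(toy_spinlock *l)
{
    /* Acquire semantics: once we own the lock, everything the previous
     * holder wrote (published by its release in toy_spin_unlock) is
     * visible to us. */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ;   /* busy-wait until the lock is free */
}

static void toy_spin_unlock(toy_spinlock *l)
{
    /* Release semantics: our writes inside the critical section happen
     * before whatever the next successful toy_spin_lock() does. */
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```

Whether the lock is contended or not, each unlock's release synchronizes with the next lock's acquire, which is exactly the "happens before" edge described above.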
Acquire and release semantics may seem like an abstract concept, but they truly provide simple explanations of many common multithreaded programming practices. For example, consider two user-space threads accessing a global variable s:
    thread 1                              thread 2
    --------------------------------      ------------------------
    s = "hello";
    pthread_create(&t, NULL, t2, NULL);
                                          puts(s);
                                          s = "world";
    pthread_join(t, NULL);
    puts(s);
Are the accesses to the variable safe? Can thread 2 assume that s will read as "hello", and likewise can thread 1 assume that s will be "world" after pthread_join()? The answer is affirmative, and we can explain why in terms of acquire and release semantics:
- pthread_create() has release semantics and synchronizes with the start of thread 2 (which has acquire semantics). Therefore, anything written before a thread is created can be safely accessed from the thread.
- Exiting thread 2 has release semantics and synchronizes with pthread_join() (which has acquire semantics). Therefore, anything the thread writes before exiting can be safely accessed after pthread_join().
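Those two guarantees can be seen in a complete userspace sketch of the example above (the run_example() wrapper is mine, added so the final value can be inspected):

```c
#include <pthread.h>
#include <stdio.h>

static const char *s;

/* Thread 2: can safely read what thread 1 wrote before pthread_create(). */
static void *t2(void *arg)
{
    (void)arg;
    puts(s);            /* prints "hello" */
    s = "world";
    return NULL;        /* thread exit has release semantics */
}

/* Thread 1: create the thread, wait for it, then read its last write. */
static const char *run_example(void)
{
    pthread_t t;

    s = "hello";
    pthread_create(&t, NULL, t2, NULL);   /* release: publishes "hello" */
    pthread_join(t, NULL);                /* acquire: sees "world" */
    return s;                             /* "world" */
}
```

Calling run_example() always returns "world": the join's acquire semantics guarantee that thread 2's final store is visible.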
Note that data is flowing from one thread to the other with no lock involved: congratulations, you have made it through the first example of lockless programming. To sum up:
- If the programmer wants thread 2 to "see the effects of" everything that happened so far in thread 1, the two threads need to synchronize with each other: this is done with a release operation in thread 1 and an acquire operation in thread 2.
- Knowing which APIs provide acquire/release semantics lets you write code that relies on the ordering provided by those APIs.
Having understood how release and acquire semantics work for high-level synchronization primitives, we can now consider them in the context of individual memory accesses.
The message-passing pattern
In the previous paragraph we have seen how the acquire and release semantics of pthread_create() and pthread_join() allow the creator of a thread to exchange information with that thread and vice versa. We will now see how this kind of communication can happen in a lockless manner while threads run.
If the message is a simple scalar value, for example a boolean, it could be read and written directly to a memory location. However, consider what happens if the message is a pointer, as in the following example:
    thread 1                              thread 2
    --------------------------------      ------------------------
    a.x = 1;
    message = &a;                         datum = message;
                                          if (datum != NULL)
                                            printk("%d\n", datum->x);
If message is initially NULL, thread 2 will read either NULL or &a, we don't know which. The problem is that, even if
datum = message;
were to read &a, that assignment is still not synchronized against the assignment in thread 1:
message = &a;

Therefore, there is no edge in the "happens before" relation connecting the two threads:
    a.x = 1;                     datum = message;
       |                            |
       |       happens before       |
       v                            v
    message = &a;                datum->x
Because the two threads of execution are disconnected, there is still no guarantee that datum->x will read as 1; we do not know whether the assignment to a.x happens before that read. For that guarantee to hold, the store and the load must be endowed with release and acquire semantics respectively.
To that end, we have the "store-release" and "load-acquire" operations. A store-release operation P, in addition to writing to a memory location, synchronizes with a load-acquire operation Q if Q reads the value that was written by P. Here is a fixed version of the above code, using Linux's smp_store_release() and smp_load_acquire():
    thread 1                              thread 2
    --------------------------------      ------------------------
    a.x = 1;
    smp_store_release(&message, &a);      datum = smp_load_acquire(&message);
                                          if (datum != NULL)
                                            printk("%x\n", datum->x);
With this change, if datum is &a we can affirm that the store happened before the load. (I am assuming for simplicity that only one thread can write &a to message. Saying "thread 2 reads the value written by thread 1" does not refer to the specific bit pattern that goes in memory, it really means that thread 1's store is the last whose effect is visible to thread 2). The relation looks like this now:
    a.x = 1;
       |
       v
    smp_store_release(&message, &a);  ----->  datum = smp_load_acquire(&message);
                                                 |
                                                 v
                                              datum->x
And everything works. Because of transitivity, whenever thread 2 reads the value written by thread 1, everything that thread 1 did up to the store-release will also be visible to thread 2 after the load-acquire. Note that, unlike the pthread_join() case, "synchronizes with" does not mean that thread 2 "waits for" thread 1 to do the write. The above drawing only applies if thread 2 happens to read the value that thread 1 has written.
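In portable userspace code, the same pattern is spelled with C11 atomics. Here is a hypothetical sketch (the struct and function names are invented for illustration); unlike the snippet above, the consumer spins until the pointer appears, just so the outcome is deterministic:

```c
#include <stdatomic.h>
#include <pthread.h>

struct payload { int x; };

static struct payload a;
static struct payload *_Atomic message;   /* initially NULL */

static void *producer(void *arg)
{
    (void)arg;
    a.x = 1;                                    /* plain write... */
    /* ...published by a store-release, together with everything
     * written before it. */
    atomic_store_explicit(&message, &a, memory_order_release);
    return NULL;
}

/* Spin until the load-acquire observes the pointer, then read through it. */
static int consume(void)
{
    struct payload *datum;

    while ((datum = atomic_load_explicit(&message, memory_order_acquire)) == NULL)
        ;
    return datum->x;                            /* guaranteed to be 1 */
}

static int run_message_passing(void)
{
    pthread_t p;
    int v;

    pthread_create(&p, NULL, producer, NULL);
    v = consume();
    pthread_join(p, NULL);
    return v;
}
```

The acquire load synchronizes with the release store exactly as smp_load_acquire() does with smp_store_release(), so consume() can never observe the pointer without also observing a.x = 1.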
In the Linux kernel, the above code will often be written in a slightly different manner:
    thread 1                              thread 2
    --------------------------------      ------------------------
    a.x = 1;
    smp_wmb();
    WRITE_ONCE(message, &a);              datum = READ_ONCE(message);
                                          smp_rmb();
                                          if (datum != NULL)
                                            printk("%x\n", datum->x);
In this case, the release and acquire semantics are provided by the memory barriers smp_wmb() and smp_rmb(). Memory barriers also have acquire and release semantics, but they are a bit more complicated to reason about than simple loads and stores. We will get back to them when we talk about seqlocks.
Regardless of whether one uses load-acquire/store-release or smp_rmb()/smp_wmb(), this is an extremely common pattern, and one that should be understood well. Among its uses we find:
- All sorts of ring buffers. Each entry of the ring buffer often points to other data; usually, there are also head/tail locations that contain an index in the ring buffer. The producer side will use store-release operations, synchronizing with load-acquire operations in the consumer.
- RCU. As far as the compiler is concerned, the familiar rcu_dereference() and rcu_assign_pointer() APIs are similar to load-acquire and store-release operations. Thanks to some assumptions that are true for all processors except the Alpha, rcu_dereference() can be compiled to a regular load; still, rcu_assign_pointer() synchronizes with rcu_dereference() as if it were a load-acquire operation.
- Publishing pointers into an array. In this (modified) KVM snippet, if kvm_get_vcpu() sees the incremented kvm->online_vcpus, the associated entry in the array will be valid:
    kvm_vm_ioctl_create_vcpu()                 kvm_get_vcpu()
    -----------------------------------------  -----------------------------------------------
    kvm->vcpus[kvm->online_vcpus] = vcpu;      if (idx < smp_load_acquire(&kvm->online_vcpus))
    smp_store_release(&kvm->online_vcpus,          return kvm->vcpus[idx];
                      kvm->online_vcpus + 1);  return NULL;
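The ring-buffer use mentioned above can be sketched as a minimal single-producer/single-consumer queue. This is an illustrative sketch with C11 atomics standing in for smp_load_acquire()/smp_store_release(); the names and the fixed size are made up:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 16u               /* power of two */

struct ring {
    void *slots[RING_SIZE];
    _Atomic unsigned int head;      /* written only by the producer */
    _Atomic unsigned int tail;      /* written only by the consumer */
};

/* Producer: fill the slot first, then publish it with a store-release. */
static bool ring_push(struct ring *r, void *item)
{
    unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RING_SIZE)
        return false;               /* full */
    r->slots[head % RING_SIZE] = item;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer: load-acquire the head; then the slot contents are visible. */
static bool ring_pop(struct ring *r, void **item)
{
    unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;               /* empty */
    *item = r->slots[tail % RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

The indices run free and are compared by subtraction, so full and empty can be distinguished without wasting a slot; each side store-releases the index it owns, synchronizing with the other side's load-acquire.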
Apart from the mechanics of the load-acquire/store-release operations, there is another aspect of the message-passing pattern that you should ponder: it is a single producer algorithm. If there are multiple writers, they must be protected against each other by other means, for example with a mutex. Lockless algorithms do not exist in a void; they are but one part of the concurrent programming toolbox, and they work best when combined with other, more traditional tools.
This is just the beginning of an extended series on lockless algorithms.
The next installment will look further at how atomic memory
operations can be ordered, and look into how memory barriers are at
the heart of both the "seqcounts" mechanism and the Linux scheduler.
Index entries for this article:
- Kernel: Lockless algorithms
- GuestArticles: Bonzini, Paolo
Posted Feb 20, 2021 0:12 UTC (Sat)
by mpr22 (subscriber, #60784)
[Link] (2 responses)
"The ordering of events is total within a single thread of execution. In layman's terms, for any two events from the same thread you can always say which came first and which came second."
Posted Feb 20, 2021 0:25 UTC (Sat)
by pebolle (guest, #35204)
[Link] (1 responses)
and
> The ordering of operations is total within a single thread.
Totally clear.
Posted Feb 20, 2021 7:21 UTC (Sat)
by pbonzini (subscriber, #60935)
[Link]
1) you can always say which critical section came first and which came second (lock waits for the other critical section to complete, if there's one), and
2) accesses within a critical section come from a single thread, and there is always a total order (defined in the article) within a thread.
Posted Feb 20, 2021 2:32 UTC (Sat)
by Cyberax (✭ supporter ✭, #52523)
[Link] (22 responses)
This is opposed to a partially ordered set. A good example of a partially ordered set is a genealogical tree. From the information within the tree you can tell that some people are older than others (a father is always older than a son) but in some cases there's not enough information (e.g. you can be younger or older than your cousin).
Posted Feb 20, 2021 9:34 UTC (Sat)
by pebolle (guest, #35204)
[Link] (21 responses)
That helps.
I'm mathematically challenged. So, for example, I found it very annoying in programming exercises that recursion would always be taught with Fibonacci numbers. Who cares about that? It wasn't until I once had to walk through directories and filenames that recursion started to make sense to me.
Posted Feb 21, 2021 12:13 UTC (Sun)
by pebolle (guest, #35204)
[Link] (9 responses)
I just double checked - I was afraid that my memory let me down again - but recursion does work for Fibonacci. At least, that's what the web and the only introductory programming book I have on my shelves tells me.
Perhaps we disagree on the meaning of "recursion" and "work"?
Posted Feb 21, 2021 12:29 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (6 responses)
Which will blow your stack for large values of f :-)
I hope I've got it right ... :-)
Cheers,
Posted Feb 21, 2021 13:08 UTC (Sun)
by dtlin (subscriber, #36537)
[Link]
The problem with the recursive solution is the exponential runtime; the stack usage is linear.
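That trade-off is easy to see in a small illustrative comparison (the function names are mine):

```c
#include <stdint.h>

/* Naive recursion, straight from the definition: exponential time,
 * but only linear stack depth (the deepest call chain follows
 * fib(n-1) -> fib(n-2) -> ... all the way down). */
static uint64_t fib_naive(unsigned int n)
{
    return n < 2 ? n : fib_naive(n - 1) + fib_naive(n - 2);
}

/* Iterative version: linear time, constant space. */
static uint64_t fib_iter(unsigned int n)
{
    uint64_t a = 0, b = 1;

    while (n--) {
        uint64_t t = a + b;
        a = b;
        b = t;
    }
    return a;
}
```

Both compute the same sequence; only the cost differs, which is why the naive recursion takes forever long before it exhausts the stack.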
Posted Feb 21, 2021 15:24 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (1 responses)
The start of the conditional should be
if (f <= 2) return 1;
Cheers,
Posted Feb 21, 2021 16:16 UTC (Sun)
by yann.morin.1998 (guest, #54333)
[Link]
Not necessarily. The generalised Fibonacci sequence starts with F(0) = 0 and F(1) = 1 [0]. So your initial condition was (not in)correct.
Posted Feb 21, 2021 23:41 UTC (Sun)
by neilbrown (subscriber, #359)
[Link]
Only if you try to implement it on a computer.
Posted Mar 5, 2021 0:13 UTC (Fri)
by qzPhUZ7af5M6kjJi3tDgMaVq (guest, #140267)
[Link]
https://wiki.haskell.org/The_Fibonacci_sequence
In a Haskell interpreter:
Prelude> let fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
(20,899 digit number omitted... it took about 2 seconds to compute it)
Posted Feb 24, 2021 23:49 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link] (7 responses)
Of course, if you *really* want to motivate memoization, you probably want to talk about something more interesting, like Hashlife. I remember reading a lengthy series of articles on Eric Lippert's blog where he spends much of the series painstakingly squeezing out every ounce of performance from the iterative approach (vectorization, bit-twiddling, lookup tables, etc.), and then he switches to Hashlife ("Gosper's algorithm"), with hardly any weird optimizations at all beyond basic algorithm design and memoization, using *garbage collected objects* to represent everything, and suddenly the whole thing is orders of magnitude faster* for large-enough N. My conclusion: Using the right algorithm is far, far more effective than heavily optimizing the wrong algorithm.
* Strictly speaking, it is faster when solving the problem "Figure out what the board will look like at generation N." If the problem is instead "Figure out what the board will look like at every generation from 1 through N," then Hashlife is much less effective. So using it for an interactive UI that sequentially displays one generation at a time is probably not a great idea. But if you want to advance directly to generation 2^27 or something like that, Hashlife is basically the only game in town.
Posted Mar 2, 2021 20:07 UTC (Tue)
by jezuch (subscriber, #52988)
[Link] (6 responses)
Fibonacci numbers are defined by a recursive function which has an exact solution, so you can calculate every number in O(1) time (assuming no bignums are needed). So it's doubly (or triply) a bad example to introduce recursion to new programmers ;)
Posted Mar 11, 2021 9:56 UTC (Thu)
by anselm (subscriber, #2796)
[Link] (4 responses)
If your platform only supports integers up to a certain maximum (say 2^64-1), it is O(1) because it is straightforward to create a lookup table for all n where F(n) < 2^64 and use that rather than the recursive definition.
Posted Mar 11, 2021 19:28 UTC (Thu)
by nybble41 (subscriber, #55106)
[Link] (3 responses)
If you restrict permissible inputs to a predetermined finite range then every algorithm becomes O(1). That's not a very useful definition, though. And while a 93-entry table suffices to represent all Fibonacci numbers less than 2^64, many languages have built-in arbitrary-precision integer types limited only by available memory.
Posted Mar 12, 2021 10:49 UTC (Fri)
by anselm (subscriber, #2796)
[Link] (1 responses)
Obviously. I think the point here is that Fibonacci numbers are often used as a prime example for the use of recursion, but in practice, calculating Fibonacci numbers through a 1:1 implementation of their recursive definition is not what one would actually ever do. As such it gives people silly ideas. It would be better to come up with examples where there are more compelling arguments for using recursion in the first place, or at the very least put a big health warning on the Fibonacci example.
Posted Feb 21, 2021 21:31 UTC (Sun)
by thoughtpolice (subscriber, #87455)
[Link] (2 responses)
For example, many people define division as a function that takes two parameters, 'n' and 'd', and computes n / d. But most people would agree that d=0 is an invalid parameter: your program will actually fault. It's nonsensical.[1] So division is not total, then: you can give it an input and it can fail. But multiplication is total: it doesn't matter what two integers you give it. You can always multiply them, and there is always a sensible result. Thus, being "total", or "totality", is a property of algorithms/functions/actions you can perform on some objects (those objects being the set of all possible parameters, commonly called "the input domain").
There is another operator that you often use when programming, and it is called "is x less than y", or <. You use it to figure out how things are ordered. Is 1 less than 2? It turns out, this is very similar to the question "does a thing happen before another thing?" and it's a very important question. They are both a way of asking what order things occurred in, just for different kinds of "things". But, as an algorithm, ordering is not always total: sometimes you just can't tell if A happened before B, or if A is less than B, or not. It might not make sense for the objects in question.
I'll also say that partiality and totality, and especially partially ordered sets (sometimes called "posets"), are an extremely, extremely fundamental component of computer science and mathematics. They're very important, and very well worth understanding, even if it's only at a high level.
[1] It's only nonsensical to some people or computers. On some CPUs, such as RISC-V for example, division by zero is well defined, as n/0=0. This is fine, and not really problematic. At the end of the day, math is what human beings make of it. For rigorous mathematics, n/0=0 might be heresy. But for computers, for humans using them, it's maybe OK to let that slide.
Posted Feb 25, 2021 1:47 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
* Git history is a poset (under the relationship "commit X is an ancestor of commit Y").
In short: A poset is a thing that lets you have some objects come "logically before" other objects, for whatever definition of "logically before" is appropriate.
Then, a totally ordered set is just a poset with the additional requirement that for every X and Y, either X comes before Y or Y comes before X (or X = Y). In the DAG interpretation, this is what you would get if you require that the DAG forms a simple chain A -> B -> C ... -> Z, without any branching or whatnot allowed. So, to continue our examples:
* Git history is totally ordered, if you never create any branches or merges, never use detached HEAD mode, and never create a second root commit (i.e. a commit without parents, other than the first commit in the repository).
Posted Feb 20, 2021 0:12 UTC (Sat)
by mss (subscriber, #138799)
[Link] (2 responses)
It is worth noting that the Linux kernel memory model is very similar to (or, one can say, based on) the C11/C++11 memory model - after all, both were based on the underlying behavior of real-world computer systems.
However, since the C/C++ working groups had the benefit of designing an atomic synchronization API from scratch, the resulting API seems cleaner and more consistent to me than the kernel API.
I also find Mark John Batty's PhD dissertation called "The C11 and C++11 Concurrency Model" a very good read on the subject.
The aforementioned dissertation is available here: https://www.cs.kent.ac.uk/people/staff/mjb211/docs/toc.pdf
Posted Feb 20, 2021 16:11 UTC (Sat)
by mss (subscriber, #138799)
[Link]
By the way, you said in a comment under the July article to steer clear of the C11 "sequentially consistent" loads and stores since they are incredibly easy to get wrong.
For me personally the most surprising thing with C11 atomics is that a relaxed operation on a variable plus a barrier of X memory order (before or after that operation) is not the same thing as just performing the same operation but with the X memory order, even though the semantics are very close.
In general, the C11 atomics seem to be tailored more for using atomic operations of particular memory order rather than barriers, while most of the kernel code seems to prefer using ordinary memory accesses with barriers instead.
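That difference between "relaxed operation plus fence" and "operation with an ordering" can be made concrete. In C11, a release fence followed by a relaxed store gives that store release semantics (and in fact orders all preceding accesses against all subsequent atomic stores, which is at least as strong, though not identical). A hypothetical sketch of the two spellings:

```c
#include <stdatomic.h>

static _Atomic int ready;
static int data;

/* Spelling 1: an ordered store; only this store is a release operation. */
static void publish_with_ordered_store(void)
{
    data = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Spelling 2: a release fence plus a relaxed store.  The fence orders
 * all preceding memory accesses against all subsequent atomic stores,
 * not just this one, so it is at least as strong as spelling 1 but not
 * the same thing. */
static void publish_with_fence(void)
{
    data = 42;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side, barrier style: relaxed load plus an acquire fence. */
static int try_consume(void)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed)) {
        atomic_thread_fence(memory_order_acquire);
        return data;        /* 42 whenever ready was observed as 1 */
    }
    return -1;              /* nothing published yet */
}
```

The barrier style is the closer analogue of the kernel's smp_wmb()/smp_rmb() usage; the ordered-operation style is what the C11 API nudges you toward.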
Posted Feb 20, 2021 0:18 UTC (Sat)
by pebolle (guest, #35204)
[Link] (54 responses)
I'm still dumb: what makes this insightful?
Posted Feb 20, 2021 0:56 UTC (Sat)
by gus3 (guest, #61103)
[Link] (48 responses)
"For any op <, =, >, if A op B, and B op C, then A op C." That is to say:
-- If A < B, and B < C, then A < C
Referring to the statement you quoted, it could be re-stated thus:
"Given actions A, B, and C, the times tA, tB, and tC at which they take place, and any op <, =, >: if tA op tB, and tB op tC, then tA op tC."
That may seem thick, but instead of just A, B, and C, I've just substituted locations on a time-line instead of a number line.
It may seem obvious, and it kind of is. But stating applicable axioms explicitly is standard practice in rigorous maths discussions.
Posted Feb 20, 2021 2:55 UTC (Sat)
by excors (subscriber, #95769)
[Link] (5 responses)
That's contrary to the everyday human concept of time, where you would say every event happens at a precise number of seconds past midnight and you can always work out which happened first. That concept is problematic to implement in computers, because it's simply impractical for physically separated processors to have perfectly synchronised nanosecond-precision clocks and to attach timestamps to every message they share. So we throw away that concept of time and time-lines and define a brand new concept (called "happens before", but could equally be called "op" or "→") from scratch.
The axioms are trying to define that new concept with the minimal set of requirements on where there must be a well-defined ordering between events, so it's just barely strong enough to achieve the goal (in this case to define some useful kind of synchronisation between concurrent threads). And in this particular case, transitivity is one property that is required else the mathematical proofs stop working, so it has to be stated explicitly in the definition.
Posted Feb 20, 2021 18:12 UTC (Sat)
by Wol (subscriber, #4433)
[Link] (3 responses)
Whether two things are sequential or not depends on whether one occurs INSIDE the light-cone of the other. These acquire-release semantics force the two light cones to intersect, so anything after the intersect is visible inside both light cones so you can conclude they are sequential.
And the light-cone thing also explains why things can only be simultaneous if they occur at the same 4-dimensional co-ordinates, because otherwise they are not in each other's light cone, and therefore there is no mutual concept of time.
Cheers,
Posted Feb 20, 2021 19:14 UTC (Sat)
by pbonzini (subscriber, #60935)
[Link] (2 responses)
For example think of two cars going from A to B along two different roads. Any time a junction joins the two roads you *might* be able to say who is ahead at that moment. However while the roads are running parallel it's possible (but not certain) that you cannot say that. That is because the points along the two routes form a partial ordering.
Posted Feb 21, 2021 12:36 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (1 responses)
The send-receive creates a new light cone for which the origin is visible in both the previous two cones :-)
Cheers,
Posted Feb 23, 2021 1:06 UTC (Tue)
by rgmoore (✭ supporter ✭, #75)
[Link]
From a practical standpoint, it's not just acceptable for there to be no ordering; it's actually desirable for performance. If you can divide up your program into separate tasks that don't have ordering constraints between them, you can run those tasks however makes the best performance sense. This has been the key to all kinds of modern performance advances, like out of order execution, multiprocessing, and what have you. Those things all depend on being able to break up tasks into parts that can be run in different orders. It's the need to synchronize that slows things down.
Posted Feb 21, 2021 6:22 UTC (Sun)
by tialaramex (subscriber, #21167)
[Link] (41 responses)
Not explicitly stating such an assumption can get you in serious trouble if we want to prove things about a system.
For example: As an IoT developer thinking about how two devices you manufacture can securely communicate over an IP network, you might seize upon the Pre-shared Keys (PSKs) in TLS 1.3 (RFC 8446) as a good approach that avoids worrying about certificates or a PKI. It might seem like a common sense assumption that if device #1 ("Alice"), and device #2 ("Bob") share the same key K, that's going to be secure against any attacker that doesn't know K, the document (as published) doesn't suggest there would be any unexpected problems. And indeed mathematical proofs of TLS 1.3 say it does deliver on the claims in its introduction.
Unfortunately those proofs assume something different from you. They assume your two devices would each need their own key, K1 and K2. If you don't use separate keys you need countermeasures not described in the protocol, because of the "Selfie attack". The attack goes like this, Alice asks Bob something (using key K of course), but Mallory (who doesn't know K) just redirects Alice's message back to Alice. Alice thinks this question came from Bob (after all it uses key K which they share) and answers it, now Alice also thinks that answer came from Bob. Mallory has successfully confused Alice despite being unable to either read or forge messages from either Alice or Bob. Under the proof scenario Alice asks using K1, when the message is redirected back to her Alice is puzzled because it isn't encrypted with key K2 from Bob, so that message is rejected and everything is fine.
If this assumption had been made into an explicit _axiom_ that could have highlighted that the TLS document doesn't explain the problem with just having a single key K and likely its authors would have drafted a section explaining why you might need more keys than seem necessary at first glance, and what alternative strategies would work if you must have one key.
A more broadly famous missing assumption is the Axiom of Choice. In the 1800s mathematicians were confident they understood the Theory of Sets - they had established Axioms stating how it works - but alas they'd actually without making it explicit been assuming that in a set of sets (like the set of sets of integers) you can always choose one of the members of each set. This is definitely true for finite sets. But for infinite sets we can't prove it, so, we ought to explicitly make it an Axiom and say so, if we're wrong any results depending on that assumption may be wrong too. Today most mathematicians will choose ZFC, axioms for set theory which include the Axiom of Choice, because it does after all _feel_ intuitively as though you could choose one from each set. But at least you've written down this specific assumption you're making in case it matters some day.
Posted Feb 21, 2021 12:25 UTC (Sun)
by ale2018 (guest, #128727)
[Link] (40 responses)
Thus, among "existing" things you have defined a partial time order. Can one obtain relativity from Newtonian physics by just introducing information exchange?
Posted Feb 21, 2021 12:47 UTC (Sun)
by Wol (subscriber, #4433)
[Link] (39 responses)
Almost certainly not. Newtonian Physics assumes time IS constant from all vantage points - if you see time move ten seconds so will I. Relativity says time APPEARS constant from all vantage points.
If we assume I travel to Alpha Centauri and back at 0.9c, you on Earth will see me take about 9 years to do so. Newton says that means I must experience the same 9 years. But relativity says time will run slower for me, I may only experience 1 year passing.
It's like a multi-core CPU with each core having a different clock ... and time is measured in ticks ...
I don't see how exchanging information will alter anything.
Cheers,
Posted Feb 22, 2021 13:04 UTC (Mon)
by ale2018 (guest, #128727)
[Link] (38 responses)
For one thing, introducing information and the ability to process it breaks time symmetry. You can easily call "past" the side you remember and "future" the side you predict. Time reversal doesn't work any more. At this point you've already left the classical, Newtonian vision, and dropped those misconceptions about time.
Special relativity was introduced as an attempt to account for the speed of light being a universal constant. As a speed limit, it entails that information cannot travel faster than light either. My question was whether it's possible to derive some kind of relativity the opposite way around. That is, assuming that things exist only if you can exchange information with them, and then assuming the partial ordering principle outlined in the article, is it possible to derive a concept similar to the light cone?
Of course, it is not possible to compute the actual speed of light based on pure logic... Or is it?
Posted Feb 22, 2021 13:46 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (37 responses)
AFAIR, it's not possible to compute a one-way speed at all. You can only get a round-trip speed (light could be traveling at some factor based on its angle with a "universal axis" and we'd never be able to tell) because you can't communicate back about any time synchronization without being affected by the same limits. So, no, without an axiom that "the universe has no preferred axis for light travel", you get an infinite number of possible solutions for a speed-of-light gradient.
Posted Feb 22, 2021 18:44 UTC (Mon)
by ale2018 (guest, #128727)
[Link]
That's right.
Let me quote Leslie Lamport, from the 1978 article cited by Paolo Bonzini:
The references [1] and [2] are two introductory texts on relativity of those times.
Posted Feb 23, 2021 0:03 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (31 responses)
Oh, and I believe it IS possible to calculate the speed of light based on pure logic ...
You'll have to look it up for yourself, but if you take a bunch of mathematical constants like e, pi, ln and so on, I think they somehow dictate a set of possible values for c, g, and other physical constants. It's too long ago, but I'm sure I've seen the physical constants defined in terms of the mathematical constants. And that's why astronomers are so unhappy with the cosmological constant. Because they can't define it in terms of anything else, it shouldn't exist ...
Cheers,
Posted Feb 23, 2021 0:38 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (26 responses)
It'd also lay a blow to various theories of multiple universes which differ in the physical constants. Also, what is a "fundamental constant"? Is the radius of a proton a constant or a derived value? The size of a helium-4 nucleus? Ratio between the lepton sizes? Even physics is a *bit* arbitrary here.
Posted Feb 23, 2021 1:19 UTC (Tue)
by Wol (subscriber, #4433)
[Link] (25 responses)
Who said anything about fundamental? Both e and pi are mathematical constants, as is the golden ratio. They're all of equal importance.
Things like c, g, permittivity, that stuff, are all physical constants, which should be explainable in terms of maths.
Posted Feb 23, 2021 13:10 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (7 responses)
But…why? What is this assumption based upon? Is it a bit of Platonic idealism?
Posted Feb 24, 2021 0:27 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (6 responses)
Cheers,
Posted Feb 24, 2021 21:58 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (5 responses)
If we wanted, we could say that the speed of light is pi/e Wol_lengths/Wol_time intervals. Doesn't mean that those units are any more useful than our numerical value in m/s.
Posted Feb 25, 2021 7:48 UTC (Thu)
by jem (subscriber, #24231)
[Link] (4 responses)
Posted Mar 16, 2021 21:09 UTC (Tue)
by nix (subscriber, #2304)
[Link] (3 responses)
Posted Mar 17, 2021 10:37 UTC (Wed)
by jem (subscriber, #24231)
[Link]
Well, I said derived, not defined.
Posted Mar 18, 2021 10:14 UTC (Thu)
by mgedmin (subscriber, #34497)
[Link] (1 responses)
a cubic *deci*meter (which is also one liter) of water.
Posted Mar 20, 2021 1:24 UTC (Sat)
by nix (subscriber, #2304)
[Link]
(And obviously you're right, unless water has suddenly become much, much denser than lead :) )
Posted Feb 23, 2021 15:24 UTC (Tue)
by rschroev (subscriber, #4164)
[Link] (16 responses)
Posted Feb 24, 2021 0:23 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (1 responses)
I think the equation is G=g.m1.m2/d^2
G is 9.8m/s^2, while g is the gravitational constant, which is believed to be the same everywhere in the universe.
Cheers,
Posted Feb 24, 2021 0:47 UTC (Wed)
by rschroev (subscriber, #4164)
[Link]
Posted Feb 24, 2021 0:31 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (11 responses)
It can *de*scribe it, yes. But I think you'll find it also *pre*scribes it and says "completely different constants and laws doesn't make logical sense".
Cheers,
Posted Feb 24, 2021 1:01 UTC (Wed)
by rschroev (subscriber, #4164)
[Link] (10 responses)
BTW there's a video of Richard Feynman's lecture about that relationship here: https://youtu.be/obCjODeoLVw
Interesting stuff, if you're interested in that kind of stuff.
Posted Feb 24, 2021 18:28 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (9 responses)
Cheers,
Posted Feb 24, 2021 21:54 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (3 responses)
Posted Feb 24, 2021 23:22 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (2 responses)
We think it may be four - that's what Einstein thought but that has a whole bunch of problems. If it *is* four, I believe that means string theory is correct as it's the explanation for relativistic singularities.
Or it could be ten or eleven. Anything between five and nine inclusive just doesn't work because we get an explosion of infinities - infinity itself isn't a problem, but there are different sorts of infinity and for reality to work they need to cancel out. For those dimensions they don't. (These universes, if I remember correctly, define mass as the fifth dimension ...)
(That's why I was moaning about computers crashing when you divide by zero. If you declare zero and infinity as non-numbers for which arithmetic doesn't work, you're in trouble. If you say "to make arithmetic work, they swap places on division", then you can do this sort of maths and come up with something that makes sense.)
At the end of the day, we have loads of maths that describes what we see. And that *constrains* what is a plausible universe. We have a local maximum or minimum, don't know which. By adjusting some values, we can force others to impossible values. Plausible universes must have all these values at maximum or minimum, not off the scale or impossible or at some non-equilibrium value.
Cheers,
Posted Feb 24, 2021 23:36 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Feb 25, 2021 0:05 UTC (Thu)
by nybble41 (subscriber, #55106)
[Link]
To say that division by zero is undefined is mathematically correct and not just an arbitrary computer limitation. There is no proper answer to the question "What number, when multiplied by zero, gives the non-zero product X?", at least not in any system that would uphold basic idioms such as the product of a number and its reciprocal being equal to one. "Infinity" times zero is not equal to any particular finite number X, so that isn't a solution. Depending on the particular forms of the equations which gave rise to the infinity and the zero (or infinitesimal) the product could be another infinity or any real number; it depends on how you phrase the question. $\lim_{x \to 0^+} x \ln \frac{1}{x} = 0$, but $\lim_{x \to 0^+} x \cdot \frac{1}{x} = 1$. In both cases you're multiplying an infinitesimal ($\lim_{x \to 0^+} x$) by an infinity ($\lim_{x \to 0^+} \ln \frac{1}{x}$ or $\lim_{x \to 0^+} \frac{1}{x}$, respectively), but the specific form of the equation changes the result.
Posted Feb 24, 2021 22:24 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (4 responses)
If you want to read something mind-blowing, try the "Clockwork Rocket" series by Greg Egan. It's accompanied by a thesis-sized exploration of its (fictional) physics: http://www.gregegan.net/ORTHOGONAL/ORTHOGONAL.html
Posted Feb 24, 2021 23:26 UTC (Wed)
by Wol (subscriber, #4433)
[Link] (3 responses)
Cheers,
Posted Feb 24, 2021 23:34 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link]
You can also construct a quantum field theory for such a universe, it also would work just fine.
Posted Feb 25, 2021 15:16 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Posted Mar 8, 2021 14:59 UTC (Mon)
by LtWorf (subscriber, #124958)
[Link]
Physics tries to model measurements.
So for example you measure a planet that is orbiting, make an equation, and see if tomorrow the equation and the position are the same (within a certain range of precision).
Before Galileo saw that Jupiter had satellites, they had perfectly fine equations that predicted where everything would be in the sky. The problem arose because new data could not fit the model.
You can absolutely model an orbit of a planet using 3 dimensions, or you can model it in a higher-dimensional space with an equation of a lower degree. Both work. We can't really know which is "exact" if both work.
You can keep adding dimensions and make equations that work, but we don't really know what the "truth" is.
Posted Feb 24, 2021 1:24 UTC (Wed)
by SiB (subscriber, #4048)
[Link] (1 responses)
Posted Feb 24, 2021 8:53 UTC (Wed)
by mpr22 (subscriber, #60784)
[Link]
The first formally adopted definition for the metre itself, proposed in 1791 by the French Academy of Sciences and adopted in 1793 by the National Assembly, was one ten-millionth of the distance from the North Pole to the Equator along the Paris meridian.
Posted Feb 23, 2021 1:06 UTC (Tue)
by mpr22 (subscriber, #60784)
[Link] (3 responses)
There is one particular physical constant, µ_0 (the permeability of free space) which was historically defined (as a consequence of the relationship between electrical current and induced magnetic fields) as exactly 4π × 10^-7 H/m ... but that value is no longer strictly valid (although it's close enough for the vast majority of practical purposes), because the ampere has been redefined in terms of the elementary charge and the second.
Posted Feb 24, 2021 1:00 UTC (Wed)
by SiB (subscriber, #4048)
[Link] (2 responses)
Some years ago in our coffee room at the institute there was a discussion about the possibility that the fine-structure constant α may not have been constant in the past after all. The question was raised, which of the constituents of α=e²/(4πε₀ħc) may be the reason. Well, the answer is e, if at all. α is a constant without units. It is pure physics. It defines the strength of electromagnetic interactions. The universe cares a lot about the value of the fine-structure constant α. In our experimental units it is the charge of the electron e that represents that strength, but in the end, it also just defines units of measure.
Theoretical physics defines the units differently than experimental physics. They mostly use c=4πε₀=ħ=k=1. Everything is measured in eV or eV⁻¹.
From a theoretical point I am a bit unhappy about the new definition of the SI units. They fixed the value of e, although it indirectly represents a quantity the universe cares about. The cost is that µ₀ and ε₀ are now _not_ fixed by definition in the SI, as they were before, like Wol said.
Ideally, all units would be defined by fixing the value of some fundamental constant. We are now almost there. The second is still defined by an artifact; at least it is an atom, so it can be reproduced everywhere in the universe. To fix that we would need to fix the gravitational constant. But metrology defines units by what can be measured with the best precision, and the gravitational constant does not qualify: it has to catch up at least eight orders of magnitude in precision.
Posted Feb 25, 2021 3:46 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
The universe may not care about the speed of light, but the stuff in the universe surely does. To borrow from your "theoretical physics normalizes everything to 1" example, the mass of the proton is much, much (numerically) smaller than the charge of the proton, and surely there's some physical significance of that? Or, alternatively, we can say that there's nothing wrong with the proton, and instead it's gravity which is too weak, but that's just reframing the same question in different units. I'm not sure you can ask that question without using units at all.
Ultimately, I suppose the "proper" framing of this question is some nonsensically complicated question about why the relevant quantum fields happen to interfere in exactly the way that they do to give the proton the properties that it has. But you can't even ask that question until you know what gravity is, at a quantum level, and to the best of my knowledge, nobody does.
Posted Feb 25, 2021 4:35 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link] (3 responses)
Ultimately, science is about predicting the outcomes of experiments and observations. If you can't observe it, directly or indirectly, it is beyond the realm of scientific inquiry. The one-way speed of light falls into this hole. Speed is normally defined as distance divided by time, but when we're talking about relativistic speeds, we need to be clear about which time we mean.
So, let's imagine a specific setup. We'll send a pulse of light from point A to point B. So, what is the one-way velocity of the light? It is the distance between A and B, divided by the time it takes for the signal to arrive. But whose time?
It turns out that, not only is there no reasonable way for the scientists at A and B to communicate the fact that the pulse has arrived (short of sending a second pulse and thereby measuring the round-trip time instead), it is actually not meaningful for an event at A and an event at B to happen "simultaneously." There will always be an observer who thinks that A's event happened first, and an observer who thinks that B's event happened first, unless the events are in each other's light cones, in which case everyone will agree that one comes before the other. But you can't get everyone to agree that both events happen simultaneously.
This creates a problem, because no matter where you put the clock, it has to be synchronized with respect to two distant events:
1. The pulse being sent from A.
2. The pulse arriving at B.
You cannot synchronize a clock with respect to an event unless the clock is physically present at that event, because simultaneity is otherwise relative to one's choice of reference frame. So the clock has to be at both A and B. You could imagine two clocks, synchronized with each other, but then you have to decide what you mean by "synchronized," because simultaneity is still relative. There is always a reference frame in which the clocks are not synchronized.
If the pulse were traveling slower than light, then you could move the clock with the pulse. This would give you a metric called the "proper time," and that actually does have some useful properties in this context. In particular, it is independent of reference frame or choice of coordinates, so that the one-way velocity of any slower-than-light object is perfectly well-defined and even measurable. But you can't do that with a pulse of light, because the pulse is traveling at the speed of light, and the clock can't go that fast if it has any mass. If the clock doesn't have any mass, then it cannot function as a clock in the first place, because massless particles do not experience the passage of time.
As a result, there is no operational definition of the one-way speed of light that is compatible with the laws of physics. This is why Einstein, for example, describes it as a "convention" in his special relativity paper, rather than asserting it as fact. It's not fact, it's just a convenient way of labeling the diagram.
Having said all that, there is one thing we can experimentally verify (and have done so): The time it takes for light to travel along a closed inertial path of length L is L/c, as measured at the start/endpoint of the path, regardless of what the path looks like or how it is shaped. So, if there are any variations in the one-way speed of light that depend on angle, they *always* cancel each other out. As a result, if you simply assume that the one-way speed of light is a constant, then you can use that assumption to "synchronize" distant clocks by sending light pulses between them and adjusting for the speed of light delay. Although this "synchronization" is ultimately just a convention, it behaves a lot like you would expect "true" synchronization to behave (at least with respect to the clocks' reference frames, anyway). Einstein describes how this works in his paper, and others have elaborated on the details (in short, the clocks should be in inertial reference frames and not in relative motion).
Posted Feb 25, 2021 22:39 UTC (Thu)
by apoelstra (subscriber, #75205)
[Link] (2 responses)
Could you entangle two clocks?
Posted Feb 25, 2021 22:47 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Posted Feb 25, 2021 22:56 UTC (Thu)
by apoelstra (subscriber, #75205)
[Link]
[1] https://en.wikipedia.org/wiki/Twin_paradox
Posted Feb 20, 2021 4:04 UTC (Sat)
by itsmycpu (guest, #139639)
[Link] (4 responses)
That holds as long as the value read in step3 is the value written in step2 (or a value written after it). Because then you know that step3 > step2, and because of that, step4 > step1.
Posted Feb 20, 2021 6:15 UTC (Sat)
by itsmycpu (guest, #139639)
[Link] (3 responses)
To clarify the not-so-obvious part:
From the point of view of the first thread, step2 > step1 is always true, but without using acquire/release, this doesn't transfer to the second thread. Without that, the second thread may not see step4 > step1.
Why not? For example, the compiler is allowed to produce code for step2 that affects memory before step1, as long as that doesn't change the outcome of the first thread's local computations. The compiler does not have to consider the possible side effect on the second thread, which therefore might see the memory affected by step2 before it is affected by step1.
Only the use of acquire/release requires the compiler to produce code where the update of step1 is written to memory (or a shared cache), and is therefore visible to other threads, before step2. So without that, the second thread cannot assume step4 > step1, even if it thinks it can assume step3 > step2.
I think this is the same thing as the article says, just highlighting the practical aspects.
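In user-space C11 terms this pairing can be sketched as follows (a hedged sketch with my own names; the kernel would use smp_store_release() and smp_load_acquire() rather than <stdatomic.h>). If step3 observes the value stored in step2, the release/acquire pair guarantees that step4 observes the value stored in step1:

```c
#include <stdatomic.h>

int data;                 /* the payload: step1 writes it, step4 reads it */
atomic_int ready;         /* the flag: step2 writes it, step3 reads it */

void producer(void)
{
    data = 42;                                          /* step1 */
    atomic_store_explicit(&ready, 1,
                          memory_order_release);        /* step2 */
}

int consumer(void)
{
    if (atomic_load_explicit(&ready,
                             memory_order_acquire))     /* step3 */
        return data;                                    /* step4: sees 42 */
    return -1;            /* flag not observed yet; try again later */
}
```

Without the release/acquire pair, the compiler (or a weakly ordered CPU) could make the store to data visible after the store to ready, exactly the hazard described above.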
Posted Feb 20, 2021 7:24 UTC (Sat)
by pbonzini (subscriber, #60935)
[Link] (2 responses)
Posted Feb 20, 2021 9:47 UTC (Sat)
by pebolle (guest, #35204)
[Link]
And that goes for everyone taking time to answer my grumpy questions.
When you're not versed in the vocabulary and habits of a certain field it is very hard to determine whether you're reading something profound, something trivial, or just plain nonsense. (I've just finished a book of nearly 300 pages explaining that even physicists can be happily employed while apparently only producing hot air.)
I'll restart the article and hope that I won't get stuck on one of my many, many pet peeves again.
Posted Dec 10, 2021 9:48 UTC (Fri)
by Lurchi (guest, #38509)
[Link]
Posted Feb 20, 2021 3:47 UTC (Sat)
by kurogane (subscriber, #83248)
[Link] (9 responses)
I'm afraid I keep interpreting the message example as code that will only read datum->x once x has already been assigned.
An introduction to lockless algorithms
The point is that recursion *does not* work for computing Fibonacci numbers...
An introduction to lockless algorithms
long fibonacci(long f)
{
    if (f <= 1)
        return f;
    else
        return fibonacci(f - 1) + fibonacci(f - 2);
}
Wol
An introduction to lockless algorithms
Wol
An introduction to lockless algorithms
> if (f <= 2) return 1;
An introduction to lockless algorithms
As a mathematical abstraction, this is perfectly fine and well defined for all f.
An introduction to lockless algorithms
Prelude> fibs !! 100000
An introduction to lockless algorithms
12586269025 additions. That is what I call 'not working'.
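For comparison, an iterative version (a sketch of my own, not from the thread) needs only about f additions, instead of the 12,586,269,025 performed by the naive recursion for f = 50:

```c
/* Iterative Fibonacci: O(f) additions, no exponential recursion. */
long long fibonacci_iter(int f)
{
    long long a = 0, b = 1;   /* fib(0) and fib(1) */
    for (int i = 0; i < f; i++) {
        long long next = a + b;
        a = b;
        b = next;
    }
    return a;                 /* fib(f) */
}
```

The lazy Haskell list in the comment above achieves the same linear behavior by memoizing each element as it is produced.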
An introduction to lockless algorithms
Wol
An introduction to lockless algorithms
It has a closed-form solution but that does not make it O(1).
An introduction to lockless algorithms
* The packages in a (sufficiently powerful) package manager are a poset (under the relationship "package X is a transitive dependency of package Y").
* In languages with OOP and inheritance, the set of classes (or classes and interfaces) is a poset (under the relationship "class/interface X is a direct or indirect subclass/subtype of class/interface Y").
* Packages are pretty much never totally ordered, in real-world cases. But maybe, if you wanted to ensure a totally deterministic installation order, you could insert artificial dependencies to force the package manager to install everything in the exact order you want.
* In languages with single inheritance, each individual class's set of superclasses (not counting interfaces) is totally ordered. But the language as a whole is not, in most reasonable cases.
An introduction to lockless algorithms
It not only includes a more formal description of the model but is also rich with examples describing how various CPU architectures behave and why particular model design decisions were (probably) made.
An introduction to lockless algorithms
I assume you mean that people have misconceptions about the ordering guarantees that sequentially consistent semantics give?
After all, they provide acquire and release semantics, too, so any algorithm that's valid under acq/rel will be valid under sequentially consistent semantics, too (just maybe slower).
An introduction to lockless algorithms
-- If A = B, and B = C, then A = C
-- If A > B, and B > C, then A > C
If tP op tQ, and tQ op tR, then tP op tR."
An introduction to lockless algorithms
Wol
An introduction to lockless algorithms
Wol
An introduction to lockless algorithms
It's perfectly acceptable for there to be no ordering between them (except where the axioms say there must be an ordering).
Axioms
Wol
Axioms
This definition will appear quite natural to the reader familiar with the invariant space-time formulation of special relativity, as described for example in [1] or the first chapter of [2]. In relativity, the ordering of events is defined in terms of messages that could be sent. However, we have taken the more pragmatic approach of only considering messages that actually are sent. We should be able to determine if a system performed correctly by knowing only those events which did occur, without knowing which events could have occurred.
Axioms
Wol
Axioms
Cheers,
Wol
Axioms
Wol
Axioms
Wol
Axioms
g is the gravity of earth, approximately 9.81 m/s².
The equation is F = G⋅m1⋅m2/r². In case of Earth, that's F = G⋅m⋅m_earth/r_earth², with g = G⋅m_earth/r_earth². So in that case we get F = m⋅g.
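As a check, plugging rounded textbook values (my numbers, not from the comment) into g = G⋅m_earth/r_earth² does reproduce the familiar 9.8 m/s²:

```c
/* Surface gravity from Newton's law of gravitation: g = G*m/r^2.
 * Example values (approximate): G = 6.674e-11 m^3/(kg*s^2),
 * m_earth = 5.972e24 kg, r_earth = 6.371e6 m, giving g ~ 9.82 m/s^2. */
double surface_gravity(double G, double m, double r)
{
    return G * m / (r * r);
}
```

The constants above are rounded reference values, so the result lands near 9.8 rather than exactly on it.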
Axioms
Wol
Axioms
And some comments about it: https://medium.com/cantors-paradise/richard-feynman-on-th...
Axioms
Wol
Axioms
Wol
Axioms
You can construct a universe with multiple time axes, with many more dimensions, and so on. The math will be internally consistent.
Axioms
Wol
Axioms
Who told you that? A six-dimensional classic (Newtonian) universe works just fine. Sure, you won't have stable orbits but apart from that it's OK.
Axioms
2. The pulse arriving at B.
Axioms
An introduction to lockless algorithms
>
> As long as the value read in step3 is the value written in step2 (or a value written after that). Because then you know that step3 > step2, and because of that, step4 > step1.
An introduction to lockless algorithms
    a.x = 1;                 datum = message;
       |                        |
       |     happens before     |
       v                        v
    message = &a;            datum->x
I think the left "happens before" arrow should not be there (the right one of course is correct, due to the data dependency).
An introduction to lockless algorithms
    thread 1                          thread 2
    --------------------------------  ------------------------
    a.x = 1;
    message = &a;                     datum = message;
                                      if (datum != NULL)
                                          printk("%d\n", datum->x);
That is, thread 2 will either do nothing, if the "message" (and hence "datum") pointer is still NULL, or receive a pointer to "a" after its x member has already been assigned.
Is the point that thread 2 was supposed to printk(), i.e. wait until datum is non-NULL? When I first saw this code it seemed natural to me that thread 2 was meant as a 'just do nothing if the object was not ready' thread.
Or is the point that there might be reordering, i.e. that at the CPU-instruction level the stores and loads might not be ordered as the C code above is?
Posted Feb 20, 2021 7:17 UTC (Sat)
by pbonzini (subscriber, #60935)
[Link] (8 responses)
In theory the reordering could be on both sides, in practice (except on the Alpha) only the release side might be assigning a.x after message. Even on non-Alpha processors you need to inform the compiler not to do any reordering.
However you could have for example "ready=1" on the producer side and "if (ready) printk("%x", a.x)" on the consumer side. In that case both sides need to be protected against reordering at the processor level (in addition to the compiler level).
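That flag pattern can be sketched in C11 terms (a hedged sketch with my own names; the kernel equivalents would be smp_wmb()/smp_rmb() or the store-release/load-acquire forms). Both sides need an ordering constraint: the producer between the data and the flag, the consumer between the flag and the data:

```c
#include <stdatomic.h>

struct { int x; } a;      /* the payload from the example */
atomic_int ready;         /* the "ready" flag */

void produce(void)
{
    a.x = 1;
    /* Order the payload store before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

int consume(void)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed)) {
        /* Order the flag load before the payload load. */
        atomic_thread_fence(memory_order_acquire);
        return a.x;
    }
    return -1;            /* not published yet */
}
```

Unlike the pointer-publishing example, there is no data dependency here from the flag to a.x, which is why the consumer side needs its own barrier on weakly ordered processors.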
Posted Feb 20, 2021 8:58 UTC (Sat)
by kurogane (subscriber, #83248)
[Link] (6 responses)
Thanks Paolo. If the processor instructions don't necessarily follow the higher-level language flow shown in the example, then I get it, I believe. Well, could you confirm? The instruction flow in the processor could end up like the below, rather than as the C programmer mentally pictures it? (Focusing just on thread 1, as that is the more likely jeopardy-maker.) I think I finally understand what memory barriers are: temporal ones, not spatial.
Posted Feb 20, 2021 9:13 UTC (Sat)
by pbonzini (subscriber, #60935)
[Link] (5 responses)
On one hand it may not happen on all processors, and details may vary depending on the microarchitecture. On x86 that reordering cannot happen, on ARM and PowerPC it might.
But on the other hand the compiler might always be reordering stuff behind your back. The big insight of the C++ committee was that the same abstraction could cover both processor-level and compiler-level reorderings.
Posted Feb 20, 2021 11:04 UTC (Sat)
by kurogane (subscriber, #83248)
[Link] (1 responses)
I appreciated this confirmation very much.
Posted Feb 20, 2021 20:31 UTC (Sat)
by danobi (subscriber, #102249)
[Link]
Posted Feb 21, 2021 19:13 UTC (Sun)
by khim (subscriber, #9252)
[Link] (2 responses)
Tiny correction: on x86 it can and will happen. You need memory barriers there, too.
After reading the aforementioned article I was confused for some time: memory barriers arrived on x86 with SSE… yet SMP had been supported for more than a decade by that time! Some investigation showed that any instruction with
Posted Feb 21, 2021 19:16 UTC (Sun)
by khim (subscriber, #9252)
[Link]
Posted Feb 21, 2021 20:51 UTC (Sun)
by pbonzini (subscriber, #60935)
[Link]
Posted Mar 20, 2021 19:18 UTC (Sat)
by guzh (subscriber, #140551)
[Link]
Posted Feb 21, 2021 12:38 UTC (Sun)
by ale2018 (guest, #128727)
[Link] (1 responses)
If pointers are 64 bits, does that statement imply that the cache is updated in minimal chunks of that size? I mean, why can't it read just the lower part of &a?
Posted Feb 21, 2021 12:53 UTC (Sun)
by pbonzini (subscriber, #60935)
[Link]
The processor itself will load data from memory in cache lines that will usually be 32, 64 or 128 *bytes* long.
Posted Feb 21, 2021 18:50 UTC (Sun)
by andresfreund (subscriber, #69562)
[Link] (3 responses)
Posted Feb 21, 2021 20:52 UTC (Sun)
by pbonzini (subscriber, #60935)
[Link] (2 responses)
Posted Feb 21, 2021 22:29 UTC (Sun)
by andresfreund (subscriber, #69562)
[Link] (1 responses)
I don't know how many times I've now spent several hours staring at the read barrier + write barrier != full barrier thing swapping it back into my brain. And how often I couldn't find anything good to tell people to read.
Posted Feb 25, 2021 18:25 UTC (Thu)
by Alan.Stern (subscriber, #12437)
[Link]
Example: Suppose the source code says:

    write(x)
    wmb()
    rmb()
    read(y)

Then the CPU has to execute the write before the wmb, and it has to execute the read after the rmb. But (depending on the architecture) nothing prevents it from reordering the operations to:

    rmb()
    read(y)
    write(x)
    wmb()

(In fact this actually happens on x86, where rmb and wmb are both no-ops and reads can be executed before earlier writes.)

However, with a full barrier:

    write(x)
    mb()
    read(y)

no reordering is possible, and the write *must* be executed before the read.
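In C11 terms (a loose mapping of my own; the kernel's mb() is not literally a C11 fence) only a sequentially consistent fence keeps a store ordered before a later load:

```c
#include <stdatomic.h>

atomic_int x, y;

/* write(x); full barrier; read(y): the seq_cst fence forbids the
 * store-load reordering that a wmb()+rmb() pair would still allow. */
int write_then_read(void)
{
    atomic_store_explicit(&x, 1, memory_order_relaxed);    /* write(x) */
    atomic_thread_fence(memory_order_seq_cst);             /* full barrier */
    return atomic_load_explicit(&y, memory_order_relaxed); /* read(y) */
}
```

With two threads each running this pattern on the other's variables, the full barrier is what rules out the outcome where both loads return 0 after both stores.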
Posted Feb 21, 2021 21:10 UTC (Sun)
by thoughtpolice (subscriber, #87455)
[Link] (1 responses)
Posted Feb 26, 2021 0:37 UTC (Fri)
by jschrod (subscriber, #1646)
[Link]
Yes, and that's why the book "Concurrency: the Works of Leslie Lamport" was published by ACM in October 2019. https://dl.acm.org/doi/book/10.1145/3335772
There are very few computer scientists who are as important and for whom ACM has published such a book. That said, I'm still waiting for the book "the works of Don Knuth" that goes beyond his work on TAOCP and TeX and also values his introduction of attribute grammars and loads of other principles. His recent opinion piece https://dl.acm.org/doi/10.1145/3442377 in CACM about the way computing history should be researched is also worth a read, even and especially if one doesn't agree with his approach.
Posted Mar 3, 2021 6:46 UTC (Wed)
by comicfans (subscriber, #117233)
[Link]
Posted Mar 3, 2021 22:57 UTC (Wed)
by prauld (subscriber, #39414)
[Link]
Posted Mar 7, 2021 14:58 UTC (Sun)
by jcm (subscriber, #18262)
[Link] (1 responses)
I would also personally encourage readers to look elsewhere than the x86 QPI/UPI and HT stuff and toward an openly documented protocol with public documentation of its implementation. The Arm AMBA specs are available, as is their implementation in the Coherent Mesh Network CMN IP. Far more useful than the Intel/AMD alternatives.
Posted Mar 7, 2021 14:59 UTC (Sun)
by jcm (subscriber, #18262)
[Link]
Posted Mar 7, 2021 15:16 UTC (Sun)
by jcm (subscriber, #18262)
[Link]
1. Happens before can be thought of as "if it happens". When there are two threads sharing a variable, and there is order through barriers or acquire/release, you're still checking the value of the shared variable. No thread waits for another, it's just that *if* you see an update to the shared variable, you're guaranteed that you also see all that came before. This article is quite good at pointing that out, but many others don't really highlight this part.
2. The difference between the smp_store_release/smp_load_acquire and the WRITE_ONCE/READ_ONCE with barriers is that the latter uses "full" barriers while the former can be thought of as using "half" barriers that only order in one direction for each half of the acquire/release pair, which allows certain other unrelated loads and stores to pass into/out of the critical section doing the shared variable update.
Posted Mar 25, 2021 12:22 UTC (Thu)
by naveenmv7 (guest, #141627)
[Link] (1 responses)
Why is this true? Can't pthread_join be implemented using semaphores internally?
Posted Apr 6, 2021 15:56 UTC (Tue)
by pbonzini (subscriber, #60935)
[Link]
Posted Oct 11, 2022 4:50 UTC (Tue)
by dafnaf (subscriber, #158251)
[Link]
An introduction to lockless algorithms
    thread 1                            thread 2
    --------------------------------    ------------------------
    message = &a;                       ...
    ...
    a.x = 1;                            ...
> On x86 that reordering cannot happen, on ARM and PowerPC it might.
The lock prefix acts as a memory barrier on x86. But if there are no memory barriers at all… yes, it does happen on x86, too.
After looking at that article again I realized that Jeff added an addendum at the end which reveals all the gory details about lock and xchg. So there is no mystery anymore…
And only if the load *actually* happens after the store can we say "a.x = 1" happens before "datum->x".
(If thread 2 runs really fast and completes before thread 1 starts, it sure won't get the proper value in x.)
    write(x)                write(x)
    wmb()
    rmb()                   mb()
    read(y)                 read(y)

On the left, wmb() only orders writes against writes and rmb() only orders reads against reads, so the read(y) can still be executed before the write(x); on the right, the full barrier forbids that reordering. There are other ways in which a full barrier is stronger than the combination of a read barrier plus a write barrier -- but at least this gets the point across.
Using Lamport's approach when reasoning about concurrency is very powerful. It's much more rigorous and applicable than you might expect, and it can explain many behaviors easily at a glance that you might otherwise struggle over. For instance, in a lockless memory allocator, you need to grab a pointer from a freelist head and claim it for the caller (in the implementation of alloc), and then you need to update the freelist pointer to point to the next free block before you're done (very simplified).
Therefore the allocation is split into two distinct actions. The system, however, may preempt your thread after a block is claimed, but before the freelist head is updated. Then another thread could also allocate and erroneously pull the same block, which can't be allowed.
The traditional, simplistic solution to this is to pack a counter value together with every freelist address. When you perform any operation on the freelist, including claiming blocks or updating pointers, you use a CAS2 operation ("compare-and-swap, for two words of memory") to both update a value and also increment the counter next to it. You can then check the counter value against a known value to detect preemption. If the counter value is g at the beginning and no thread interrupted you, then the result you get back at the end, when you read the counter, should be g+1, i.e. no other threads performed any actions between your own, and so what you did was seen as atomic. If it's any other value, you've been preempted, the address you have in your hand is claimed by someone else, and you must start over.
But this can all be stated another way: every object needs a Lamport counter associated with it. By examining the counter, you can determine if you are responsible for a causal action identified by the counter n, as well as n+1. If you are in fact responsible for both of these actions, then n by definition occurred before n+1, and thus no other threads were able to see the indeterminate state of the object. In this way, a Lamport counter is a kind of concrete algorithm for ensuring atomic broadcast amongst threads.
Frankly, I consider "Time, Clocks, and the Ordering of Events in a Distributed System" a mandatory text for any modern software engineer. Practically every text written by Lamport is extremely important to read...
Really worth reading.