Betrayed by a bitfield
One of the many structures used by the btrfs filesystem, defined in fs/btrfs/ctree.h, is:
struct btrfs_block_rsv {
    u64 size;
    u64 reserved;
    struct btrfs_space_info *space_info;
    spinlock_t lock;
    unsigned int full:1;
};
Jan Kara recently reported that, on the ia64 architecture, the lock field was occasionally becoming corrupted. Some investigation revealed that GCC was doing a surprising thing when the bitfield full is changed: it generates a 64-bit read-modify-write cycle that reads both lock and full, modifies full, then writes both fields back to memory. If lock is modified by another processor during this operation, that modification is lost when the stale copy of lock is written back. The chances of good things resulting from this sequence of events are quite small.
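The lost update can be illustrated without any real concurrency. The following is a minimal single-threaded sketch, not the code GCC emits: the struct rsv_tail layout and both function names are hypothetical, and the "other processor" is modeled as a plain store placed between the wide load and the wide store.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the last two fields of struct btrfs_block_rsv:
 * a 32-bit lock word followed by the 32-bit unit holding the "full" bit. */
struct rsv_tail {
    uint32_t lock;   /* plays the role of spinlock_t */
    uint32_t flags;  /* bit 0 plays the role of "full" */
};

static_assert(sizeof(struct rsv_tail) == 8, "both fields fit one 64-bit word");

/* Emulates the 64-bit read-modify-write: both fields are loaded, another
 * CPU changes lock in the meantime, then both fields are stored back.
 * Returns the value left in lock. */
uint32_t wide_rmw_set_full(struct rsv_tail *t, uint32_t other_cpu_lock)
{
    struct rsv_tail snapshot = *t;  /* wide load: lock and flags together */
    t->lock = other_cpu_lock;       /* concurrent update from "another CPU" */
    snapshot.flags |= 1u;           /* only the flag was meant to change */
    *t = snapshot;                  /* wide store: stale lock is written back */
    return t->lock;
}

/* Emulates a 32-bit read-modify-write that touches flags only. */
uint32_t narrow_rmw_set_full(struct rsv_tail *t, uint32_t other_cpu_lock)
{
    uint32_t flags = t->flags;      /* narrow load */
    t->lock = other_cpu_lock;       /* concurrent update from "another CPU" */
    t->flags = flags | 1u;          /* narrow store leaves lock alone */
    return t->lock;
}
```

Starting from a zeroed structure and an "other CPU" value of 1, the wide variant leaves lock at 0 (the update is lost) while the narrow variant leaves it at 1; in both cases the flag ends up set.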
One can imagine that quite a bit of work was required to track down this particular surprise. It is also not hard to imagine the dismay that results from a conversation like this:
Unsurprisingly, Linus was less than impressed by this response. Language standards are not written for the unique needs of kernels, he said, and can never "guarantee" the behavior that a kernel needs:
But sometimes compilers do stupid things. Using 8-byte accesses to a 4-byte entity is *stupid*, when it's not even faster, and when the base type has been specified to be 4 bytes!
As it happens, the problem is a bit worse than non-specified behavior. Linus suggested running a test with a structure like:
struct example {
    volatile int a;
    int b:1;
};
In this case, if an assignment to b causes a write to a, the behavior is clearly buggy: the volatile keyword makes it explicit that a may be accessed from elsewhere. Jiri Kosina gave it a try and reported that GCC is still generating 64-bit operations in this case. So, while the original problem is technically compliant behavior, it almost certainly results from the same decision-making that makes the second example go wrong.
Knowing that may give the kernel community more ammunition to flame the GCC
developers with, but it is not necessarily all that helpful. Regardless of
the source of the problem, this behavior exists in versions of the compiler
that, almost certainly, are being used outside of the development community
to build the kernel. So some sort
of workaround is likely to be necessary even if GCC's behavior is fixed.
That could be a bit of a challenge; auditing the entire kernel for 32-bit-wide
bitfield variables in structures that may be accessed concurrently will not be a
small job. But, then, nobody said that kernel development was easy.
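As a rough illustration of what such an audit looks for, here is a minimal sketch that checks whether two structure members fall within the same aligned 64-bit word. Everything in it is an assumption made for illustration: the helper share_aligned_word64() and both layouts are hypothetical, and since offsetof() cannot be applied to a bitfield, a plain 32-bit member stands in for the bitfield's storage unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if the byte ranges [off_a, off_a + size_a) and
 * [off_b, off_b + size_b) touch at least one common aligned 8-byte word. */
int share_aligned_word64(size_t off_a, size_t size_a,
                         size_t off_b, size_t size_b)
{
    size_t first_a = off_a / 8, last_a = (off_a + size_a - 1) / 8;
    size_t first_b = off_b / 8, last_b = (off_b + size_b - 1) / 8;
    return first_a <= last_b && first_b <= last_a;
}

/* Hypothetical layout mirroring the problem: lock and the flag unit
 * sit side by side in one 64-bit word. */
struct risky_layout {
    uint32_t lock;
    uint32_t full;
};

/* Hypothetical workaround: a word-sized flag. On ABIs where uint64_t is
 * 8-byte aligned, the compiler pads after lock and the flag gets its
 * own 64-bit word. */
struct widened_layout {
    uint32_t lock;
    uint64_t full;
};

static_assert(sizeof(struct risky_layout) == 8, "no padding expected");

int risky_layout_shares(void)
{
    return share_aligned_word64(offsetof(struct risky_layout, lock), sizeof(uint32_t),
                                offsetof(struct risky_layout, full), sizeof(uint32_t));
}

int widened_layout_shares(void)
{
    return share_aligned_word64(offsetof(struct widened_layout, lock), sizeof(uint32_t),
                                offsetof(struct widened_layout, full), sizeof(uint64_t));
}
```

The widened layout trades four bytes of padding for isolation; whether that trade is acceptable for a given kernel structure is a separate question.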
Posted Feb 2, 2012 4:20 UTC (Thu)
by arjan (subscriber, #36785)
[Link] (3 responses)
Posted Feb 2, 2012 4:43 UTC (Thu)
by nevets (subscriber, #11875)
[Link] (2 responses)
"We refuse to build your kernel with the current version of your compiler. Linus has declared that this version of gcc violates SOPA (Stop Obnoxious Programming Acts). Although the gcc developers feel that this bug is a feature, we have proof that they are wrong."
Posted Feb 2, 2012 9:55 UTC (Thu)
by spaetz (guest, #32870)
[Link]
Posted Feb 9, 2012 17:51 UTC (Thu)
by jd (guest, #26381)
[Link]
Posted Feb 2, 2012 5:43 UTC (Thu)
by ncm (guest, #165)
[Link] (8 responses)
My bet is that he knows the code is bad, but is covering his embarrassment by haranguing the Gcc developers, on the (sound!) general principle that they always deserve abuse, regardless of the immediate facts. Gcc developers are the candles whose unsteadiness we blame for causing us to stumble in the darkness.
Posted Feb 2, 2012 20:05 UTC (Thu)
by dashesy (guest, #74652)
[Link] (3 responses)
Posted Feb 2, 2012 20:46 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Feb 2, 2012 21:57 UTC (Thu)
by dlang (guest, #313)
[Link] (1 responses)
Posted Feb 3, 2012 6:50 UTC (Fri)
by ncm (guest, #165)
[Link]
Posted Feb 3, 2012 22:58 UTC (Fri)
by cavok (subscriber, #33216)
[Link] (3 responses)
Posted Feb 3, 2012 23:02 UTC (Fri)
by dlang (guest, #313)
[Link] (1 responses)
if you can even replace two int variables with one bitfield, you may shrink your datastructure enough to have it fit within a CPU cache line. The memory access that can then be avoided can be enough to make a noticeable difference.
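A minimal sketch of the size argument, with hypothetical structure names; bitfield layout is implementation-defined, so the exact numbers depend on the ABI:

```c
#include <assert.h>
#include <stddef.h>

/* Two flags stored as full ints. */
struct flags_as_ints {
    int enabled;
    int dirty;
};

/* The same two flags packed into one bitfield storage unit. */
struct flags_as_bits {
    unsigned int enabled:1;
    unsigned int dirty:1;
};

static_assert(sizeof(struct flags_as_bits) <= sizeof(struct flags_as_ints),
              "packing flags should not make the structure larger");

/* Bytes saved per structure instance by using bitfields. */
size_t bytes_saved(void)
{
    return sizeof(struct flags_as_ints) - sizeof(struct flags_as_bits);
}
```

On common ABIs with a 4-byte int the first structure takes 8 bytes and the second 4.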
Posted Feb 6, 2012 13:32 UTC (Mon)
by cavok (subscriber, #33216)
[Link]
Posted Feb 6, 2012 12:22 UTC (Mon)
by etienne (guest, #25256)
[Link]
Posted Feb 2, 2012 11:39 UTC (Thu)
by abacus (guest, #49001)
[Link] (1 responses)
Posted Feb 3, 2012 6:54 UTC (Fri)
by ncm (guest, #165)
[Link]
Posted Feb 2, 2012 13:57 UTC (Thu)
by Lovechild (guest, #3592)
[Link] (3 responses)
Posted Feb 2, 2012 16:31 UTC (Thu)
by darthscsi (guest, #8111)
[Link] (2 responses)
struct example {
int foo(struct example* e) {
%struct.example = type { i32, i8 }
define i32 @foo(%struct.example* nocapture %e) nounwind uwtable {
Posted Feb 2, 2012 16:33 UTC (Thu)
by darthscsi (guest, #8111)
[Link] (1 responses)
0000000000000000 <foo>:
Posted Feb 2, 2012 16:47 UTC (Thu)
by khim (subscriber, #9252)
[Link]
This looks like x86-64 assembler to me, not IA64. Are you even sure we are talking about the same case? x86-64 works just fine with GCC, too. PowerPC, Sparc and IA-64 (which started the thread) work incorrectly...
Posted Feb 2, 2012 19:13 UTC (Thu)
by jwakely (subscriber, #60262)
[Link] (1 responses)
Posted Feb 2, 2012 22:21 UTC (Thu)
by jan.kara (subscriber, #59161)
[Link]
Posted Feb 3, 2012 17:43 UTC (Fri)
by daglwn (guest, #65432)
[Link] (54 responses)
> But sometimes compilers do stupid things. Using 8-byte accesses to a
The compiler may have no choice. I can imagine a machine where the only access size is 64 bits or greater. Insane? I imagine the 8- and 16-bit processor guys said the same when Alpha appeared. If the hardware does not support a smaller granularity of access there is no choice but to do a RMW. This is why the C standard does not make the guarantees Linus wants it to make.
In this IA64 case it probably is a bug because AFAIK 32-bit accesses are allowed and no worse performance-wise than 64-bit accesses on all implementations. But I can also imagine implementations of IA-64 where a 32-bit access would be more expensive and thus a 64-bit access would make sense.
There really is no way to guarantee what Linus wants using standard C, so the kernel code is buggy. That's why compiler extensions abound. In this case, some of the GCC alignment directives could help.
But as compiler developers, we often have to fix bugs in user code where the developer expected unreasonable things. So we do it. :)
Posted Feb 3, 2012 22:01 UTC (Fri)
by daglwn (guest, #65432)
[Link] (50 responses)
To clarify, it's probably only a "bug" in a quality-of-implementation sense. The behavior is perfectly acceptable according to the C standard. 6.7.2.1 makes pretty much everything related to bitfields implementation-defined.
The ABI may define semantics more strictly and if gcc violates them it's a "real" bug. I don't have a copy of the IA-64 ABI so I don't know what the rules are for that platform.
The x86_64 ABI does place more restrictions on bitfield layout but technically there is nothing there that says the bitfield *must* be accessed through any particular type (i.e. load/store size). But quality-of-implementation concerns will lead the compiler writer to implement the semantics Linus expects for that platform and to not do so would be considered a bug.
Posted Feb 3, 2012 22:04 UTC (Fri)
by dlang (guest, #313)
[Link] (49 responses)
Posted Feb 3, 2012 22:50 UTC (Fri)
by daglwn (guest, #65432)
[Link] (48 responses)
Even for the non-bitfield case, it is technically not a bug (unless the ABI mandates something stricter) but is almost certainly a quality-of-implementation issue and for all intents and purposes would be considered a bug if the target platform has the appropriate kinds of loads and stores available.
Posted Feb 3, 2012 22:56 UTC (Fri)
by dlang (guest, #313)
[Link] (47 responses)
This is a direct violation of the spec.
note that these target platforms do have the appropriate loads and stores, and they aren't even slower than the 64 bit ones.
Posted Feb 3, 2012 23:07 UTC (Fri)
by daglwn (guest, #65432)
[Link] (45 responses)
Which spec? The IA-64 ABI? There is nothing in the C standard that mandates a specific allocation size for any member of a struct, much less alignment and access width.
I'm agreeing with you that it's a gcc bug, in that it may violate the ABI (I don't know) and almost certainly violates the standards of a quality implementation.
I'm simply pointing out that any code that relies on the layout of a struct or how its members are accessed is inherently non-portable. Fundamentally, from a language perspective, it is the kernel code that is buggy, assuming the kernel wants to be portable, which I think it does. :)
Posted Feb 4, 2012 0:34 UTC (Sat)
by dlang (guest, #313)
[Link] (43 responses)
remember, the developers are not demanding that the layout be a particular way, it's the compiler that's doing this.
restating my understanding
volatile int a;
and then making a change to b triggers a read/write cycle on a the compiler is doing the wrong thing (because the value of a may change between the read and write) and the programmer has explicitly told the compiler that a may change unexpectedly.
in the case that found the bug they were doing something like
int lock;
and they found that if one thread modified the value of bitfield while another thread modified lock, the modification to lock would get lost.
If the compiler can't do 32 bit loads and stores, then it needs to store these two variables in a way that they are padded so that the 64 bit loads and stores won't lose modifications.
Posted Feb 4, 2012 14:20 UTC (Sat)
by nix (subscriber, #2304)
[Link] (41 responses)
The standard states (with wording identical in C99 and the just-released C11):
Thus it is not a violation of the (C99) Standard for accesses to a bitfield to unexpectedly issue write cycles to adjacent volatile bitfields for their current content as far as the serial abstract machine is concerned: indeed, on nearly all platforms this is unavoidable if the adjacent bitfields are located within the same byte. It would not be a violation of the Standard for an implementation to do a massive RMW cycle on all of memory every time a variable of any sort was accessed. These are both quality-of-implementation issues, not standards violations.
In C11, this has been fixed: as footnote 13 of the new section s5.1.2.4 puts it
Posted Feb 4, 2012 17:56 UTC (Sat)
by daglwn (guest, #65432)
[Link] (15 responses)
That would seem to be the "out" that gets around the problem of the target machine not having an appropriately-sized load/store instruction to avoid false sharing.
I've not studied the C11 standard yet. Is that the intent of this statement? Otherwise C11 structs with volatile members could be ABI-incompatible with source-equivalent C99 structs due to forced alignment to avoid placing volatile members next to other members in a single load/store access unit.
In other words, what happens if I have a volatile char next to a non-volatile char and my target only has 32-bit loads and stores?
Posted Feb 4, 2012 21:27 UTC (Sat)
by nix (subscriber, #2304)
[Link] (14 responses)
I'm not sure how you can implement these requirements without either ABI changes or an efficiency reduction (on platforms where sub-maximum-sized writes are inefficient). We might have to eat the efficiency loss: we surely cannot fix this by aligning every member of every structure to cacheline boundaries or something like that, that's ridiculous.
(But I am a neophyte in this area. Someone, anyone, who knows more than me, please chip in!)
Posted Feb 6, 2012 18:06 UTC (Mon)
by daglwn (guest, #65432)
[Link] (13 responses)
I think it's an "out" because the footnote only covers data in "separate memory locations." Two chars that don't cross an alignment boundary would be in the same "memory location" in some sense. They're in the same atomically-readable space.
But I don't know if that's the intent. If not, I think C11 has a serious issue. It will require ABI changes and that's not going to make people happy.
Posted Feb 6, 2012 18:38 UTC (Mon)
by chrisV (guest, #43417)
[Link] (11 responses)
Note that marking a struct or a field of a struct volatile is irrelevant to all this. This (§5.1.2.4) is a part of the C specification relating to access to an object of a given built-in type by multiple threads within a single process. Volatile is concerned only with asynchronous access to a given memory location by a single thread in a single process (for example, by an interrupt). Of course, if volatile were to work at all in the usage in the kernel, some prior cache synchronization would be required so approximating to what is wanted here. But even so, the compiler is not obliged to provide the behavior demanded by the kernel code in question.
Posted Feb 6, 2012 20:02 UTC (Mon)
by daglwn (guest, #65432)
[Link] (10 responses)
> But even so, the compiler is not obliged to provide the behavior demanded
Again, not being familiar with C11, what does the memory model give that can help multithreaded programming, then? How does §5.1.2.4/4 work in the case where the machine access granularity is larger than some primitive types and a RMW must happen?
Posted Feb 6, 2012 21:04 UTC (Mon)
by chrisV (guest, #43417)
[Link] (9 responses)
Discarding volatile, which is irrelevant, it requires the faulty behavior observed in the kernel not to occur. On 64 bit architectures it can do this by various pessimizations. It could use slower 32 bit accesses where available. If that is not available it may have to pad so that any 32-bit scalar occupies 64 bits of space, where necessary to conform to the standard.
None of this is new. gcc's pthreads implementations have been doing this for years. It might be instructive to look at the assembly output of the same test cases compiled with the -pthread switch.
Posted Feb 6, 2012 21:11 UTC (Mon)
by dlang (guest, #313)
[Link] (2 responses)
Posted Feb 6, 2012 21:32 UTC (Mon)
by chrisV (guest, #43417)
[Link] (1 responses)
Posted Feb 6, 2012 21:39 UTC (Mon)
by dlang (guest, #313)
[Link]
Posted Feb 7, 2012 0:43 UTC (Tue)
by daglwn (guest, #65432)
[Link] (5 responses)
Or pad 8 bits to 32, etc. This is what I mean by breaking ABI compatibility. If interfaces include struct values this could be a real nightmare.
Personally, I think this requirement is ridiculous absent some special attribute to enforce it. If there was a qualifier on a type to indicate "shared," _a_la_ UPC, that would be one thing. But to have this requirement for every single data item in the program is a very bad idea.
Posted Feb 7, 2012 8:43 UTC (Tue)
by khim (subscriber, #9252)
[Link] (4 responses)
Since C11 includes _Alignas and you can specify your own alignment, it's not a disaster, albeit it is an inconvenience. A perfectly working C++ program can already be broken by recompilation in C++11 mode, so it's not the first time an upgrade has broken things. Of course this is only possible if the original program violated the specs and worked by accident (if you can show me genuinely different behavior in a standards-compliant program it'll be interesting to know, too, but so far all examples I've seen contained subtle violations of one form or another).
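For illustration, a minimal sketch of the _Alignas approach, assuming a C11 compiler. The structure name is hypothetical. C11 does not allow an alignment specifier on a bit-field itself, so the bitfield is wrapped in a nested structure member and that member is aligned instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of the two-member test structure: the bitfield
 * lives in a nested struct member that is forced onto an 8-byte boundary. */
struct example_aligned {
    volatile int a;
    _Alignas(8) struct {
        unsigned int b:1;
    } bits;
};

static_assert(offsetof(struct example_aligned, bits) % 8 == 0,
              "bits must start on an 8-byte boundary");

/* Offset of the bitfield's container, for inspection at run time. */
size_t bits_offset(void)
{
    return offsetof(struct example_aligned, bits);
}

/* Sets the bitfield, then reads back the volatile neighbor. */
int set_b_and_read_a(struct example_aligned *e)
{
    e->bits.b = 1;
    return e->a;
}
```

With this layout an aligned 64-bit read-modify-write of bits cannot overlap a, at the cost of padding. It does not stop a compiler from choosing some other access pattern, so it is a layout aid rather than a guarantee.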
Posted Feb 7, 2012 18:35 UTC (Tue)
by daglwn (guest, #65432)
[Link] (3 responses)
There's still an ABI problem if the compiler always has to align members to uphold the requirement.
> Perfectly working C++ program can already be broken by recompilation in
But as we all know, C++ is not C. :) I don't see this as a problem.
Posted Feb 7, 2012 20:53 UTC (Tue)
by khim (subscriber, #9252)
[Link] (2 responses)
C++ is quite explicitly not C, but C++11 pretends that it's still C++.
Posted Feb 7, 2012 23:29 UTC (Tue)
by daglwn (guest, #65432)
[Link] (1 responses)
Yes, you are correct, but I don't think there's an ABI issue. An ABI issue is much harder to deal with than a semantic change.
With an ABI issue you've got to recompile the world (your project, libraries it links to, etc.) to get a working application. With a semantic change you only recompile the bits that had to be recoded to account for the change.
Posted Feb 8, 2012 8:54 UTC (Wed)
by khim (subscriber, #9252)
[Link]
YMMV, as usual. We recompile the world anyway, so ABI change is less of a problem, but the fact that just a recompilation does not fix the issue and you need to do a lot of investigations is a problem.
Posted Feb 6, 2012 18:40 UTC (Mon)
by nix (subscriber, #2304)
[Link]
Posted Feb 4, 2012 21:47 UTC (Sat)
by giraffedata (guest, #1954)
[Link] (24 responses)
Maybe it's useful in read-only memory mapped I/O regions that just ignore writes.
Posted Feb 4, 2012 22:25 UTC (Sat)
by nix (subscriber, #2304)
[Link] (2 responses)
In general you can't (and never could) rely on 'volatile' without knowing what the compiler would do when you asked it: implementations have had horrible bugs in their treatment of volatile often enough that care is warranted. But at the very least, it provides a way to flag that 'something tricky is happening here, look out'. It's proved more useful than 'register', low bar though that is. :)
Posted Feb 4, 2012 22:37 UTC (Sat)
by giraffedata (guest, #1954)
[Link] (1 responses)
As long as you know that storing to your memory mapped I/O regions is a no-op, you can use standard C with volatile to usefully fetch from them, but otherwise you need more than standard C. You need common sense C. It just seems strange to me because writable I/O regions did exist at the time this accommodation for reading them was added to the spec.
Posted Feb 6, 2012 19:21 UTC (Mon)
by chrisV (guest, #43417)
[Link]
I absolutely don't agree with this. Forbidding 64-bit read/writes for 32-bit scalars within an aligned 64-bit boundary is a pessimization which should be opted into in C99, such as with the -pthread flag as in current gcc implementations (-pthread does require this pessimization if adjacent memory locations would otherwise be corrupted).
Presumably when C11 is implemented in gcc, opting out of this pessimization will also be available. Or possibly gcc will require some specific flag to be set where a multi-threaded program is being compiled, who knows.
Posted Feb 6, 2012 18:08 UTC (Mon)
by daglwn (guest, #65432)
[Link] (20 responses)
Not totally true. It's true that volatile doesn't do what most people expect. Basically, volatile says the data can't be cached in a register (because the underlying value might change unexpectedly), and little else. But the prohibition on register allocation is a pretty big deal to the compiler.
Posted Feb 6, 2012 19:27 UTC (Mon)
by dlang (guest, #313)
[Link] (16 responses)
Posted Feb 6, 2012 20:01 UTC (Mon)
by chrisV (guest, #43417)
[Link] (4 responses)
This is orthogonal to the original problem, which (as far as I can work out) is concerned with concurrent access by different threads.
Posted Feb 6, 2012 20:06 UTC (Mon)
by dlang (guest, #313)
[Link] (3 responses)
As Linus says, it's possible for someone to fix the bug with respect to volatile, and explicitly leave the bug in for all other accesses (with the claim that the spec allows the compiler to do that), but it would probably be more work to do that than to just change the behaviour across the board.
Posted Feb 6, 2012 20:51 UTC (Mon)
by chrisV (guest, #43417)
[Link] (2 responses)
I don't believe that has been tested and I don't believe it to be true. On hardware where 64 bit boundaries are important, sig_atomic_t would be likely to be 64-bits wide. (Someone with a ia64 architecture might be able to confirm this.) In that case alignment boundaries would be fully respected.
Also, I can't see that the original problem had anything to do with interrupts. It seems to be concerned with either multi-threading, or shared access by multiple processes.
Lastly, when you refer to "the bug with respect to volatile", you impute that gcc has the bug. If there is a bug, it is a bug in the kernel.
Posted Feb 6, 2012 21:04 UTC (Mon)
by dlang (guest, #313)
[Link] (1 responses)
volatile int a;
b++;
and the resulting assembler did the exact same read-write on a that caused the problem with the case in question. If an interrupt were to happen between the read and the write that changed the value of a, the write would cause that value to be lost.
no, the item in question wasn't related to interrupts, but the cause of the problem causes the same problem if interrupts were involved.
If the 32 bit loads and stores were significantly more expensive to use than 64 bit loads and stores, there would be at least some reason to have this behaviour available, but in the case of the architectures in question there isn't a performance benefit to doing it this way.
Posted Feb 6, 2012 21:16 UTC (Mon)
by chrisV (guest, #43417)
[Link]
No it doesn't. Memory access issues relating to multiple threads (or multiple processes in the case of shared memory) have nothing to do with interrupts, and nothing to do with volatile.
This really is beating a dead horse. Compiler switches are available to ensure memory consistency for the case in question, notably the -pthread switch. I can see why the kernel doesn't want to, or can't, use that. In that case the kernel authors need to write some assembler or use 64-bit values on architectures where it is important, or persuade the gcc developers to provide another compiler switch dealing with this particular problem.
Posted Feb 6, 2012 20:05 UTC (Mon)
by daglwn (guest, #65432)
[Link] (10 responses)
Posted Feb 6, 2012 20:13 UTC (Mon)
by dlang (guest, #313)
[Link] (9 responses)
again the code snippet is
volatile int a;
b++;
if modifying b causes a read/write of a, this is wrong.
the programmer has not made any attempt to specify alignment here.
Posted Feb 7, 2012 0:48 UTC (Tue)
by daglwn (guest, #65432)
[Link] (8 responses)
No, it's not. Believe me.
The volatile keyword doesn't say anything about when it will change value, be read/written etc. It says simply that it will not be cached in a register such that every read will get the "latest" value expected when executed under the Abstract Machine.
It says nothing about threading.
It says nothing about interrupts.
Simply remember that volatile is not magic. Think of it as the opposite of "register."
Posted Feb 7, 2012 1:03 UTC (Tue)
by dlang (guest, #313)
[Link]
since this sort of thing has been part of C's traditional strength, this doesn't seem like a sane interpretation to me.
Posted Feb 7, 2012 16:54 UTC (Tue)
by chrisV (guest, #43417)
[Link] (6 responses)
I think it does. See §5.1.2.3/5 and /10, which are normative. The principal purpose of volatile is to deal with arbitrary changes of data values outside the program context of the process in which the code is running (ie asynchronous interrupts). (This does not include threads, which are within the program context and which, because they can run on more than one core, require quite different synchronizations, some of which are not async-signal-safe.)
See also the footnote 134 of §6.7.3/8 (which is non-normative despite the "shall not"): "A volatile declaration may be used to describe an object corresponding to a memory-mapped input/output port or an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be 'optimized out' by an implementation or reordered except as permitted by the rules for evaluating expressions." This is a curious note as, as far as I am aware, it is the one and only reference to memory mapping (and about which I mis-spoke in an earlier posting on this article because it is not in C99 which contains no reference to memory mapping).
Posted Feb 7, 2012 18:38 UTC (Tue)
by daglwn (guest, #65432)
[Link] (5 responses)
Thanks for the correction. But the compiler is still correct here. Volatile doesn't say anything about restricting _when_ it is read or written, only that it will get the "latest" value in a single-thread context.
It's perfectly fine for I/O as long as you can guarantee alignment such that there is no "false sharing."
Posted Feb 7, 2012 19:07 UTC (Tue)
by chrisV (guest, #43417)
[Link] (4 responses)
Posted Feb 7, 2012 20:01 UTC (Tue)
by dlang (guest, #313)
[Link]
Posted Feb 7, 2012 23:32 UTC (Tue)
by daglwn (guest, #65432)
[Link] (1 responses)
No, there isn't.
There isn't. Really.
Volatile does not mean what you think it means.
It's a bit like sequential consistency. Just when you think you understand it, something unexpected happens that is both non-intuitive and perfectly legal.
Posted Feb 8, 2012 15:24 UTC (Wed)
by daglwn (guest, #65432)
[Link]
Posted Feb 8, 2012 13:51 UTC (Wed)
by nix (subscriber, #2304)
[Link]
Posted Feb 6, 2012 22:22 UTC (Mon)
by giraffedata (guest, #1954)
[Link] (2 responses)
I was commenting at a higher level. Never mind what the compiler can and can't do. The question is, what programs can you write in C because it has the "volatile" feature that you couldn't otherwise? Relying on nothing but that the compiler implements the C standard.
I've always understood the originally intended answer to be, "you can write a program that uses memory mapped I/O." But comments in this thread say regardless of how one uses the "volatile" keyword in a program, the compiler may always generate code that arbitrarily writes to memory mapped I/O addresses. There's nothing in the standard to stop it. If so, then you not only can't use memory mapped I/O in a C program, you can't even let a C program — any C program — run in an address space that includes memory mapped I/O regions.
Unless all memory-mapped I/O regions ignore writes.
And it's hard to believe that the definers of "volatile" actually had such a useless thing in mind.
Posted Feb 7, 2012 0:52 UTC (Tue)
by daglwn (guest, #65432)
[Link] (1 responses)
Volatile is perfectly fine for I/O as long as you know the address being accessed is suitably aligned to avoid problems, as the ABI should indicate.
This is why volatile is non-portable. Unfortunately, C99 has no standard way to force alignment of any object.
Posted Feb 7, 2012 8:46 UTC (Tue)
by khim (subscriber, #9252)
[Link]
GCC, MSVC and other compilers include such an ability and C11 finally adds it to the standard, so all is not so bad...
Posted Feb 4, 2012 17:47 UTC (Sat)
by daglwn (guest, #65432)
[Link]
If you need to protect data like this you must make use of constructs outside the C standard.
Posted Feb 4, 2012 13:57 UTC (Sat)
by nix (subscriber, #2304)
[Link]
Posted Feb 4, 2012 13:17 UTC (Sat)
by paulj (subscriber, #341)
[Link]
Posted Feb 7, 2012 17:06 UTC (Tue)
by BenHutchings (subscriber, #37955)
[Link] (2 responses)
Yes, Alpha's lack of 8- and 16-bit store instructions was insane. That's why they were added in later versions of the architecture.
Posted Feb 8, 2012 0:11 UTC (Wed)
by daglwn (guest, #65432)
[Link] (1 responses)
And often a machine implements the smaller accesses but there is a performance penalty due to alignment issues and the bus/DRAM access architecture. It's not uncommon for the RMW to be faster.
Posted Feb 8, 2012 13:57 UTC (Wed)
by nix (subscriber, #2304)
[Link]
It's actually easier to come up with a list of architectures that do *not* require natural alignment on loads and stores than to come up with a list of those that do.
Posted Feb 6, 2012 12:41 UTC (Mon)
by chrisV (guest, #43417)
[Link] (7 responses)
In a user-space program all this would be irrelevant. In the case of different threads accessing different 32 bit integers within an aligned 64-bit boundary, POSIX would prohibit gcc making a 64-bit read/write if the program is compiled with the -pthread compiler option and memory corruption would otherwise arise (this is implicit in Base Definitions, General Concepts, section 4.11, Memory Synchronization, of the SUS). In the case of different processes accessing shared memory between processes (about which the C standard says nothing) you are likewise relying on the guarantees of the shared memory implementation, if any.
The kernel is expecting POSIX-like behavior from a C compiler. gcc is not obliged to provide this if that has not been required of it, either by means of the -pthread switch or by gcc being required (once it has been implemented) to use the similar C11 memory model for multi-threaded programs. Those writing kernels have to force the issue in some other way, such as by writing machine-word sized fields in such cases or by writing processor specific assembly language to force the correct synchronized behavior.
Posted Feb 7, 2012 17:12 UTC (Tue)
by BenHutchings (subscriber, #37955)
[Link] (6 responses)
Your unhelpful standards-lawyering attitude is exactly what many programmers have come to hate about certain compiler developers.
Posted Feb 7, 2012 19:21 UTC (Tue)
by chrisV (guest, #43417)
[Link] (3 responses)
If you run a test case compiled with the -pthread flag which exhibits this problem then this is an egregious failure on the part of the threading implementation of gcc and you should post a bug. (It is far more serious than the non-standards complying kernel code.)
Posted Feb 7, 2012 19:36 UTC (Tue)
by chrisV (guest, #43417)
[Link]
Posted Feb 15, 2012 17:01 UTC (Wed)
by cbf123 (guest, #74020)
[Link] (1 responses)
Posted Feb 16, 2012 20:21 UTC (Thu)
by chrisV (guest, #43417)
[Link]
(Yes I have seen the argument in papers in the early proposals for the C/C++ threading model that "memory location" is ambiguous in Base Definitions, section 4.11, of the SUS, and could refer to a machine's natural word size rather than individual scalars, and so C11/C++11 now explicitly states that "memory location" in its equivalent wording means any scalar or bitfield, but that is a completely perverse construction of the SUS as it makes POSIX threads completely useless as a standard, bringing threading back to individual non-standard ABIs.)
Posted Feb 7, 2012 20:18 UTC (Tue)
by chrisV (guest, #43417)
[Link] (1 responses)
This posting is just to comment on your observations on the C11 memory model. You are quite right in saying it represents accumulated wisdom, because its starting point was the (comparatively under-specified) POSIX memory model for multi-threaded programs. However, if you read the lkml postings in question, you will see that the kernel community do not want the generalized C11 memory model for the kernel. They want the preclusion of false sharing; they don't want full sequential consistency in all cases because of its efficiency implications.
Posted Feb 17, 2012 16:20 UTC (Fri)
by jwakely (subscriber, #60262)
[Link]
Says who?
On ia64-linux it appears to be equivalent to just -D_REENTRANT, on x86-linux it also passes -lpthread to the linker, but it has no effect on code-generation.
> they don't want full sequential consistency in all cases because of its efficiency implications.
Which part of the C11 memory model requires sequential consistency?
Posted Feb 10, 2012 22:22 UTC (Fri)
by dfsmith (guest, #20302)
[Link]
"... any object ... of a volatile-qualified type should not participate in optimizations that would ... modif[y] the object."
To my mind that prohibition includes non-volatile object optimizations that would clobber volatile objects.
The book then slightly contradicts itself two paragraphs later by saying that "optimizations between sequence points are permitted" before clarifying with a hardware example.
* A standard ANSI C reference manual for programmers and compiler implementers at the time.
sounds like we can run this and just ban compilers that miscompile this....
Knock me over with a feather
how many millions of those structures do you need to see some gain in using bitfields?
For instance, a bitfield structure describing the debugging/logging of messages where bits are enable_global, enable_subsystem1, enable_onserial, telnet_connected, and the test is "if (debug.enable_global && (debug.enable_onserial || debug.telnet_connected) && debug.enable_subsystem5)".
If you do not do that, on processors with one single "flag" register, the only way for the compiler is to do multiple test and jump and storing intermediate bit into 32 or 64 bits registers (compilers do not do well sub-register allocation by themself).
You could also use the preprocessor with "#define enable_global 0x80000", but I prefer writing in C/C++ than in CPP...
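For comparison, here is a minimal sketch of the two styles being discussed, the bitfield structure and the preprocessor masks. The structure name, function names, and mask values are made up for illustration; only the field names and the test expression come from the comment above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag set; field names follow the comment above. */
struct debug_flags {
    unsigned int enable_global     : 1;
    unsigned int enable_subsystem1 : 1;
    unsigned int enable_subsystem5 : 1;
    unsigned int enable_onserial   : 1;
    unsigned int telnet_connected  : 1;
};

/* Bitfield style: the test reads like ordinary C. */
bool should_log_bitfield(const struct debug_flags *d)
{
    return d->enable_global &&
           (d->enable_onserial || d->telnet_connected) &&
           d->enable_subsystem5;
}

/* Preprocessor style: the mask values here are arbitrary assumptions. */
#define DBG_ENABLE_GLOBAL     0x01u
#define DBG_ENABLE_SUBSYSTEM1 0x02u
#define DBG_ENABLE_SUBSYSTEM5 0x04u
#define DBG_ENABLE_ONSERIAL   0x08u
#define DBG_TELNET_CONNECTED  0x10u

bool should_log_mask(unsigned int flags)
{
    return (flags & DBG_ENABLE_GLOBAL) &&
           (flags & (DBG_ENABLE_ONSERIAL | DBG_TELNET_CONNECTED)) &&
           (flags & DBG_ENABLE_SUBSYSTEM5);
}
```

Both functions compute the same predicate. Whether the compiler folds the bitfield version into a single mask-and-compare depends on the compiler and target, so the generated code is worth checking rather than assuming.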
Spinlock definition
From arch/x86/include/asm/spinlock_types.h:
typedef u8 __ticket_t;
typedef u16 __ticketpair_t;
typedef struct arch_spinlock {
	union {
		__ticketpair_t head_tail;
		struct __raw_tickets {
			__ticket_t head, tail;
		} tickets;
	};
} arch_spinlock_t;
Shouldn't all the above fields have been declared as volatile?
Betrayed by a bitfield
struct example {
	volatile int a;
	int b:1;
};
e->b = 1;
return e->a;
}
entry:
%0 = getelementptr %struct.example* %e, i64 0, i32 1
%1 = bitcast i8* %0 to i32*
%2 = load i32* %1, align 4
%3 = or i32 %2, 1
store i32 %3, i32* %1, align 4
%a = getelementptr inbounds %struct.example* %e, i64 0, i32 0
%4 = load volatile i32* %a, align 4, !tbaa !0
ret i32 %4
}
Betrayed by a bitfield
0: 80 4f 04 01 orb $0x1,0x4(%rdi)
4: 8b 07 mov (%rdi),%eax
6: c3 retq
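For anyone who wants to reproduce listings like the ones above, here is a self-contained sketch of the test case. The function name is an assumption, since the original signature did not survive in the listing, and the bitfield is declared unsigned here (the article's example uses "int b:1") only to sidestep the implementation-defined value of a one-bit signed field.

```c
#include <assert.h>

struct example {
    volatile int a;
    unsigned int b : 1;   /* the article's version is "int b:1" */
};

/* Hypothetical name. Sets the bitfield, then reads the volatile member. */
int set_b_then_read_a(struct example *e)
{
    e->b = 1;      /* the store whose access width is in question */
    return e->a;   /* volatile read; must be performed */
}
```

Running this single-threaded cannot reveal the problem; the point is to compile it (for example with -S, or disassemble the object file) and look at the width of the load and store used for the assignment to b. In the two listings above, the IR uses a 32-bit load and store at the bitfield's own offset, and the x86 code uses a byte-wide orb at offset 4, so neither touches a.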
Betrayed by a bitfield
> 4-byte entity is *stupid*, when it's not even faster, and when the base
> type has been specified to be 4 bytes!
Betrayed by a bitfield
int b;
int bitfield'
Betrayed by a bitfield
it is violating the part of the spec that says that 'volitile' variables are assumed to be affected by external things and so the compiler cannot make assumptions that the variable has not changed just because it didn't issue any commands to change it.
There is no such part (even if you spell it properly and look for 'volatile' :P ).
An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation-defined.
This states how volatile accesses are to be carried out. It does not stipulate that such accesses may not have effects on other, non-volatile-qualified objects (such as adjacent bitfield fields): nor does it state that volatile-qualified objects cannot be modified incidentally by expressions that do not refer to them, nor that you cannot read from one and then store the same data back. All that is required is the usual rule that objects not described as having changed contents between sequence points should appear unchanged at the subsequent sequence point from the point of view of the C abstract machine. And that machine, up till C11, is a serial machine, taking no account of threading. There is no provision whatsoever for avoiding a read-write cycle of the same data from an object back into itself in case other threads might modify it, because in a single-threaded environment this is harmless so the freedom is better granted to implementors than denied.
Compiler transformations that introduce assignments to a potentially shared memory location that would not be modified by the abstract machine are generally precluded by this standard, since such an assignment might overwrite another assignment by a different thread in cases in which an abstract machine execution would not have encountered a data race. This includes implementations of data member assignment that overwrite adjacent members in separate memory locations. We also generally preclude reordering of atomic loads in cases in which the atomics in question may alias, since this may violate the "visible sequence" rules.
However, compiler support for C11 isn't here yet, and the kernel is currently being compiled by a C99 compiler. So this is moot except inasmuch as it guides compiler implementors towards what will be expected in the future. And this fix appeared more than twenty years after the finalization of the first C standard to mention 'volatile'. For all that intervening time, the behaviour of GCC seen here was -- while unpleasant and probably a bug -- not a standards-violation.
Betrayed by a bitfield
> adjacent members in separate memory locations.
Betrayed by a bitfield
> of course as a footnote it is not normative).
Betrayed by a bitfield
> by the kernel code in question.
Betrayed by a bitfield
> occupies 64 bits of space, where necessary to conform to the standard.
Betrayed by a bitfield
> C++11 mode
Betrayed by a bitfield
> Perfectly working C++ program can already be broken by recompilation in
> C++11 mode
But as we all know, C++ is not C. :) I don't see this as a problem.
Betrayed by a bitfield
Not quite. s3.14's definition of 'memory location' is
either an object of scalar type, or a maximal sequence of adjacent bit-fields all having nonzero width
Therefore, even a 'long long' variable occupies a single 'memory location' by that definition. However, a two-byte array of two chars occupies two memory locations. A location is not the same as an address.
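To make that definition concrete, here is a small annotated sketch. The structure and its member names are invented for illustration; the comments apply the quoted wording, and the zero-width bit-field follows from the "all having nonzero width" clause.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure; comments count memory locations per the
 * definition quoted above. */
struct locations {
    char a;          /* location 1: a scalar object */
    int  b : 5;      /* location 2: b and c are adjacent bit-fields   */
    int  c : 11;     /*             of nonzero width, so they share it */
    int    : 0;      /* zero-width bit-field ends the sequence         */
    int  d : 8;      /* location 3: starts a new sequence              */
    char pair[2];    /* locations 4 and 5: each char is a scalar       */
    long long wide;  /* location 6: one scalar, however many bytes     */
};

/* C11 static assertion: members appear in declaration order. */
static_assert(offsetof(struct locations, wide) > offsetof(struct locations, pair),
              "members are laid out in declaration order");
```

Under C11 rules, a store to d may not rewrite b or c, but a store to b may rewrite c, since the two share one memory location.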
That's very interesting, because it seems to say the provision for "volatile" is so incomplete as to be a pointless language feature. "volatile" was supposed to deal with C programs running in an address space that includes memory mapped I/O regions. But if the compiler is allowed to write whatever it wants whenever it wants into I/O regions, just as long as it doesn't expect it to stay there, what's the point? You can't confidently run a C program in an address space that contains memory mapped I/O regions.
Betrayed by a bitfield
That's very interesting, because it seems to say the provision for "volatile" is so incomplete as to be a pointless language feature.
I dunno. As long as compiler vendors cooperate, it's useful for its intended purpose, which was pretty much entirely *reading* from memory-mapped I/O regions and writing into them very carefully, in a serial environment. It wasn't particularly intended for parallel processing environments or multithreading, AIUI (though since I was only 13 when C89 was finalized, I obviously wasn't there and don't know for sure).
Right, it's a useful feature of de facto C, but maybe not of standard C. This is just an academic discussion about the standard, since it seems everyone agrees GCC needs to change regardless of whether it presently implements the standard.
Betrayed by a bitfield
> "volatile" is so incomplete as to be a pointless language feature.
Betrayed by a bitfield
int b;
Betrayed by a bitfield
int b;
Betrayed by a bitfield
> where the struct is marked volatile, there is a compiler bug.
Betrayed by a bitfield
That's very interesting, because it seems to say the provision for
"volatile" is so incomplete as to be a pointless language feature.
Not totally true. It's true that volatile doesn't do what most people expect. Basically, volatile says the data can't be cached in a register ...
Betrayed by a bitfield
This is why volatile is non-portable. Unfortunately, C99 has no standard way to force alignment of any object.
Betrayed by a bitfield
I'm simply pointing out that any code that relies on the layout of a struct or how its members are accessed is inherently non-portable.
Not so: you can depend on later members of a struct having strictly higher addresses than earlier ones: i.e., the compiler cannot reshuffle struct members. (This is not an especially exciting guarantee, but it's all we've got.)
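That ordering guarantee can be checked with offsetof. The sketch below uses a stand-in for the btrfs structure from the article, with spinlock_t replaced by a plain int and the space_info pointer by void *; both substitutions, and the function name, are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct btrfs_block_rsv; kernel types are replaced. */
struct block_rsv_like {
    uint64_t size;
    uint64_t reserved;
    void *space_info;
    int lock;               /* stands in for spinlock_t */
    unsigned int full : 1;  /* offsetof cannot be applied to a bit-field */
};

/* Returns nonzero if each member sits at a higher offset than the previous. */
int members_in_declaration_order(void)
{
    return offsetof(struct block_rsv_like, size) <
               offsetof(struct block_rsv_like, reserved) &&
           offsetof(struct block_rsv_like, reserved) <
               offsetof(struct block_rsv_like, space_info) &&
           offsetof(struct block_rsv_like, space_info) <
               offsetof(struct block_rsv_like, lock);
}
```

Note what this does not tell you: the amount of padding between members, and where the bitfield lands relative to lock, are left to the implementation.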
Betrayed by a bitfield
Insane? I imagine the 6- and 16-bit processor guys said the same when Alpha appeared.
Betrayed by a bitfield
I just tested with gcc version 4.3.2 (Wind River Linux Sourcery G++ 4.3-85) for powerpc. Building the test app (the volatile version) as "gcc x.c -c -m64 -pthread" resulted in the following code, which clearly shows a 64-bit read/write cycle.
0000000000000000 <.wrong>:
0: fb e1 ff f8 std r31,-8(r1)
4: f8 21 ff c1 stdu r1,-64(r1)
8: 7c 3f 0b 78 mr r31,r1
c: f8 7f 00 70 std r3,112(r31)
10: e9 3f 00 70 ld r9,112(r31)
14: e8 09 00 08 ld r0,8(r9)
18: 39 60 00 01 li r11,1
1c: 79 60 f8 2c rldimi r0,r11,31,32
20: f8 09 00 08 std r0,8(r9)
24: e8 21 00 00 ld r1,0(r1)
28: eb e1 ff f8 ld r31,-8(r1)
2c: 4e 80 00 20 blr
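The interesting part of that listing is the 64-bit load (ld), the bit insert (rldimi), and the 64-bit store (std). Here is a rough single-threaded model of why that sequence is dangerous. It is a sketch, not the compiler's actual output: the structure, the function names, and the callback standing in for "another CPU" are all assumptions, and memcpy is used to imitate the wide load and store.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pair of adjacent fields sharing one wide word. */
struct pair {
    int lock;
    unsigned int full : 1;
};

/* Models a wide read-modify-write of the bitfield. The callback runs
 * between the load and the store, playing the part of another CPU. */
void set_full_wide(struct pair *p, void (*interfere)(struct pair *))
{
    struct pair tmp;

    memcpy(&tmp, p, sizeof tmp);  /* wide load: picks up lock as well as full */
    if (interfere)
        interfere(p);             /* "another CPU" updates lock in memory */
    tmp.full = 1;                 /* modify only the bitfield in the copy */
    memcpy(p, &tmp, sizeof tmp);  /* wide store: writes the stale lock back */
}

/* The concurrent update that ends up lost. */
void take_lock(struct pair *p)
{
    p->lock = 1;
}
```

Calling set_full_wide(&p, take_lock) on a zeroed structure leaves full set and lock back at 0: the update made by take_lock has been overwritten, which is the corruption described in the article.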
Betrayed by a bitfield
A single historical context of volatile