Betrayed by a bitfield

By Jonathan Corbet
February 1, 2012

Developers tend to fear compiler bugs, and for good reason: such bugs can be hard to find and hard to work around. They can leave traps in a compiled program that spring on users at bad times. Things can get even worse if one person's compiler bug is seen by the compiler's developer as a feature - such issues have a tendency to never get fixed. It is possible that just this kind of feature has turned up in GCC, with unknown impact on the kernel.

One of the many structures used by the btrfs filesystem, defined in fs/btrfs/ctree.h, is:

    struct btrfs_block_rsv {
	u64 size;
	u64 reserved;
	struct btrfs_space_info *space_info;
	spinlock_t lock;
	unsigned int full:1;
    };

Jan Kara recently reported that, on the ia64 architecture, the lock field was occasionally becoming corrupted. Some investigation revealed that GCC was doing a surprising thing when the bitfield full is changed: it generates a 64-bit read-modify-write cycle that reads both lock and full, modifies full, then writes both fields back to memory. If lock had been modified by another processor during this operation, that modification will be lost when lock is written back. The chances of good things resulting from this sequence of events are quite small.

One can imagine that quite a bit of work was required to track down this particular surprise. It is also not hard to imagine the dismay that results from a conversation like this:

I've raised the issue with our GCC guys and they said to me that: "C does not provide such guarantee, nor can you reliably lock different structure fields with different locks if they share naturally aligned word-size memory regions. The C++11 memory model would guarantee this, but that's not implemented nor do you build the kernel with a C++11 compiler."

Unsurprisingly, Linus was less than impressed by this response. Language standards are not written for the unique needs of kernels, he said, and can never "guarantee" the behavior that a kernel needs:

So C/gcc has never "promised" anything in that sense, and we've always had to make assumptions about what is reasonable code generation. Most of the time, our assumptions are correct, simply because it would be *stupid* for a C compiler to do anything but what we assume it does.

But sometimes compilers do stupid things. Using 8-byte accesses to a 4-byte entity is *stupid*, when it's not even faster, and when the base type has been specified to be 4 bytes!

As it happens, the problem is a bit worse than non-specified behavior. Linus suggested running a test with a structure like:

    struct example {
	volatile int a;
      	int b:1;
    };

In this case, if an assignment to b causes a write to a, the behavior is clearly buggy: the volatile keyword makes it explicit that a may be accessed from elsewhere. Jiri Kosina gave it a try and reported that GCC is still generating 64-bit operations in this case. So, while the original problem is technically compliant behavior, it almost certainly results from the same decision-making that makes the second example go wrong.

Knowing that may give the kernel community more ammunition to flame the GCC developers with, but it is not necessarily all that helpful. Regardless of the source of the problem, this behavior exists in versions of the compiler that, almost certainly, are being used outside of the development community to build the kernel. So some sort of workaround is likely to be necessary even if GCC's behavior is fixed. That could be a bit of a challenge; auditing the entire kernel for 32-bit-wide bitfield variables in structures that may be accessed concurrently will not be a small job. But, then, nobody said that kernel development was easy.

Index entries for this article
Kernel	GCC

Betrayed by a bitfield

Posted Feb 2, 2012 4:20 UTC (Thu) by arjan (subscriber, #36785) [Link] (3 responses)

So we have a good small testcase...
sounds like we can run this and just ban compilers that miscompile this....

Betrayed by a bitfield

Posted Feb 2, 2012 4:43 UTC (Thu) by nevets (subscriber, #11875) [Link] (2 responses)

I can just imaging the error message we post:

"We refuse to build your kernel with the current version of your compiler. Linus has declared that this version of gcc violates SOPA (Stop Obnoxious Programming Acts). Although the gcc developers feel that this bug is a feature, we have proof that they are wrong."

Betrayed by a bitfield

Posted Feb 2, 2012 9:55 UTC (Thu) by spaetz (guest, #32870) [Link]

Thanks for that comment, it was exactly what I needed to save my day. SOPA, LOL

Betrayed by a bitfield

Posted Feb 9, 2012 17:51 UTC (Thu) by jd (guest, #26381) [Link]

Can't we just give a "Compiler On Acid" error?

Knock me over with a feather

Posted Feb 2, 2012 5:43 UTC (Thu) by ncm (guest, #165) [Link] (8 responses)

I'm astonished that Linus has a problem with this. Everybody knows that C bitfields can never be counted on to work right. Even when one seems to, that's bad luck because it just hasn't failed *yet*. A one-bit bitfield? Next to a lock type defined as 32 bits on an almost militantly 64-bit machine? Why did he allow such cursed, doomed, jinxed code into his kernel in the first place?

My bet is that he knows the code is bad, but is covering his embarrassment by haranguing the Gcc developers, on the (sound!) general principle that they always deserve abuse, regardless of the immediate facts. Gcc developers are the candles whose unsteadiness we blame for causing us to stumble in the darkness.

Knock me over with a feather

Posted Feb 2, 2012 20:05 UTC (Thu) by dashesy (guest, #74652) [Link] (3 responses)

Can someone please explain, what could be the advantage of a "one-bit bitfield" over an integer here?

Knock me over with a feather

Posted Feb 2, 2012 20:46 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Maybe they're planning to have several flags in future?

Knock me over with a feather

Posted Feb 2, 2012 21:57 UTC (Thu) by dlang (guest, #313) [Link] (1 responses)

the thing is, this compiler bug would break that too.

Knock me over with a feather

Posted Feb 3, 2012 6:50 UTC (Fri) by ncm (guest, #165) [Link]

Yes, the real bug was the 32-bit lock word. The bitfield was just to tempt fate.

Knock me over with a feather

Posted Feb 3, 2012 22:58 UTC (Fri) by cavok (subscriber, #33216) [Link] (3 responses)

bitfields in a mostly 32/64 bit aligned world...
how many millions of those structures do you need to see some gain in using bitfields?

Knock me over with a feather

Posted Feb 3, 2012 23:02 UTC (Fri) by dlang (guest, #313) [Link] (1 responses)

not very many.

if you can even replace two int variables with one bitfield, you may shrink your datastructure enough to have it fit within a CPU cache line. The memory access that can then be avoided can be enough to make a noticeable difference.

Knock me over with a feather

Posted Feb 6, 2012 13:32 UTC (Mon) by cavok (subscriber, #33216) [Link]

got it. thank you.

Knock me over with a feather

Posted Feb 6, 2012 12:22 UTC (Mon) by etienne (guest, #25256) [Link]

You can gain a lot of code space using related bits declared in a bitfield structure, because the compiler can do a single mask (to hide irrelevant bits) and a single test of a 32 bits value to decide something.
For instance, a bitfield structure describing the debugging/logging of messages where bits are enable_global, enable_subsystem1, enable_onserial, telnet_connected, and the test is "if (debug.enable_global && (debug.enable_onserial || debug.telnet_connected) && debug.enable_subsystem5)".
If you do not do that, on processors with one single "flag" register, the only way for the compiler is to do multiple test and jump and storing intermediate bit into 32 or 64 bits registers (compilers do not do well sub-register allocation by themself).
You could also use the preprocessor with "#define enable_global 0x80000", but I prefer writing in C/C++ than in CPP...

Spinlock definition

Posted Feb 2, 2012 11:39 UTC (Thu) by abacus (guest, #49001) [Link] (1 responses)

From arch/x86/include/asm/spinlock_types.h:

        typedef u8  __ticket_t;
        typedef u16 __ticketpair_t;

        typedef struct arch_spinlock {
                union {
                        __ticketpair_t head_tail;
                        struct __raw_tickets {
                                __ticket_t head, tail;
                        } tickets;
                };
        } arch_spinlock_t;

Shouldn't all the above fields have been declared as volatile ?

Spinlock definition

Posted Feb 3, 2012 6:54 UTC (Fri) by ncm (guest, #165) [Link]

No, volatile almost never does anything useful. It's the 32-bit lock word that was the mistake. Fortunately, it should be easy to fix in the kernel without depending on a compiler fix, just by forcing a 64-bit aligned word.

Betrayed by a bitfield

Posted Feb 2, 2012 13:57 UTC (Thu) by Lovechild (guest, #3592) [Link] (3 responses)

I wonder if LLVM/Clang behaves differently in this case.

Betrayed by a bitfield

Posted Feb 2, 2012 16:31 UTC (Thu) by darthscsi (guest, #8111) [Link] (2 responses)

It does it correctly. Se the load, or, store (extended to i32 in the padded bytes), then load of the volatile. compiled on x86-64 at O3.

struct example {
volatile int a;
int b:1;
};

int foo(struct example* e) {
e->b = 1;
return e->a;
}

%struct.example = type { i32, i8 }

define i32 @foo(%struct.example* nocapture %e) nounwind uwtable {
entry:
%0 = getelementptr %struct.example* %e, i64 0, i32 1
%1 = bitcast i8* %0 to i32*
%2 = load i32* %1, align 4
%3 = or i32 %2, 1
store i32 %3, i32* %1, align 4
%a = getelementptr inbounds %struct.example* %e, i64 0, i32 0
%4 = load volatile i32* %a, align 4, !tbaa !0
ret i32 %4
}

Betrayed by a bitfield

Posted Feb 2, 2012 16:33 UTC (Thu) by darthscsi (guest, #8111) [Link] (1 responses)

Oh, and the assembly:

0000000000000000 <foo>:
0: 80 4f 04 01 orb $0x1,0x4(%rdi)
4: 8b 07 mov (%rdi),%eax
6: c3 retq

Betrayed by a bitfield

Posted Feb 2, 2012 16:47 UTC (Thu) by khim (subscriber, #9252) [Link]

This looks like x86-64 assembler to me, not IA64.

Are you even sure we are talking about the same case? x86-64 works just fine with GCC, too. PowerPC, Sparc and IA-64 (which started the thread) work incorrectly...

Betrayed by a bitfield

Posted Feb 2, 2012 19:13 UTC (Thu) by jwakely (subscriber, #60262) [Link] (1 responses)

Noone is claiming GCC's behaviour is not a bug
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080

Betrayed by a bitfield

Posted Feb 2, 2012 22:21 UTC (Thu) by jan.kara (subscriber, #59161) [Link]

Yes. Initially there was some misunderstanding between kernel & GCC ;). But in fact GCC guys agreed that there is a bug. Just they claim fix won't be easy / quick...

Betrayed by a bitfield

Posted Feb 3, 2012 17:43 UTC (Fri) by daglwn (guest, #65432) [Link] (54 responses)

Speaking as a compiler developer here.

> But sometimes compilers do stupid things. Using 8-byte accesses to a
> 4-byte entity is *stupid*, when it's not even faster, and when the base
> type has been specified to be 4 bytes!

The compiler may have no choice. I can imagine a machine where the only access size is 64 bits or greater. Insane? I imagine the 6- and 16-bit processor guys said the same when Alpha appeared. If the hardware does not support a smaller granularity of access there is no choice but to do a RMW. This is why the C standard does not make the guarantees Linus wants it to make.

In this IA64 case it probably is a bug because AFAIK 32-bit accesses are allowed and no worse performance-wise than 64-bit accesses on all implementations. But I can also imagine implementations of IA-64 where a 32-bit access would be more expensive and thus a 64-bit access would make sense.

There really is no way to guarantee what Linus wants using standard C, so the kernel code is buggy. That's why compiler extensions abound. In this case, some of the GCC alignment directives could help.

But as compiler developers, we often have to fix bugs in user code where the developer expected unreasonable things. So we do it. :)

Betrayed by a bitfield

Posted Feb 3, 2012 22:01 UTC (Fri) by daglwn (guest, #65432) [Link] (50 responses)

> In this IA64 case it probably is a bug

To clarify, it's probably only a "bug" in a quality-of-implementation sense. The behavior is perfectly acceptable according to the C standard. 6.7.2.1 makes pretty much everything related to bitfields implementation-defined.

The ABI may define semantics more strictly and if gcc violates them it's a "real" bug. I don't have a copy of the IA-64 ABI so I don't know what the rules are for that platform.

The x86_64 ABI does place more restrictions on bitfield layout but technically there is nothing there that says the bitfield *must* be accessed though any particular type (i.e. load/store size). But quality-of-implementation concerns will lead the compiler writer to implement the semantics Linus expects for that platform and to not do so would be considered a bug.

Betrayed by a bitfield

Posted Feb 3, 2012 22:04 UTC (Fri) by dlang (guest, #313) [Link] (49 responses)

remember that this same thing happens if you have two int (32 bit) fields next to each other, it's not just bitfield related

Betrayed by a bitfield

Posted Feb 3, 2012 22:50 UTC (Fri) by daglwn (guest, #65432) [Link] (48 responses)

I'm not sure which "this" you're referring to but it is certainly the case that one doesn't need bitfields to trigger this kind of false sharing. The behavior entirely depends on the target platform.

Even for the non-bitfield case, it is technically not a bug (unless the ABI mandates something stricter) but is almost certainly a quality-of-implementation issue and for all intents and purposes would be considered a bug if the target platform has the appropriate kinds of loads and stores available.

Betrayed by a bitfield

Posted Feb 3, 2012 22:56 UTC (Fri) by dlang (guest, #313) [Link] (47 responses)

it was pointed out that if you define two 32 bit variables, one of which is defined as volitile, the compiler will still do the 64 bit load, modify, save routine.

This is a direct violation of the spec.

note that these target platforms do have the appropriate loads and stores, and they aren't even slower than the 64 bit ones.

Betrayed by a bitfield

Posted Feb 3, 2012 23:07 UTC (Fri) by daglwn (guest, #65432) [Link] (45 responses)

> This is a direct violation of the spec.

Which spec? The IA-64 ABI? There is nothing in the C standard that mandates a specific allocation size for any member of a struct, much less alignment and access width.

I'm agreeing with you that it's a gcc bug, in that it may violate the ABI (I don't know) and almost certainly violates the standards of a quality implementation.

I'm simply pointing out that any code that relies on the layout of a struct or how its members are accessed is inherently non-portable. Fundamentally, from a language perspective, it is the kernel code that is buggy, assuming the kernel wants to be portable, which I think it does. :)

Betrayed by a bitfield

Posted Feb 4, 2012 0:34 UTC (Sat) by dlang (guest, #313) [Link] (43 responses)

it is violating the part of the spec that says that 'volitile' variables are assumed to be affected by external things and so the compiler cannot make assumptions that the variable has not changed just because it didn't issue any commands to change it.

remember, the developers are not demanding that the layout be a particular way, it's the compiler that's doing this.

restating my understanding

volitile int a;
int b;

and then making a change to b triggers a read/write cycle on a the compiler is doing the wrong thing (because the value of a may change between the read and write) and the programmer has explicitly told the compiler that a may change unexpectedly.

in the case that found the bug they were doing something like

int lock;
int bitfield'

and they found that if one thread modified the value of bitfield while another thread modified lock, the modification to lock would get lost.

If the compiler can't do 32 bit loads and stores, then it needs to store these two variables in a way that they are padded so that the 64 bit lodes and stores won't loose modifications.

Betrayed by a bitfield

Posted Feb 4, 2012 14:20 UTC (Sat) by nix (subscriber, #2304) [Link] (41 responses)

it is violating the part of the spec that says that 'volitile' variables are assumed to be affected by external things and so the compiler cannot make assumptions that the variable has not changed just because it didn't issue any commands to change it.

There is no such part (even if you spell it properly and look for 'volatile' :P ).

The standard states (with wording identical in C99 and the just-released C11):

An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation- defined.

This states how volatile accesses are to be carried out. It does not stipulate that such accesses may not have effects on other, non-volatile-qualified objects (such as adjacent bitfield fields): nor does it state that volatile-qualified objects cannot be modified incidentally by expressions that do not refer to them, nor that you cannot read from one and then store the same data back. All that is required is the usual rule that objects not described as having changed contents between sequence points should appear unchanged at the subsequent sequence point from the point of view of the C abstract machine. And that machine, up till C11, is a serial machine, taking no account of threading. There is no provision whatsoever for avoiding a read-write cycle of the same data from an object back into itself in case other threads might modify it, because in a single-threaded environment this is harmless so the freedom is better granted to implementors than denied.

Thus it is not a violation of the (C99) Standard for accesses to a bitfield to unexpectedly issue write cycles to adjacent volatile bitfields for their current content as far as the serial abstract machine is concerned: indeed, on nearly all platforms this is unavoidable if the adjacent bitfields are located within the same byte. It would not be a violation of the Standard for an implementation to do a massive RMW cycle on all of memory every time a variable of any sort was accessed. These are both quality-of-implementation issues, not standards violations.

In C11, this has been fixed: as footnote 13 of the new section s5.1.2.4 puts it

Compiler transformations that introduce assignments to a potentially shared memory location that would not be modified by the abstract machine are generally precluded by this standard, since such an assignment might overwrite another assignment by a different thread in cases in which an abstract machine execution would not have encountered a data race. This includes implementations of data member assignment that overwrite adjacent members in separate memory locations. We also generally preclude reordering of atomic loads in cases in which the atomics in question may alias, since this may violate the "visible sequence" rules.

However, compiler support for C11 isn't here yet, and the kernel is currently being compiled by a C99 compiler. So this is moot except inasmuch as it guides compiler implementors towards what will be expected in the future. And this fix appeared more than twenty years after the finalization of the first C standard to mention 'volatile'. For all that intervening time, the behaviour of GCC seen here was -- while unpleasant and probably a bug -- not a standards-violation.

Betrayed by a bitfield

Posted Feb 4, 2012 17:56 UTC (Sat) by daglwn (guest, #65432) [Link] (15 responses)

> This includes implementations of data member assignment that overwrite
> adjacent members in separate memory locations.

That would seem to be the "out" that gets around the problem of the target machine not having an appropriately-sized load/store instruction to avoid false sharing.

I've not studied the C11 standard yet. Is that the intent of this statement? Otherwise C11 structs with volatile members could be ABI-incompatible with source-equivalent C99 structs due to forced alignment to avoid placing volatile members next to other members in a single load/store access unit.

In other words, what happens if I have a volatile char next to a non-volatile char and my target only has 32-bit loads and stores?

Betrayed by a bitfield

Posted Feb 4, 2012 21:27 UTC (Sat) by nix (subscriber, #2304) [Link] (14 responses)

Er, that footnote says that those are *precluded*, not permitted (though of course as a footnote it is not normative).

I'm not sure how you can implement these requirements without either ABI changes or an efficiency reduction (on platforms where sub-maximum-sized writes are inefficient). We might have to eat the efficiency loss: we surely cannot fix this by aligning every member of every structure to cacheline boundaries or something like that, that's ridiculous.

(But I am a neophyte in this area. Someone, anyone, who knows more than me, please chip in!)

Betrayed by a bitfield

Posted Feb 6, 2012 18:06 UTC (Mon) by daglwn (guest, #65432) [Link] (13 responses)

> Er, that footnote says that those are *precluded*, not permitted (though
> of course as a footnote it is not normative).

I think it's an "out" because the footnote only covers data in "separate memory locations." Two chars that don't cross an alignment boundary would be in the same "memory location" in some sense. They're in the same atomically-readable space.

But I don't know if that's the intent. If not, I think C11 has a serious issue. It will require ABI changes and that's not going to make people happy.

Betrayed by a bitfield

Posted Feb 6, 2012 18:38 UTC (Mon) by chrisV (guest, #43417) [Link] (11 responses)

Yours cannot be the correct reading. The basic C11 requirement for multi-threaded programs is in §5.1.2.4/4: "Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location." In this normative provision, "memory location" cannot possibly mean anything other than the internal representation of a built-in type (char, int, or other integer, or a pointer), otherwise writing multi-threaded programs would be impossible. The POSIX standard for pthreads imposes a similar requirement on the implementation.

Note that marking a struct or a field of a struct volatile is irrelevant to all this. This (§5.1.2.4) is a part of the C specification relating to access to an object of a given built-in type by multiple threads within a single process. Volatile is concerned only with asynchronous access to a given memory location by a single thread in a single process (for example, by an interrupt). Of course, if volatile were to work at all in the usage in the kernel, some prior cache synchronization would be required so approximating to what is wanted here. But even so, the compiler is not obliged to provide the behavior demanded by the kernel code in question.

Betrayed by a bitfield

Posted Feb 6, 2012 20:02 UTC (Mon) by daglwn (guest, #65432) [Link] (10 responses)

Thanks for explaining things.

> But even so, the compiler is not obliged to provide the behavior demanded
> by the kernel code in question.

Again, not being familiar with C11, what does the memory model give that can help multithreaded programming, then? How does §5.1.2.4/4 work in the case where the machine access granularity is larger than some primitive types and a RMW must happen?

Betrayed by a bitfield

Posted Feb 6, 2012 21:04 UTC (Mon) by chrisV (guest, #43417) [Link] (9 responses)

"Again, not being familiar with C11, what does the memory model give that can help multithreaded programming, then? How does §5.1.2.4/4 work in the case where the machine access granularity is larger than some primitive types and a RMW must happen?"

Discarding volatile, which is irrelevant, it requires the faulty behavior observed in the kernel not to occur. On 64 bit architectures it can do this by various pessimizations. It could use slower 32 bit accesses where available. If that is not available it may have to pad so that any 32-bit scalar occupies 64 bits of space, where necessary to conform to the standard.

None of this is new. gcc's pthreads implementations have been doing this for years. It might be instructive to look at the assembly output of the same test cases compiled with the -pthread switch.

Betrayed by a bitfield

Posted Feb 6, 2012 21:11 UTC (Mon) by dlang (guest, #313) [Link] (2 responses)

32 bit operations are not slower on all 64 bit architectures, including specifically the ia64 architecture where this problem was discovered.

Betrayed by a bitfield

Posted Feb 6, 2012 21:32 UTC (Mon) by chrisV (guest, #43417) [Link] (1 responses)

Well that should make it easier to get the gcc developers to deal with the ia64 at least, although I would take a little persuading that moving memory from 64 bit registers into non-64-bit aligned memory locations didn't cause some slow down in code operating in 64 bit mode. Do you have any citations for that?

Betrayed by a bitfield

Posted Feb 6, 2012 21:39 UTC (Mon) by dlang (guest, #313) [Link]

Linus and several others in the kernel thread on the subject have said this.

Betrayed by a bitfield

Posted Feb 7, 2012 0:43 UTC (Tue) by daglwn (guest, #65432) [Link] (5 responses)

> If that is not available it may have to pad so that any 32-bit scalar
> occupies 64 bits of space, where necessary to conform to the standard.

Or pad 8 bits to 32, etc. This is what I mean by breaking ABI compatibility. If interfaces include struct values this could be a real nightmare.

Personally, I think this requirement is ridiculous absent some special attribute to enforce it. If there was a qualifier on a type to indicate "shared," _a_la_ UPC, that would be one thing. But to have this requirement for every single data item in the program is a very bad idea.

Betrayed by a bitfield

Posted Feb 7, 2012 8:43 UTC (Tue) by khim (subscriber, #9252) [Link] (4 responses)

Since C11 includes _Alignas and you can specify you own alignment it's not a disaster, albeit it is inconvenience.

Perfectly working C++ program can already be broken by recompilation in C++11 mode, so it's not the first time upgrade broke things.

Of course is only possible if original program violated specs and worked by accident (if you can show me genuinely different behavior in standards-compliant program it'll be interesting to know, too, but so far all examples I've seen contained subtle violations of one form or another).

Betrayed by a bitfield

Posted Feb 7, 2012 18:35 UTC (Tue) by daglwn (guest, #65432) [Link] (3 responses)

Ah, cool, didn't know about _Alignas. I'm a codegen guy so I'm not playing around in the C frontend very often. I only look at the standard when I really have to. :)

There's still an ABI problem if the compiler always has to align members to uphold the requirement.

> Perfectly working C++ program can already be broken by recompilation in
> C++11 mode

But as we all know, C++ is not C. :) I don't see this as a problem.

Betrayed by a bitfield

Posted Feb 7, 2012 20:53 UTC (Tue) by khim (subscriber, #9252) [Link] (2 responses)

> Perfectly working C++ program can already be broken by recompilation in
> C++11 mode

But as we all know, C++ is not C. :) I don't see this as a problem.

C++ is quite explicitly not C, but C++11 pretends that it's still C++.

Betrayed by a bitfield

Posted Feb 7, 2012 23:29 UTC (Tue) by daglwn (guest, #65432) [Link] (1 responses)

Ah, I read "C11."

Yes, you are correct, but I don't think there's an ABI issue. An ABI issue is much harder to deal with than a semantic change.

With an ABI issue you've got to recompile the world (your project, libraries it links to, etc.) to get a working application. With a semantic change you only recompile the bits that had to recoded to account for the change.

Betrayed by a bitfield

Posted Feb 8, 2012 8:54 UTC (Wed) by khim (subscriber, #9252) [Link]

YMMV, as usual.

We recompile the world anyway, so ABI change is less of a problem, but the fact that just a recompilation does not fix the issue and you need to do a lot of investigations is a problem.

Betrayed by a bitfield

Posted Feb 6, 2012 18:40 UTC (Mon) by nix (subscriber, #2304) [Link]

Not quite. s3.14's definition of 'memory location' is

either an object of scalar type, or a maximal sequence of adjacent bit-ﬁelds all having nonzero width

Therefore, even a 'long long' variable occupies a single 'memory location' by that defintion. However, a two-byte array of two chars occupies two memory locations. A location is not the same as an address.

Betrayed by a bitfield

Posted Feb 4, 2012 21:47 UTC (Sat) by giraffedata (guest, #1954) [Link] (24 responses)

That's very interesting, because it seems to say the provision for "volatile" is so incomplete as to be a pointless language feature. "volatile" was supposed to deal with C programs running in an address space that includes memory mapped I/O regions. But if the compiler is allowed to write whatever it wants whenever it wants into I/O regions, just as long as it doesn't expect it to stay there, what's the point? You can't confidently run a C program in an address space that contains memory mapped I/O regions.

Maybe it's useful in read-only memory mapped I/O regions that just ignore writes.

Betrayed by a bitfield

Posted Feb 4, 2012 22:25 UTC (Sat) by nix (subscriber, #2304) [Link] (2 responses)

That's very interesting, because it seems to say the provision for "volatile" is so incomplete as to be a pointless language feature.

I dunno. As long as compiler vendors cooperate, it's useful for its intended purpose, which was pretty much entirely *reading* from memory-mapped I/O regions and writing into them very carefully, in a serial environment. It wasn't particularly intended for parallel processing environments or multithreading, AIUI (though since I was only 13 when C89 was finalized, I obviously wasn't there and don't know for sure).

In general you can't (and never could) rely on 'volatile' without knowing what the compiler would do when you asked it: implementations have had horrible bugs in their treatment of volatile often enough that care is warranted. But at the very least, it provides a way to flag that 'something tricky is happening here, look out'. It's proved more useful than 'register', low bar though that is. :)

Betrayed by a bitfield

Posted Feb 4, 2012 22:37 UTC (Sat) by giraffedata (guest, #1954) [Link] (1 responses)

Right, it's a useful feature of de facto C, but maybe not of standard C. This is just an academic discussion about the standard, since it seems everyone agrees GCC needs to change regardless of whether it presently implements the standard.

As long as you know that storing to your memory mapped I/O regions is a no-op, you can use standard C with volatile to usefully fetch from them, but otherwise you need more than standard C. You need common sense C. It just seems strange to me because writable I/O regions did exist at the time this accomodation for reading them was added to the spec.

Betrayed by a bitfield

Posted Feb 6, 2012 19:21 UTC (Mon) by chrisV (guest, #43417) [Link]

"This is just an academic discussion about the standard, since it seems everyone agrees GCC needs to change regardless of whether it presently implements the standard."

I absolutely don't agree with this. Forbidding 64-bit read/writes for 32-bit scalars within an aligned 64-bit boundary is a pessimization which should be opted into in C99, such as with the -pthread flag as in current gcc implementations (-pthread does require this pessimization if adjacent memory locations would otherwise be corrupted).

Presumably when C11 is implemented in gcc, opting out of this pessimization will also be available. Or possibly gcc will require some specific flag to be set where a multi-threaded program is being compiled, who knows.

Betrayed by a bitfield

Posted Feb 6, 2012 18:08 UTC (Mon) by daglwn (guest, #65432) [Link] (20 responses)

> That's very interesting, because it seems to say the provision for
> "volatile" is so incomplete as to be a pointless language feature.

Not totally true. It's true that volatile doesn't do what most people expect. Basically, volatile says the data can't be cached in a register (because the unerlying value might change unexpectedly), and little else. But the prohibition on register allocation is a pretty big deal to the compiler.

Betrayed by a bitfield

Posted Feb 6, 2012 19:27 UTC (Mon) by dlang (guest, #313) [Link] (16 responses)

if it's not allowed to be cached in a register (because it could change unexpectedly), why are you allowed to do a read-write cycle on it? isn't that effectively the same thing?

Betrayed by a bitfield

Posted Feb 6, 2012 20:01 UTC (Mon) by chrisV (guest, #43417) [Link] (4 responses)

I am not sure if it answers your question, but gcc would only be non-conforming if it carried out a corrupting read-write cycle on the memory location of a volatile variable as also read by an interrupt in the same thread attempting to read the same variable (or vice-versa), where the volatile variable is of type sig_atomic_t.

This is orthogonal to the original problem, which (as far as I can work out) is concerned with concurrent access by different threads.

Betrayed by a bitfield

Posted Feb 6, 2012 20:06 UTC (Mon) by dlang (guest, #313) [Link] (3 responses)

the thing is, the current GCC behaviour _would_ overwrite the value of a volatile variable that was modified by an interrupt in the middle of the read-modify-write cycle of a different (but adjacent) variable.

As Linux says, it's possible for someone to fix the bug with respect to volatile, and explicitly leave the bug in for all other accesses (with the claim that the spec allows the compiler to do that), but it would probably be more work to do that then to just change the behaviour across the board.

Betrayed by a bitfield

Posted Feb 6, 2012 20:51 UTC (Mon) by chrisV (guest, #43417) [Link] (2 responses)

"the thing is, the current GCC behaviour _would_ overwrite the value of a volatile variable that was modified by an interrupt in the middle of the read-modify-write cycle of a different (but adjacent) variable."

I don't believe that has been tested and I don't believe it to be true. On hardware where 64 bit boundaries are important, sig_atomic_t would be likely to be 64-bits wide. (Someone with a ia64 architecture might be able to confirm this.) In that case alignment boundaries would be fully respected.

Also, I can't see that the original problem had anything to do with interrupts. It seems to be concerned with either multi-threading, or shared access by multiple processes.

Lastly, when you refer to "the bug with respect to volatile", you impute that gcc has the bug. If there is a bug, it is a bug in the kernel.

Betrayed by a bitfield

Posted Feb 6, 2012 21:04 UTC (Mon) by dlang (guest, #313) [Link] (1 responses)

one of the posts I saw on this topic (I think from Linus) tested the code snippet

volitile int a;
int b;

b++;

and the resulting assembler did the exact same read-write on a that caused the problem with the case in question. If an interrupt were to happen between the read and the write that changed the value of a, the write would cause that value to be lost.

no, the item in question wasn't related to interrupts, but the cause of the problem causes the same problem if interrupts were involved.

If the 32 bit loads and stores were significantly more expensive to use than 64 bit loads and stores, there would be at least some reason to have this behaviour available, but in the case of the architectures in question there isn't a performance benefit to doing it this way.

Betrayed by a bitfield

Posted Feb 6, 2012 21:16 UTC (Mon) by chrisV (guest, #43417) [Link]

"no, the item in question wasn't related to interrupts, but the cause of the problem causes the same problem if interrupts were involved."

Not it doesn't. Memory access issues relating to multiple threads (or multiple processes in the case of shared memory) have nothing to do with interrupts, and nothing to do with volatile.

This really is beating a dead horse. Compiler switches are available to ensure memory consistency for the case in question, notably the -pthread switch. I can see why the kernel doesn't want to, or can't, use that. In that case the kernel authors need to write some assembler or use 64-bit values on architectures where it is important, or persuade the gcc developers to provide another compiler switch dealing with this particular problem.

Betrayed by a bitfield

Posted Feb 6, 2012 20:05 UTC (Mon) by daglwn (guest, #65432) [Link] (10 responses)

The key work is "cached." On any load/store machine the data must be first loaded into a register to do anything with it. volatile says it must be written back out by the next sequence point. I'm sure language lawyers will find some detail I've missed but that's the way I think about the guarantee.

Betrayed by a bitfield

Posted Feb 6, 2012 20:13 UTC (Mon) by dlang (guest, #313) [Link] (9 responses)

right, but the programmer is not attempting to do anything with it. The programmer is attempting to do something with another variable, one that just happens to be adjacent to the one in question.

again the code snippit is

volitile int a;
int b;

b++;

if modifying b causes a read/write of a, this is wrong.

the programmer has not made any attempt to specify alignment here.

Betrayed by a bitfield

Posted Feb 7, 2012 0:48 UTC (Tue) by daglwn (guest, #65432) [Link] (8 responses)

> if modifying b causes a read/write of a, this is wrong.

No, it's not. Believe me.

The volatile keyword doesn't say anything about when it will change value, be read/written etc. It says simply that it will not be cached in a register such that every read will get the "latest" value expected when executed under the Abstract Machine.

It says nothing about threading.

It says nothing about interrupts.

Simply remember that volatile is not magic. Think of it as the opposite of "register."

Betrayed by a bitfield

Posted Feb 7, 2012 1:03 UTC (Tue) by dlang (guest, #313) [Link]

in that case I have to agree with the other poster who said that if the compiler considers it Ok to write over any arbitrary memory locations, as long as what it's writing matches what the compiler thinks is already there, then that compiler is unsuitable for use with any memory mapped I/O as it will feel free to clobber the new data that is waiting to be read.

since this sort of thing has been part of C's traditional strength, this doesn't seem like a sane interpretation to me.

Betrayed by a bitfield

Posted Feb 7, 2012 16:54 UTC (Tue) by chrisV (guest, #43417) [Link] (6 responses)

"It says nothing about interrupts."

I think it does. See §5.1.2.3/5 and /10, which are normative. The principal purpose of volatile is to deal with arbitrary changes of data values outside the program context of the process in which the code is running (ie asynchronous interrupts). (This does not include threads, which are within the program context and which, because they can run on more than one core, require quite different synchronizations, some of which are not async-signal-safe.)

See also the footnote 134 of §6.7.3/8 (which is non-normative despite the "shall not"): "A volatile declaration may be used to describe an object corresponding to a memory-mapped input/output port or an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be 'optimized out' by an implementation or reordered except as permitted by the rules for evaluating expressions." This is a curious note as, as far as I am aware, it is the one and only reference to memory mapping (and about which I mis-spoke in an earlier posting on this article because it is not in C99 which contains no reference to memory mapping).

Betrayed by a bitfield

Posted Feb 7, 2012 18:38 UTC (Tue) by daglwn (guest, #65432) [Link] (5 responses)

> "It says nothing about interrupts."

Thanks for the correction. But the compiler is still correct here. Volatile doesn't say anything about restricting _when_ it is read or written, only that it will get the "latest" value in a single-thread context.

It's perfectly fine for I/O as long as you can guarantee alignment such that there is no "false sharing."

Betrayed by a bitfield

Posted Feb 7, 2012 19:07 UTC (Tue) by chrisV (guest, #43417) [Link] (4 responses)

I agree. And if you get false sharing between two free-standing variables where the one being operated on is marked volatile, or between fields of a struct where the struct is marked volatile, there is a compiler bug. It is still not clear whether that is the case with the kernel test case (first, the struct was not marked volatile, only one of its fields; and secondly, we don't know whether the test case involved an asynchronous test (as opposed to threads) or not.

Betrayed by a bitfield

Posted Feb 7, 2012 20:01 UTC (Tue) by dlang (guest, #313) [Link]

the kernel did not have one field marked volatile, but in the research into the problem, someone (I think it was Linus) tested with volatile and the false sharing was happening there as well.

Betrayed by a bitfield

Posted Feb 7, 2012 23:32 UTC (Tue) by daglwn (guest, #65432) [Link] (1 responses)

> And if you get false sharing between two free-standing variables where the > one being operated on is marked volatile, or between fields of a struct
> where the struct is marked volatile, there is a compiler bug.

No, there isn't.

There isn't. Really.

Volatile does not mean what you think it means.

It's a bit like sequential consistency. Just when you think you understand it, something unexpected happens that is both non-intuitive and perfectly legal.

Betrayed by a bitfield

Posted Feb 8, 2012 15:24 UTC (Wed) by daglwn (guest, #65432) [Link]

Seeing some of your other posts about -pthread, I think we are in agreement. Apologies if I mischaracterized your understanding.

Betrayed by a bitfield

Posted Feb 8, 2012 13:51 UTC (Wed) by nix (subscriber, #2304) [Link]

And, thirdly, the kernel is not C11 code -- yet.

Betrayed by a bitfield

Posted Feb 6, 2012 22:22 UTC (Mon) by giraffedata (guest, #1954) [Link] (2 responses)

That's very interesting, because it seems to say the provision for "volatile" is so incomplete as to be a pointless language feature.
Not totally true. It's true that volatile doesn't do what most people expect. Basically, volatile says the data can't be cached in a register ...

I was commenting at a higher level. Never mind what the compiler can and can't do. The question is, what programs can you write in C because it has the "volatile" feature that you couldn't otherwise? Relying on nothing but that the compiler implements the C standard.

I've always understood the originally intended answer to be, "you can write a program that uses memory mapped I/O." But comments in this thread say regardless of how one uses the "volatile" keyword in a program, the compiler may always generate code that arbitrarily writes to memory mapped I/O addresses. There's nothing in the standard to stop it. If so, then you not only can't use memory mapped I/O in a C program, you can't even let a C program — any C program — run in an address space that includes memory mapped I/O regions.

Unless all memory-mapped I/O regions ignore writes.

And it's hard to believe that the definers of "volatile" actually had such a useless thing in mind.

Betrayed by a bitfield

Posted Feb 7, 2012 0:52 UTC (Tue) by daglwn (guest, #65432) [Link] (1 responses)

Again, this is implementation-defined behavior. The standard you're looking for is the ABI for your target. That's where all the memory layout, access size, calling convention and other low-level stuff is specified.

Volatile is perfectly fine for I/O as long as you know the address being accessed is suitably aligned to avoid problems, as the ABI should indicate.

This is why volatile is non-portable. Unfortunately, C99 has no standard way to force alignment of any object.

Betrayed by a bitfield

Posted Feb 7, 2012 8:46 UTC (Tue) by khim (subscriber, #9252) [Link]

This is why volatile is non-portable. Unfortunately, C99 has no standard way to force alignment of any object.

GCC, MSCV and other compilers include such an ability and C11 finally adds it to standard so it's all is not so bad...

Betrayed by a bitfield

Posted Feb 4, 2012 17:47 UTC (Sat) by daglwn (guest, #65432) [Link]

That's not what volatile says. There is nothing in the entire C execution model that even considers multiple threads. That's why we have other standards like MPI and OpenMP.

If you need to protect data like this you must make use of constructs outside the C standard.

Betrayed by a bitfield

Posted Feb 4, 2012 13:57 UTC (Sat) by nix (subscriber, #2304) [Link]

I'm simply pointing out that any code that relies on the layout of a struct or how its members are accessed is inherently non-portable.

Not so: you can depend on later members of a struct having strictly higher addresses than earlier ones: i.e., the compiler cannot reshuffle struct members. (This is not an especially exciting guarantee, but it's all we've got.)

Betrayed by a bitfield

Posted Feb 4, 2012 13:17 UTC (Sat) by paulj (subscriber, #341) [Link]

There's a comment on the GCC bug that suggests that for volatile to have this affect on a structure member, it would require the whole structure to be tagged volatile - not just the member concerned. (Not offering my own opinion here either way, EINSUFFICIENTEXPERTISE, just drawing attention to the comment).

Betrayed by a bitfield

Posted Feb 7, 2012 17:06 UTC (Tue) by BenHutchings (subscriber, #37955) [Link] (2 responses)

Insane? I imagine the 6- and 16-bit processor guys said the same when Alpha appeared.

Yes, Alpha's lack of 8- and 16-bit store instructions was insane. That's why they were added in later versions of the architecture.

Betrayed by a bitfield

Posted Feb 8, 2012 0:11 UTC (Wed) by daglwn (guest, #65432) [Link] (1 responses)

There are plenty of architectures that have these kinds of restricted loads and stores. The old Cray machines only ever operated on 64 bit words, for example.

And often a machine implements the smaller accesses but there is a performance penalty due to alignment issues and the bus/DRAM access architecture. It's not uncommon for the RMW to be faster.

Betrayed by a bitfield

Posted Feb 8, 2012 13:57 UTC (Wed) by nix (subscriber, #2304) [Link]

SPARC is another major example of an arch with alignment-restricted loads and stores. I have dim memories that suggest that MIPS might be as well.

It's actually easier to come up with a list of architectures that do *not* require natural alignment on loads and stores than to come up with a list of those that do.

Betrayed by a bitfield

Posted Feb 6, 2012 12:41 UTC (Mon) by chrisV (guest, #43417) [Link] (7 responses)

This is not a gcc bug even in the case of struct fields marked volatile, if the whole struct is not marked volatile.

In a user-space program all this would be irrelevant. In the case of different threads accessing different 32 bit integers within an aligned 64-bit boundary, POSIX would prohibit gcc making a 64-bit read/write if the program is compiled with the -pthread compiler option and memory corruption would otherwise arise (this is implicit in Base Definitions, General Concepts, section 4.11, Memory Synchronization, of the SUS). In the case of different processes accessing shared memory between processes (about which the C standard says nothing) you are likewise relying on the guarantees of the shared memory implementation, if any.

The kernel is expecting POSIX-like behavior from a C compiler. gcc is not obliged to provide this if that has not been required of it, either by means of the -pthread switch or by gcc being required (once it has been implemented) to use the similar C11 memory model for multi-threaded programs. Those writing kernels have to force the issue in some other way, such as by writing machine-word sized fields in such cases or by writing processor specific assembly language to force the correct synchronized behavior.

Betrayed by a bitfield

Posted Feb 7, 2012 17:12 UTC (Tue) by BenHutchings (subscriber, #37955) [Link] (6 responses)

None of the options you are suggesting makes a slight difference to gcc behaviour in this case. In any case, the C11 memory model is to a large extent a codification of what is already understood in the industry to be necessary to support multithreaded C programs.

Your unhelpful standards-lawyering attitude is exactly what many programmers have come to hate about certain compiler developers.

Betrayed by a bitfield

Posted Feb 7, 2012 19:21 UTC (Tue) by chrisV (guest, #43417) [Link] (3 responses)

Are you telling me that you have actually run a test case compiled with the -pthread option which exhibits this problem? If not, I just don't believe you because I have written shed loads of multi-threaded code using POSIX threads which has never encountered a false sharing problem of this kind.

If you run a test case compiled with the -pthread flag which exhibits this problem then this is an egregious failure on the part of the threading implementation of gcc and you should post a bug. (It is far more serious than the non-standards complying kernel code.)

Betrayed by a bitfield

Posted Feb 7, 2012 19:36 UTC (Tue) by chrisV (guest, #43417) [Link]

And since you refer to "none of the options you are suggesting makes a slight difference", can you also clarify whether you mean you have run a test case demonstrating false sharing even in the case of 64-bit scalars, and/or even in the case of assembly instructions which should have forced synchronization?

Betrayed by a bitfield

Posted Feb 15, 2012 17:01 UTC (Wed) by cbf123 (guest, #74020) [Link] (1 responses)

I just tested with gcc version 4.3.2 (Wind River Linux Sourcery G++ 4.3-85) for powerpc. Building the test app (the volatile version) as "gcc x.c -c -m64 -pthread" resulted in the following code, which clearly shows a 64-bit read/write cycle.

0000000000000000 <.wrong>:
   0:   fb e1 ff f8     std     r31,-8(r1)
   4:   f8 21 ff c1     stdu    r1,-64(r1)
   8:   7c 3f 0b 78     mr      r31,r1
   c:   f8 7f 00 70     std     r3,112(r31)
  10:   e9 3f 00 70     ld      r9,112(r31)
  14:   e8 09 00 08     ld      r0,8(r9)
  18:   39 60 00 01     li      r11,1
  1c:   79 60 f8 2c     rldimi  r0,r11,31,32
  20:   f8 09 00 08     std     r0,8(r9)
  24:   e8 21 00 00     ld      r1,0(r1)
  28:   eb e1 ff f8     ld      r31,-8(r1)
  2c:   4e 80 00 20     blr

Betrayed by a bitfield

Posted Feb 16, 2012 20:21 UTC (Thu) by chrisV (guest, #43417) [Link]

Any chance you could file a bug on this one? It is not so bad if the non-standard use of volatile has this effect (although no doubt still very annoying to the kernel developers), but it seems to me to be something else when the code happens to be POSIX-conforming code and corrupts memory locations.

(Yes I have seen the argument in papers in the early proposals for the C/C++ threading model that "memory location" is ambiguous in Base Definitions, section 4.11, of the SUS, and could refer to a machine's natural word size rather than individual scalars, and so C11/C++11 now explicitly states that "memory location" in its equivalent wording means any scalar or bitfield, but that is a completely perverse construction of the SUS as it makes POSIX threads completely useless as a standard, bringing threading back to individual non-standard ABIs.)

Betrayed by a bitfield

Posted Feb 7, 2012 20:18 UTC (Tue) by chrisV (guest, #43417) [Link] (1 responses)

I want to keep your issues separate and I have asked you under a separate posting about your tests concerned. On the POSIX point, setting out the assembly output of test cases for Itanium, first without -pthread and then with -pthread, and showing the false sharing would be sufficient, as -pthread is supposed to switch on the POSIX memory model in gcc and it will quickly become apparent if it doesn't.

This posting is just to comment on your observations on the C11 memory model. You are quite right in saying it represents accumulated wisdom, because its starting point was the (comparatively under-specified) POSIX memory model for multi-threaded programs. However, if you read the lkml postings in questions, you will see that the kernel community do not want the generalized C11 memory model for the kernel. They want the preclusion of false sharing; they don't want full sequential consistency in all cases because of its efficiency implications.

Betrayed by a bitfield

Posted Feb 17, 2012 16:20 UTC (Fri) by jwakely (subscriber, #60262) [Link]

> as -pthread is supposed to switch on the POSIX memory model in gcc

Says who?

On ia64-linux it appears to be equivalent to just -D_REENTRANT, on x86-linux it also passes -lpthread to the linker, but it has no effect on code-generation.

> they don't want full sequential consistency in all cases because of its efficiency implications.

Which part of the C11 memory model requires sequential consistency?

A single historical context of volatile

Posted Feb 10, 2012 22:22 UTC (Fri) by dfsmith (guest, #20302) [Link]

To quote from Harbison & Steele 3rd ed. (1991)* section 4.4.5

"... any object ... of a volatile-qualified type should not participate in optimizations that would ... modif[y] the object."

To my mind that prohibition includes non-volatile object optimizations that would clobber volatile objects.

The book then slightly contradicts itself two paragraphs later by saying that "optimizations between sequence points are permitted" before clarifying with a hardware example.

* A standard ANSI C reference manual for programmers and compiler implementers at the time.