It's different viewpoints...
Posted May 27, 2011 14:26 UTC (Fri)
by mpr22 (subscriber, #60784)
In reply to: It's different viewpoints... by anton
Parent article: What Every C Programmer Should Know About Undefined Behavior #3/3
Most optimizing compilers have a "do not optimize" mode. In gcc, "do not optimize" (-O0) is the default setting; you have to explicitly enable the footgun. -O0 still violates the programmer's expectations in C99 and C++98, though, since the "inline" keyword is ineffective, even on the tiniest of functions, when gcc's optimizer is disabled.
Posted May 29, 2011 8:46 UTC (Sun)
by anton (subscriber, #25547)
[Link] (15 responses)
-O0 certainly violates my performance expectations, because I don't expect all local variables to end up in main memory (I expect that the compiler puts many of them in registers); but that (and inline) is just performance, the compiled program still does what is intended.
Chris Lattner recommends more specific flags for disabling some of the misfeatures of clang, but these are not complete (and even if they are now, tomorrow another version of Clang might introduce another misfeature that is not covered by these flags) and they don't work on all versions of all compilers, so -O0 is probably the best way that we have now to get intended behaviour for our programs. Of course, given the mindset of the gcc developers (and obviously also the clang developers), there is no guarantee that -O0 will continue to produce the intended behaviour in the future.
I wonder why they put so much effort in "optimization" if the recommendation is to disable some or all of these "optimizations" in order to get the program working as intended. In my experience gcc-2.x did not have this problem. There I could compile my programs with -O (or even -O2) and the programs still worked as intended without any further ado (and they performed much better than gcc-4.x -O0). Too bad there is no gcc-2.x for AMD64 or I would just forget about gcc-4.x.
Posted May 29, 2011 9:51 UTC (Sun)
by nix (subscriber, #2304)
[Link] (10 responses)
Come on! Who writes *(foo *)&thing_of_type_bar and doesn't think 'that is ugly and risky, there must be a better way'?
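To make the idiom concrete, here is a minimal sketch of the cast in question and of the memcpy alternative; the names are illustrative and it assumes sizeof(float) == sizeof(unsigned int), as on common targets:

#include <string.h>

/* The idiom under discussion: reinterpret the bytes of a float as an
   unsigned int by casting the pointer.  This violates C99's aliasing
   rules (6.5p7), so an optimizing compiler may reorder or drop the
   access. */
unsigned int bits_via_cast(float f)
{
    return *(unsigned int *)&f;   /* undefined behaviour in C99 */
}

/* The "better way": memcpy through a temporary.  Recent gcc and clang
   compile this down to a single move, with no call to memcpy. */
unsigned int bits_via_memcpy(float f)
{
    unsigned int u;
    memcpy(&u, &f, sizeof u);     /* well-defined */
    return u;
}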
Posted May 30, 2011 17:46 UTC (Mon)
by anton (subscriber, #25547)
[Link] (9 responses)
But code like this is one of the reasons for using C rather than, say,
Java. C is supposed to be a low-level language, or a portable
assembler; even Chris Lattner claims that "the designers of C wanted
it to be an extremely efficient low-level programming language". This
is one of the things I want to do when I choose a low-level language.
One of my students implemented Postscript in C#; it was interesting to
see what was not possible (or practical) in that language and how
inefficient the workarounds were. If we were to write in the common
subset of C and C#/Java, as the defenders of misbehaviour in gcc and
clang suggest, we would be similarly inefficient, and the
"optimizations" that these language restrictions enable would not come
close to making up for that inefficiency.
Maybe your programs are not affected in this way, unlike some of my
programs, but then you don't need a low-level language and could just
as well use a higher-level language.
Posted May 30, 2011 18:47 UTC (Mon)
by khim (subscriber, #9252)
[Link] (8 responses)
Sorry, but this is just not true. On lots of platforms it was flaky because the FPU was physically separate. Most of them were embedded, but it was a problem for 80386 CPU + 80287 FPU systems (yes, that combination is legal and yes, such platforms were actually produced), for example.

Sure, some platforms were perfectly happy with such code. But then, if you want a low-level, non-portable language... asm is always there. Or, alternatively, you can actually read the specifications and see what the language actually supports. Most (but not all) "crazy behaviors" of gcc and clang just faithfully emulate hardware portability problems, nothing more, nothing less.

It's kind of funny, but real low-level stuff (like OS kernels or portable on-bare-metal programs) usually survives "evil compilers" just fine. It's code from "programmer cowboys" who know how the 8086 works and ignore everything else that is problematic.
Posted May 30, 2011 18:56 UTC (Mon)
by jrn (subscriber, #64214)
[Link] (5 responses)
And so is C. :) After all, what language is the Linux kernel written in?
> Or, alternatively, you can actually read specifications and see what the language actually supports.
I don't think the case of signed overflow is one of trial and error versus reading specifications. It seems more like one of folk knowledge versus new optimizations --- old gcc on x86 and many similar platforms would use instructions that wrap around for signed overflow, so when compiling old code that targeted such platforms, it seems wise to use -fwrapv, and when writing new code it seems wise to add assertions to document why you do not expect overflow to occur.
Of course, reading the spec can be a pleasant experience independently from that.
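A sketch of what that advice might look like in practice (the function and its precondition are purely illustrative):

#include <assert.h>
#include <limits.h>

/* New code: instead of relying on wraparound, document why overflow
   cannot happen.  Old code that needs wrapping can be built with
   -fwrapv instead. */
int checked_sum(int a, int b)
{
    /* Precondition stated as an assertion: with both inputs
       non-negative, a + b cannot exceed INT_MAX. */
    assert(a >= 0 && b >= 0 && a <= INT_MAX - b);
    return a + b;
}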
Posted May 31, 2011 7:17 UTC (Tue)
by khim (subscriber, #9252)
[Link] (2 responses)
The Linux kernel is written in C, it is quite portable, and people fight constantly to fix hardware and software compatibility problems. Note that while GCC improvements are the source of a few errors, they are dwarfed by the number of hardware compatibility errors. Most of the compiler problems happen when people forget to use the appropriate constructs defined to keep the hardware happy: for some reason the macro constructs designed to fight hardware quirks also make code sidestep a wide range of undefined C behaviors. Think about it.

It is, as was explained before. There are other similar cases. For example, the standard gives you the ability to convert a pointer to an int in some cases, but even then you cannot portably convert the int back to a pointer, because on some platforms a pointer is not just a number - yet people who don't know better often do exactly that. Will you object if gcc and/or clang start to miscompile such programs tomorrow? Note that all these new optimizations are perfectly valid for portable code.

Surprisingly enough, -fwrapv exists not to make broken programs valid but to make sure the Java overflow semantics are implementable in GCC. Sure, you can use it in C, but that does not mean your code is suddenly correct.

Actually it's kind of sad that the only guide we have here is the standard... Given how often undefined behavior bites us, you'd think we would have books which explain where and how it can be triggered in "normal" code. Why do people accept that i = i++ + ++i; is unsafe and unpredictable code, but lots of other cases which trigger undefined behavior are perceived as safe? It's a matter of education...
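For the pointer/integer point, C99's <stdint.h> offers uintptr_t for the round trip that a plain int cannot portably survive; a sketch (note that uintptr_t is an optional type, so even this is not guaranteed everywhere):

#include <stdint.h>

/* Round-tripping a pointer through an integer.  When the implementation
   provides uintptr_t, converting a void * to it and back yields a
   pointer that compares equal to the original (C99 7.18.1.4).  A plain
   int gives no such guarantee, because on some platforms a pointer is
   not just a number. */
void *round_trip(void *p)
{
    uintptr_t n = (uintptr_t)p;
    return (void *)n;
}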
Posted May 31, 2011 17:56 UTC (Tue)
by anton (subscriber, #25547)
[Link] (1 responses)
Posted Jun 1, 2011 9:16 UTC (Wed)
by khim (subscriber, #9252)
[Link]
It's easy to do in other cases, too. You can always use memcpy to copy from a float to an int. GCC will eliminate the memcpy and the unneeded variables; the full test.c example and the generated assembly appear below in this thread. They were never completely safe, albeit the cases where they break were rare. Today it happens more often. This is not the end of the world, but this is what you must know and accept.
Posted May 31, 2011 18:05 UTC (Tue)
by anton (subscriber, #25547)
[Link] (1 responses)
Posted May 31, 2011 21:00 UTC (Tue)
by jrn (subscriber, #64214)
[Link]
I am pretty happy with -O2 for my own needs, since it generates fast code for loops, but I understand that different situations may involve different requirements and would be happy to live in a world with more people helping make the gcc UI more intuitive.
Posted May 31, 2011 17:39 UTC (Tue)
by anton (subscriber, #25547)
[Link] (1 responses)
Anyway, if there is a hardware issue that we have to deal with,
that's fine with me, and I will deal with it. But a compiler that
miscompiles on hardware that's perfectly capable of doing what I
intend (as evidenced by the fact that gcc-2.x -O and gcc-4.x -O0
achieve what I intend) is a totally different issue.
And new gcc releases are the biggest source of problems for my
code. New hardware is much easier.
The C standard specification is very weak, and has more holes than
content (was it 190 undefined behaviours? Plus implementation-defined
behaviours). Supposedly the holes are there to support some exotic
platforms (such as ones-complement machines where signed addition
traps on overflow). Sure, people who want to port to such platforms
will have to avoid some low-level-coding practices, and will have to
suffer the pain that you want to inflict on all of us.
But most of us and our code will never encounter such platforms
(these are deservedly niche platforms, and many of these niches become
smaller and smaller over time) and we can make our code faster and
smaller with these practices, at least if we have a language that
supports them. The language implemented by gcc-2.x does support these
practices. We just need a compiler for this language that targets
modern hardware.
So, the ANSI C specification and the language it specifies is
pretty useless, because of these holes. Even Chris Lattner admits
that "There is No Reliable Way to Determine if a Large Codebase
Contains Undefined Behavior". So what you suggest is just
impractical. Not only would it mean giving up on low-level code, the
compiler could still decide to format the hard disk if it likes to,
because most likely the code still contains some undefined behaviour.
Posted Jun 1, 2011 9:14 UTC (Wed)
by khim (subscriber, #9252)
[Link]
If the FPU module is only weakly tied to the CPU module, then you must explicitly synchronize them. Functions like memcpy did that, so they were safe; regular operations didn't. That's why the standard only supports one way to do what you want, and it's memcpy - which is eliminated completely by modern compilers like gcc or clang where it's not needed.

Well, sure. Not all combinations were flaky. That does not change the fact that the only way to convert a float to an int safely was, is, and will be memcpy (till the next revision of the C standard, at least).

Why? The compiler just makes your hardware less predictable - but only within the boundaries outlined in the standard. That's what the compiler is supposed to do! These boundaries were chosen to support a wide range of architectures and optimizations, but if you want different boundaries - feel free to create a new language.

This is possible, but it just shows that you need some different capabilities from your language. You can propose extensions and/or optimizations to the gcc and clang developers to make your code faster. Complaints that your code does not work when it clearly violates the spec will lead you nowhere. The GCC and Clang developers are not stuck-up snobs; they are ready to add new extensions when they are needed (this changes the language spec and makes some previously undefined constructs defined), but they are clearly reluctant to guess what your code is doing without your help - if they have no clear guidance, they use the spec.

Ok, if you like it, then use it. No problem, it's there.

It may look like a stupid excuse, but it's not. The only way to support something which exists only as an implementation on a newer platform is emulation. That's why there are all these PDP-11 emulators, NES emulators, etc. If you want something different, then first you must document your assumptions which don't adhere to the C99 spec and then talk with the compiler developers.

Huh? Where does THIS come from? The compiler does not know what the program actually does, but surely the programmer who wrote it does! S/he can avoid undefined behaviors - even if it's not always simple and/or easy. If the programmer likes to play Russian roulette with the language, that's his/her choice, but then s/he should expect to be blown to bits from time to time. The compiler can only do that if you trigger undefined behavior. Most undefined behaviors are pretty easy to spot and avoid, but some of them are not. Instead of whining here and asking for a pie in the sky which will never materialize, you can offer changes to the specification which will simplify the life of the programmer. Then they may be adopted either as extensions or as a new version of C. It's supposed to be revised every 10 years, you know.

Sure, but they use new GCC capabilities when they become available too, and they accepted that these come as a package deal. Note that the kernel no longer supports GCC 2.95 at all. There was a time when kernel developers said what you are saying and refused to support newer versions of GCC, but this led them nowhere. Today the Linux kernel can be compiled with GCC 4.6 just fine.
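One example of such a documented extension: the gcc manual states that type punning through a union is supported even with -fstrict-aliasing, provided the memory is accessed through the union type. A minimal sketch (other compilers make their own promises here, so this is gcc-specific territory):

/* Type punning through a union: gcc documents this as supported even
   under -fstrict-aliasing, unlike the pointer-cast idiom discussed
   elsewhere in this thread. */
union float_bits {
    float f;
    unsigned int u;
};

unsigned int bits_via_union(float f)
{
    union float_bits b;
    b.f = f;
    return b.u;   /* read the bytes just written, as an unsigned int */
}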
Posted May 29, 2011 10:24 UTC (Sun)
by mpr22 (subscriber, #60784)
[Link]
My programs behave as intended when compiled with the GNU Compiler Collection version 4.6 at -O2, and when they don't, it's because I didn't correctly express my intention in the first place. It may help that over the years, I've had to deal with sizeof(int) doing what the standard says it might (i.e. not be the same on all platforms), and I've been caught out by sizeof(long) doing the same (I wrote a game on i686 round about the time amd64 was launched; the RNG blew up when someone tried to use it on amd64, because I'd written "long" in a place where what I actually meant was "32 bits"). So in the case of the example above (where a bounded loop became infinite because of the way the compiler treated a body of code whose behaviour is formally undefined under the C standard), I'm not actually all that sympathetic. <stdint.h> exists; it behooves the responsible C programmer to use it.
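The kind of fix being alluded to looks roughly like this (the generator below is a stock linear congruential step, not the actual RNG from that game, which is not shown here):

#include <stdint.h>

/* "long" is 32 bits on i686 but 64 bits on amd64 (LP64), so an RNG
   that relies on 32-bit wraparound breaks when recompiled.  Spelling
   the width out with <stdint.h> removes the assumption; unsigned
   arithmetic wraps modulo 2^32 by definition. */
uint32_t rng_step(uint32_t state)
{
    return state * UINT32_C(1664525) + UINT32_C(1013904223);
}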
Posted May 29, 2011 19:27 UTC (Sun)
by iabervon (subscriber, #722)
[Link] (2 responses)
Posted May 30, 2011 1:01 UTC (Mon)
by vonbrand (guest, #4458)
[Link] (1 responses)
AFAICS, nothing whatsoever has changed then; undefined behaviour is "totally crazy stuff, that nobody in their right mind would expect to work as intended everywhere"...
Posted May 30, 2011 1:56 UTC (Mon)
by iabervon (subscriber, #722)
[Link]
Yes, I guess the way that gcc etc. are going, we should recommend -O0 as the option to use if people want their programs to behave as intended.
It's different viewpoints...
I wonder why they put so much effort in "optimization" if the recommendation is to disable some or all of these "optimizations" in order to get the program working as intended.
Because one of the ways people intend their programs to work is 'fast', and because not all code is the sort of riceboy rocket science that gets broken by these optimizations? I've personally written code that fell foul of aliasing optimizations precisely twice, and every time I knew I was doing something dirty when I did it.
It's different viewpoints...
Who writes *(foo *)&thing_of_type_bar and doesn't think 'that is ugly and risky, there must be a better way'?
I write such code. It's ugly, true. It did not use to be risky until
some compiler writers made it so; on the contrary, it worked as I
expect it to on all targets supported by gcc-2.x (and the example of
mpr22 about code that's not 64-bit clean is a red herring; that's a
portability bug that does not work with -O0 or gcc-2.x -O, either, so
it has nothing to do with the present discussion).
Sorry, but this is just wrong...
It did not use to be risky until some compiler writers made it so; on the contrary, it worked as I expect it to on all targets supported by gcc-2.x
Maybe your programs are not affected in this way, unlike some of my programs, but then you don't need a low-level language and could just as well use a higher-level language.
C as portable assembler
> Sure, some platforms were perfectly happy with such code. But then if you want low-level non-portable language... asm is always there.
And so is C. :) After all, what language is the Linux kernel written in?
I don't think the case of signed overflow is one of trial and error versus reading specifications.
It seems more like one of folk knowledge versus new optimizations --- old gcc on x86 and many similar platforms would use instructions that wrap around for signed overflow, so when compiling old code that targeted such platforms, it seems wise to use -fwrapv, and when writing new code it seems wise to add assertions to document why you do not expect overflow to occur.
Of course, reading the spec can be a pleasant experience independently from that.
C as portable assembler
Why people accept that i = i++ + ++i; is unsafe and
unpredictable code
but lots of other cases which trigger undefined behavior
are perceived as safe?
Because they were safe, until the gcc maintainers decided to break
them (and the LLVM maintainers follow them like lemmings).
That's the point...
It is easy to write what you intended here (whatever that was) in a way that's similarly short and fast and generates a similar amount of code.
#include <string.h>
int convert_float_to_int(float f) {
int i;
memcpy(&i, &f, sizeof(float));
return i;
}
$ gcc -O2 -S test.c
$ cat test.s
.file "test.c"
.text
.p2align 4,,15
.globl convert_float_to_int
.type convert_float_to_int, @function
convert_float_to_int:
.LFB22:
.cfi_startproc
movss %xmm0, -4(%rsp)
movl -4(%rsp), %eax
ret
.cfi_endproc
.LFE22:
.size convert_float_to_int, .-convert_float_to_int
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",@progbits
Because they were safe, until the gcc maintainers decided to break them (and the LLVM maintainers follow them like lemmings).
The funny thing is that new gcc (at least up to 4.4) still generates code for signed addition that wraps around instead of code that traps on overflow. It's as if the aim of the gcc maintainers was to be least helpful to everyone: for the low-level coders, miscompile their code silently; for the specification pedants, avoid giving them ways to detect when they have violated the spec.
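For what it's worth, the check the specification pedant has to write by hand (gcc 4.x offers no built-in for it) tests the operands before the addition, so that the overflowing operation is never executed; a sketch:

#include <limits.h>

/* Returns 1 and stores a + b in *sum if the result is representable,
   0 otherwise.  The overflow itself never happens, so no undefined
   behaviour is triggered. */
int add_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}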
C as portable assembler
Sorry, but this is just wrong...
Lots of platforms it was flaky because FPU was physically separate.
My code ran fine on systems with physically separate FPU (e.g., MIPS
R2000+R2010). Also, why should the separate FPU affect the types foo
and bar?
But then if you want low-level non-portable language... asm is always there.
I want a relatively portable low-level language, and gcc-2.x -O and
(for now) gcc-4.x -O0 provide that, and my code ports nicely to all
the hardware I can get my hands on (and to some more that I cannot);
that's definitely not the case for asm, and it's not quite the case
with gcc-4.x -O. For now we work around the breakage of gcc-4.x, but
the resulting code is not as fast and small as it could be; and we did
not have to endure such pain with gcc-2.x.
Or, alternatively, you can actually read specifications and see what the language actually supports.
It's kind of funny, but real low-level stuff (like OS kernels or portable on-bare-metal programs) usually survive "evil compilers" just fine.
I have seen enough complaints from kernel developers about breakage
from new gcc versions, and that despite the fact that Linux is
probably one of the few programs (apart from SPEC CPU) that the gcc
maintainers care for. The kernel developers do what we do, they try to
work around the gcc breakage, but I doubt that that's the situation
they wish for.
Well, if you want some different language you can create it...
Also, why should the separate FPU affect the types foo and bar?
My code ran fine on systems with physically separate FPU (e.g., MIPS R2000+R2010).
But a compiler that miscompiles on hardware that's perfectly capable of doing what I intend (as evidenced by the fact that gcc-2.x -O and gcc-4.x -O0 achieve what I intend) is a totally different issue.
For now we work around the breakage of gcc-4.x, but the resulting code is not as fast and small as it could be; and we did not have to endure such pain with gcc-2.x.
The language implemented by gcc-2.x does support these practices.
We just need a compiler for this language that targets modern hardware.
So, the ANSI C specification and the language it specifies is pretty useless, because of these holes.
Not just would it mean giving up on low-level code, the compiler could still decide to format the hard disk if it likes to, because most likely the code still contains some undefined behaviour.
The kernel developers do what we do, they try to work around the gcc breakage, but I doubt that that's the situation they wish for.