Whether taken as C or as C++, this illegal program, whose behavior the standard leaves undefined, strikes me as interesting:
#include <stdio.h>
int foo() {
int a;
const int b = a;
a = 555;
return b;
}
void bar() {
int x = 123;
int y = 456;
}
int main() {
bar();
const int n1 = foo();
const int n2 = foo();
const int n3 = foo();
printf("%d %d %d\n", n1, n2, n3);
return 0;
}
Output on my machine (after compilation without optimization):
123 555 555
I find this illegal program interesting because it illustrates stack mechanics; after all, the very reason one uses C or C++ (instead of, say, Java) is to program close to the hardware, close to stack mechanics and the like.
However, on StackOverflow, when a questioner’s code inadvertently reads from uninitialized storage, the most heavily upvoted answers invariably quote the C or C++ (especially the C++) standard to the effect that the behavior is undefined. This is true as far as the standard goes: the behavior is indeed undefined. But it is curious that alternate answers which try, from a hardware or stack-mechanical perspective, to investigate why a specific undefined behavior (such as the output above) might have occurred are rare and tend to be ignored.
I even remember one answer that suggested that undefined behavior could include reformatting my hard drive. I didn’t worry too much about that, though, before running the program above.
My question is this: Why is it more important to teach readers merely that behavior is undefined in C or C++, than it is to understand the undefined behavior? I mean, if the reader understood the undefined behavior, then would he not be the more likely to avoid it?
My education happens to be in electrical engineering, and I work as a building-construction engineer, and the last time I had a job as a programmer per se was 1994, so I am curious to understand the perspective of users with more conventional, more recent software-development backgrounds.
Frama-C’s value analysis, a static analyzer whose purported goal is to find all undefined behaviors in a C program, considers the assignment const int b = a; okay. This is a deliberate design decision intended to allow memcpy() (typically implemented as a loop over the unsigned char elements of a virtual array, which the C standard arguably allows to be re-implemented as such) to copy one struct (which can have padding and uninitialized members) to another.
The “exception” is only for lvalue = lvalue; assignments without an intervening conversion, that is, for an assignment that amounts to the copy of a slice of memory from one memory location to another.
I (as one of the authors of Frama-C’s value analysis) discussed this with Xavier Leroy at a time when he was himself wondering about the definition to pick in the verified C compiler CompCert, so he may have ended up using the same definition. It is in my opinion cleaner than what the C standard tries to do with indeterminate values that can be trap representations and the type unsigned char that is guaranteed not to have any trap representations; but both CompCert and Frama-C assume relatively non-exotic targets, and perhaps the standardization committee was trying to accommodate platforms where reading an uninitialized int can indeed abort the program.
Returning b, or passing n1, n2, or n3 to printf at the end, can at least be considered undefined behavior, because copying an uninitialized slice of memory does not make it initialized. With an oldish Frama-C version:
$ frama-c -val t.c
…
t.c:19:… accessing uninitialized left-value: assert initialized(&n1);
And in an oldish version of CompCert, after minor modifications to make the program acceptable to it:
$ ccomp -interp t.c
Time 33: in function foo, expression <loc> = <undef>
ERROR: Undefined behavior
Undefined behavior ultimately means the behavior is non-deterministic. Programmers who are unaware that they are writing non-deterministic code are just bad ignorant programmers. This site aims to make programmers better (and less ignorant).
Writing a correct program in the face of non-deterministic behavior is not impossible. However, it is a specialized programming environment, and requires a different kind of programming discipline.
Even in your example, if the program receives an externally raised signal, the values on the “stack” may change in such a way that you don’t get the expected values. Moreover, if the machine has trap values, reading random values may very well cause something strange to happen.
Why is it more important to teach readers merely that behavior is undefined in C or C++, than it is to understand the undefined behavior?
Because the specific behavior may not be repeatable, even from run to run without rebuilding.
Chasing down exactly what happened may be a useful academic exercise for better understanding the quirks of your particular platform, but from a coding perspective the only relevant lesson is “don’t do that”. An expression like a++ * a++ is a coding error, full stop. That’s really all anyone needs to know.
“Undefined Behavior” is shorthand for “This behavior is not deterministic; not only will it probably behave differently in different compilers or hardware platforms, it may even behave differently in different versions of the same compiler.”
Most programmers would consider this an undesirable characteristic, especially since C and C++ are standards-based languages; that is, you use them, in part, because the language specification makes certain guarantees about how the language will behave, if you are using a standards-compliant compiler.
As with most things in programming, you have to weigh the advantages and disadvantages. If the benefit of some operation that is UB exceeds the difficulty of getting it to behave in a stable, platform-agnostic fashion, then by all means, use the undefined behavior. Most programmers will think it is not worth it, most of the time.
The remedy for any undefined behavior is to examine the behavior that you actually get, given a particular platform and compiler. That sort of examination is not one that an expert programmer is likely to explore for you in a Q&A setting.
If the documentation for a particular compiler says what it will do when code does something which is considered “Undefined Behavior” by the standard, then code which relies upon that behavior will work correctly when compiled with that compiler, but may behave in arbitrary fashion when compiled using some other compiler whose documentation does not specify the behavior.
If the documentation for a compiler does not specify how it will handle some particular “undefined behavior”, the fact that a program’s behavior seems to obey certain rules says nothing about how any similar program will behave. Any of a variety of factors may cause a compiler to emit code that handles unexpected situations differently, sometimes in seemingly bizarre fashion.
Consider, for example, on a machine where int is a 32-bit integer:
#include <stdint.h>

int undef_behavior_example(uint16_t size1, uint16_t size2)
{
    int flag = 0;
    if ((uint32_t)size1 * size2 > 2147483647u)
        flag += 1;
    if (((size1 * size2) & 127) != 0) // true only when the product is not a multiple of 128
        flag += 2;
    return flag;
}
If size1 and size2 were both equal to 46341 (their product is 2147488281), one might expect the function to return 3, but a compiler could legitimately skip the first test entirely: either the product would be small enough that the condition would be false, or the upcoming multiplication would overflow and relieve the compiler of any requirement to do, or have done, anything. While such behavior may seem bizarre, some compiler authors seem to take great pride in their compilers’ ability to eliminate such “unnecessary” tests.

Some people might expect that an overflow on the second multiply would, at worst, cause all the bits of that particular product to be arbitrarily corrupted; in fact, however, in any case where a compiler can determine that overflow either must have occurred or would be inevitable before the next sequenced observable side effect, the compiler is free to do anything it likes.