The C standard states that any character in the machine’s standard printing set will never be negative. But the following code:
char c = 1234;
printf("%d\t%c", c, c);
gives a negative output and prints a strange character, meaning that the character is in the machine’s printing character set. Is the C standard being violated here?
The part of the C standard (link is to a late draft of C11) you’re interested in is section 6.3.1.3 (page 69 of the PDF), which says:
6.3 Conversions
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Your first statement requires the conversion of an int (signed or unsigned, doesn’t matter in this case) to a signed char. Because the new type is signed and can’t represent 1234, the behavior described in item 3 is triggered.
Indeed, if you have warnings enabled when you compile the code, you should get a warning like this one from gcc:
junk.c:4: warning: overflow in implicit constant conversion
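To make both outcomes concrete, here is a small sketch; it assumes a typical two’s-complement machine where plain char is signed, 8 bits wide, and UCHAR_MAX is 255 (the exact result for the signed case is implementation-defined, per item 3 above):

    #include <stdio.h>

    int main(void)
    {
        /* 1234 is 0x4D2; only the low byte 0xD2 (210) survives.
         * Converting that to a signed 8-bit char is implementation-defined
         * (item 3); most compilers give 210 - 256 = -46. */
        char c = 1234;
        printf("%d\t%c\n", c, c);    /* typically prints -46 and an odd glyph */

        /* Conversion to an unsigned type is fully defined (item 2):
         * 1234 is reduced modulo UCHAR_MAX + 1, i.e. 1234 - 4*256 = 210. */
        unsigned char u = 1234;
        printf("%u\n", (unsigned)u); /* prints 210 */
        return 0;
    }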
The C standard states that any character in the machine’s standard printing set will never be negative.
Where? C++ has that constraint for characters in the basic execution character set (and not for all printing characters; see 2.2/3 in C++98, 2.3/11 in C++11), but in the past I’ve looked for a similar clause for C and didn’t find it.
Even for C++, if the character printed is a “strange character”, that means it is probably not in the basic character set, so the behavior by itself gives no hint about conformance. BTW, even if the glyph were similar to one in the basic character set, that wouldn’t mean much; an ‘A’, for instance, could be intended as an uppercase alpha.
Well, I am not an expert on this, but the key is what the C standard means by “machine’s standard printing set”. On most modern machines/systems this will be 7-bit ASCII. So as long as you assign only characters from that range to a char variable, you can be sure that, when interpreted as an integer, you get a positive value.
The (full) printing set of a machine may contain many more characters, but when you, for example, assign an umlaut like ä, ö or ü to a char variable, you have to expect that the integer conversion of that variable comes out < 0 on some machines and > 0 on others; this is not defined by the standard. Furthermore, there is no guarantee that such a character will always be printed as the same letter; this may, for example, depend on the active codepage or the environment of the program.
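As an illustration of that last point, here is a small sketch; it assumes a Latin-1 (ISO 8859-1) execution character set, where ‘ä’ has code 0xE4, and an 8-bit char whose signedness is up to the implementation:

    #include <stdio.h>

    int main(void)
    {
        char ascii  = 'A';      /* 7-bit ASCII: value 65, fits either way */
        char umlaut = '\xE4';   /* stand-in for 'ä' in Latin-1 */

        printf("'A'  as int: %d\n", ascii);   /* 65 */
        printf("0xE4 as int: %d\n", umlaut);  /* -28 if char is signed, 228 if unsigned */
        return 0;
    }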