Question

Suppose that we write in C the following character constant:

  '\xFFFFAA'  

What is its numerical value?

The C99 standard says:

  • Character constants have type int.
  • Hexadecimal character constants can be represented as an unsigned char.
  • The value of a basic character constant is non-negative.
  • The value of any character constant fits in the range of char.

Besides:

  • The range of values of signed char is contained in the range of values of int.
  • char, unsigned char and signed char all have the same size: 1 byte.
  • The size of a byte is given by CHAR_BIT, whose value is at least 8.

Let's suppose that we have the typical situation with CHAR_BIT == 8.
Also, let's suppose that char is signed char for us.

Following these rules, the constant '\xFFFFAA' has type int, and its value must be representable as an unsigned char, although its actual value fits in a char.
From these rules, an example such as '\xFF' would give us:

  (int)(char)(unsigned char)'\xFF' == -1

The 1st cast, to unsigned char, comes from the "can be represented as unsigned char" requirement.
The 2nd cast, to char, comes from the "the value fits in a char" requirement.
The 3rd cast, to int, comes from the "has type int" requirement.

However, the constant '\xFFFFAA' is too big, and cannot be "represented" as unsigned char.
What is its value?

I think the value is the result of (char)(0xFFFFAA % 256), since the standard says, more or less, the following:

  • For unsigned integer types, if a value is bigger than the maximum M that the type can represent, the result is the value reduced modulo M + 1.

Am I right in this conclusion?

EDIT: I have been convinced by @KeithThompson: he says that, according to the standard, an out-of-range hexadecimal character constant is a constraint violation.
So I will accept that answer.

However, with GCC 4.8 (MinGW), for example, the compiler issues a warning and the program still compiles, following the behaviour I have described. Thus, a constant like '\x100020' was accepted, and its value was 0x20.


Solution

The C standard defines the syntax and semantics in section 6.4.4.4. I'll cite the N1570 draft of the C11 standard.

Paragraph 6:

The hexadecimal digits that follow the backslash and the letter x in a hexadecimal escape sequence are taken to be part of the construction of a single character for an integer character constant or of a single wide character for a wide character constant. The numerical value of the hexadecimal integer so formed specifies the value of the desired character or wide character.

Paragraph 9:

Constraints

The value of an octal or hexadecimal escape sequence shall be in the range of representable values for the corresponding type:

followed by a table saying that with no prefix, the "corresponding type" is unsigned char.

So, assuming that 0xFFFFAA is outside the representable range for type unsigned char, the character constant '\xFFFFAA' is a constraint violation, requiring a compile-time diagnostic. A compiler is free to reject your source file altogether.

If your compiler doesn't at least warn you about this, it's failing to conform to the C standard.

Yes, the standard does say that unsigned types have modular (wraparound) semantics, but that only applies to arithmetic expressions and some conversions, not to the meanings of constants.

(If CHAR_BIT >= 24 on your system, it's perfectly valid, but that's rare; usually CHAR_BIT == 8.)

If a compiler chooses to issue a mere warning and then continue to compile your source, the behavior is undefined (simply because the standard doesn't define the behavior).

On the other hand, if you had actually meant 'xFFFFAA', that's not interpreted as hexadecimal. (I see it was merely a typo, and the question has been edited to correct it, but I'm going to leave this here anyway.) Its value is implementation-defined, as described in paragraph 10:

The value of an integer character constant containing more than one character (e.g., 'ab'), ..., is implementation-defined.

Character constants containing more than one character are a nearly useless language feature, used by accident more often than they're used intentionally.

OTHER TIPS

Yes, the value of \xFFFFAA should be representable by unsigned char.

6.4.4.4 9 Constraints

The value of an octal or hexadecimal escape sequence shall be in the range of representable values for the type unsigned char for an integer character constant.

But C99 also says,

6.4.4.4 10 Semantics

The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.

So the resulting value should be in the range of unsigned char ([0, 255] if CHAR_BIT == 8). Which value it is depends on the compiler, architecture, etc.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow