Question

I know the following code is broken --getchar() returns an int not a char--

#include <stdio.h>
int
main(int argc, char* argv[])
{
  char single_byte = getchar();

  while (single_byte != EOF) {
    single_byte = getchar();
    printf("getchar() != EOF is %d.\n", single_byte != EOF);
    if (single_byte == EOF)
      printf("EOF is implemented in terms of 0x%x.\n", single_byte);
  }

  return 0;
}

Still, I would expect its output (using /dev/urandom as the input stream, for instance) to end with EOF is implemented in terms of 0xff, rather than the following:

$ ./silly < /dev/urandom
getchar() != EOF is 1.
getchar() != EOF is 1.
// ...
getchar() != EOF is 0.
EOF is implemented in terms of 0xffffffff.

Furthermore, 0xffffffff cannot be stored into a single byte ...

Thank you in advance


Solution

I know the following code is broken --getchar() returns an int not a char--

Good!

char single_byte = getchar();

This is problematic in more than one way.

I'll assume CHAR_BIT == 8 and EOF == -1. (We know EOF is negative and of type int; -1 is a typical value -- and in fact I've never heard of it having any other value.)

Plain char may be either signed or unsigned.

If it's unsigned, the value of single_byte will be either the value of the character that was just read (represented as an unsigned char and trivially converted to plain char), or the result of converting EOF to char. Typically EOF is -1, and the result of the conversion will be CHAR_MAX, or 255. You won't be able to distinguish between EOF and an actual input value of 255 -- and since /dev/urandom returns all byte values with equal probability (and never runs dry), you'll see a 0xff byte sooner or later.

But that won't terminate your input loop. Your comparison (single_byte == EOF) will never be true; since single_byte is of an unsigned type in this scenario, it can never be equal to EOF. You'll have an infinite loop, even when reading from a finite file rather than from an unlimited device like /dev/urandom. (You could have written (single_byte == (char)EOF), but of course that would not solve the underlying problem.)
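A minimal sketch of the unsigned case. It uses unsigned char explicitly to stand in for a platform where plain char is unsigned, and stored_unsigned_equals_eof is a hypothetical helper name, not something from the question:

```c
#include <stdio.h>

/* Mimics `char single_byte = getchar();` on a platform where plain char
   is unsigned, then reports whether `single_byte == EOF` would hold.
   The helper name is made up for this illustration. */
int stored_unsigned_equals_eof(int getchar_result)
{
    unsigned char single_byte = (unsigned char)getchar_result;

    /* single_byte is promoted to an int in the range 0..UCHAR_MAX, and
       EOF is negative, so this comparison can never be true. */
    return single_byte == EOF;
}
```

Passing EOF, 0xff, or any ordinary character all give 0, which is why the loop could never end on such a platform.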

Since your loop does terminate, we can conclude that plain char is signed on your system.

If plain char is signed, things are a little more complicated. If you read a character in the range 0..127, its value will be stored in single_byte. If you read a character in the range 128..255, the int value is converted to char; since char is signed and the value is out of range, the result of the conversion is implementation-defined. For most implementations, that conversion will map 128 to -128, 129 to -127, ... 255 to -1. If getchar() returns EOF, which is (typically) -1, the conversion is well defined and yields -1. So again, you can't distinguish between EOF and an input character with the value 255 (0xff), since both end up stored as -1.

(Actually, as of C99, the conversion can also raise an implementation-defined signal. Fortunately, as far as I know, no implementations actually do that.)
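The signed case can be sketched the same way. This assumes EOF == -1 and an implementation that wraps out-of-range values in the usual two's-complement manner (both are assumptions, though common ones); stored_signed_equals_eof is a hypothetical helper name:

```c
#include <stdio.h>

/* Mimics `char single_byte = getchar();` on a platform where plain char
   is signed, then reports whether `single_byte == EOF` would hold.
   The helper name is made up for this illustration. */
int stored_signed_equals_eof(int getchar_result)
{
    /* For 128..255 this conversion is implementation-defined; typical
       implementations map 255 to -1. */
    signed char single_byte = (signed char)getchar_result;

    /* single_byte is promoted to int before the comparison. */
    return single_byte == EOF;
}
```

Under those assumptions both EOF and 0xff give 1, while 0x7f and 0x80 give 0: the real end of input and a 0xff byte have become indistinguishable.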

if (single_byte == EOF)
    printf("EOF is implemented in terms of 0x%x.\n", single_byte);

Again, this condition will be true either if getchar() actually returned EOF or if you just read a character with the value 0xff. The %x format requires an argument of type unsigned int. single_byte is of type char, which will almost certainly be promoted to int. Now you can print an int value with an unsigned int format if the value is within the representable range of both types. But since single_byte's value is -1 (it just compared equal to EOF), it's not in that range. printf, with the "%x" format, assumes that the argument is of type unsigned int (this isn't a conversion). And 0xffffffff is the likely result of taking a 32-bit int value of -1 and assuming that it's really an unsigned int.
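To make the 0xffffffff concrete, here is a rough sketch assuming a 32-bit int. The helper names are hypothetical. Note that it uses explicit, well-defined conversions to unsigned int, which on two's-complement machines yield the same bits that printf ends up looking at:

```c
#include <stdio.h>

/* What "%x" effectively shows for a signed char: the argument is first
   promoted to int, and the bits of that int are then read as unsigned. */
unsigned int value_after_promotion(signed char single_byte)
{
    int promoted = single_byte;      /* default argument promotion */
    return (unsigned int)promoted;   /* wraps modulo UINT_MAX + 1 */
}

/* Going through unsigned char first keeps only the low byte. */
unsigned int value_as_byte(signed char single_byte)
{
    return (unsigned char)single_byte;
}

void print_both(signed char single_byte)
{
    /* Both arguments really are unsigned int, as "%x" requires. */
    printf("promoted: 0x%x\n", value_after_promotion(single_byte));
    printf("as byte:  0x%x\n", value_as_byte(single_byte));
}
```

With a 32-bit int, print_both((signed char)-1) shows 0xffffffff for the promoted value and 0xff for the byte value.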

And I'll just note that storing the result of getchar() in an int object would have been a whole lot easier than analyzing what happens when you store it in a char.
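For comparison, a minimal sketch of the int-based loop. count_bytes is a hypothetical helper, and it takes a FILE * rather than using getchar() directly only so that it can be tried on any stream:

```c
#include <stdio.h>

/* Counts the bytes remaining in `in`. The result of getc() is kept in an
   int, so a 0xff byte (returned as 255) is never mistaken for EOF. */
long count_bytes(FILE *in)
{
    long count = 0;
    int c;  /* int, not char: must hold every byte value and EOF */

    while ((c = getc(in)) != EOF)
        count++;

    return count;
}
```

count_bytes(stdin) behaves like the original loop should have. On /dev/urandom it simply never returns, because that device never reports end of input.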

OTHER TIPS

EOF is a macro that expands to an integer constant expression of type int with a negative value (generally -1).

EOF is not a real character, so to let getchar() return either a valid character or EOF, the function uses a hack: its return type is int. Convert the result to char only after you have made sure it is not EOF.
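As a sketch of that ordering (check for EOF first, narrow to char second), here is a hypothetical line reader; the name read_line_bytes and its parameters are illustrative only:

```c
#include <stddef.h>
#include <stdio.h>

/* Reads bytes from `in` into buf until a newline, EOF, or a full buffer,
   then NUL-terminates. Returns the number of bytes stored. */
size_t read_line_bytes(FILE *in, char *buf, size_t cap)
{
    size_t n = 0;
    int c;  /* keep the raw result in an int */

    if (cap == 0)
        return 0;

    while (n + 1 < cap && (c = getc(in)) != EOF && c != '\n')
        buf[n++] = (char)c;  /* narrowed only after EOF was ruled out */

    buf[n] = '\0';
    return n;
}
```

A 0xff byte in the input is stored like any other byte and does not end the read.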

This is a textbook example of a poorly designed API.

It appears to be a confusion between (char) -1 and (int) -1.

getchar() returns an int with 1 of 257 different values: 0 to 255 and EOF. EOF is less than 0 (C11 7.21.1).
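A small sketch of that distinction, assuming 8-bit bytes. next_is_byte is a hypothetical helper that keeps the full int result so the caller can inspect it:

```c
#include <stdio.h>

/* Reads one item from `in` and stores the raw int result in *value.
   Returns 1 if it was a real byte (0..255), 0 if it was EOF. */
int next_is_byte(FILE *in, int *value)
{
    *value = getc(in);
    return *value != EOF;
}
```

For a stream holding the single byte 0xff, the first call stores 255 and returns 1; the second call stores EOF and returns 0.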

Typically EOF has the value of -1 and that is so in your case. Let's assume that for the following.

From time to time, when data is read from /dev/urandom, a value of 255 is read. This is not the EOF.

Given that OP performs char single_byte = getchar(), single_byte takes on the same value, (char) -1, whether (int) -1 (EOF) or (int) 255 was read.

When single_byte != EOF is then evaluated and the result is false, we do not know whether the original return value of getchar() was -1 or 255.

I recommend a different printf():

printf("single_byte==EOF, so (int) 255 or EOF was read: 0x%hhx\n", single_byte);

Assumptions:
char is 8 bits.
EOF is -1.

EOF values are
EOF => %d => -1
EOF => %c => (a non-printable byte; it may look like a blank space but is not one)
EOF => %x => 0xFFFFFFFF

There is no ASCII value for EOF, so you cannot reliably compare getchar() output that has been stored in a char with EOF. For comparison, a blank space has the ASCII value 0x20 (32 in decimal) and a carriage return has 0x0D (13 in decimal); neither of them is EOF.

So that piece of code will not work as written. Either store the result in an int, or define a value of your own to exit the loop.

Licensed under: CC-BY-SA with attribution