K&R C Programming Language 1.5.1 (File Copying) [duplicate]

https://stackoverflow.com/questions/12110224

28-06-2021

Question

Some months ago I read another well-known C book (in my own language), and I never learned anything about this. The way K&R covers three chapters in 20 pages is simply amazing, and of course I can't expect lengthy explanations, but that also raises questions.

I have a question about section 1.5.1. The book says (page 16):

#include <stdio.h>

int main(void)
{
    int c;  /* <-- here is the question */

    c = getchar();
    while (c != EOF) {
        putchar(c);
        c = getchar();
    }
    return 0;
}

[...] The type char is specifically meant for storing such character data, but any integer type can be used. We used int for a subtle but important reason. The problem is distinguishing the end of input from valid data. The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF, for "end of file". We must declare c to be a type big enough to hold any value that getchar returns. We can't use char since c must be big enough to hold EOF in addition to any possible char. Therefore we use int.[...]

After searching Google for another explanation:

EOF is a special macro representing End Of File (Linux: use CTRL+d on the keyboard to create this, Windows command: use CTRL+z (may have to be at beginning of new line, followed by RETURN)): Often EOF = -1, but implementation dependent. Must be a value that is not a valid value for any possible character. For this reason, c is of type int (not char as one may have expected).

So I modified the source from int to char to see what the problem is with taking EOF values... but there is no problem. It works the same way.

I also didn't understand how getchar takes every character I type and prints everything. An int is 4 bytes long, so it could hold 4 characters in one variable. But I can enter any number of characters, and it reads and writes all of them the same way. And the same happens with char... What really happens? Where are the values stored when there are more than 1-4 characters?

Solution

So I modified the source from int to char to see what the problem is with taking EOF values... but there is no problem. It works the same way.

It happens to work the same way. It all depends on the actual type of char, i.e. whether it is signed or unsigned. There's also a C FAQ entry on this very subject. You're more likely to see the bug if your chars are unsigned.

The bug can go undetected for a long time, however, if chars are signed and if the input is all 7-bit characters.

EDIT

The last question is: the char type is one byte long, and int is 4 bytes long. So a char will only hold one ASCII character. But if I type "stack overflow is over 1byte long", the output will be "stack overflow is over 1byte long". Where is "tack overflow is over 1byte long" stored, and how does putchar output an entire string?

Each character will be stored by c in turn. So the first time, getchar() will return 's', and putchar will send it on its way. Then 't' will come along, and so on. At no point will c store more than one character. The characters that have not been read yet wait in the input buffer maintained by the terminal and the C library, not in c. So although you feed it a large string, it deals with it by eating one character at a time.

OTHER TIPS

Separating into two answers:

Why int and not char

Short and formal answer: if you want to be able to represent all real characters, and another non-real character (EOF), you can't use a datatype that's designed to hold only real characters.

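An answer that is easier to understand but not entirely accurate: The function getchar() returns the ASCII code of the character it reads, or EOF.

Because -1 (the usual value of EOF) converted to an 8-bit unsigned char equals 255, and 255 stored in a signed char typically becomes -1, we can't distinguish between character 255 and EOF once both are stored in a char. That is,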

char a = 255;
char b = EOF;
a == b // Evaluates to TRUE

but,

int a = 255;
int b = EOF;
a == b // Evaluates to FALSE

So using char won't allow you to distinguish between a character whose code is 255 (which could happen when reading from a file) and EOF.

How come you can use putchar() with an int

The function putchar() takes its argument as a number, converts it to unsigned char, and writes that byte to the output; the terminal then looks the value up in its character table and draws the glyph. If the number in the int fits in a char, all is good and nobody notices anything.

If you are using char to store the result of getchar(), there are two potential problems; which one you'll meet depends on the signedness of char.

If char is unsigned, c == EOF will never be true and you'll get an infinite loop.
If char is signed, c == EOF will be true for some valid input character. Which one depends on the character set in use; in locales using ISO 8859-1 or CP1252 it is 'ÿ' if EOF is -1 (the most common value). Some character sets, UTF-8 for instance, never use the value (char)EOF in valid codes, but you can rarely guarantee that your program will only run on signed-char implementations and only be used in unproblematic locales.

Licensed under: CC-BY-SA with attribution
