There are some as yet unexplored problems with the initial input loop.
You should never risk overflowing a buffer, even if you allocate 9001 bytes for it. That's how viruses and other attacks break into programs. Also, you have a problem because you are comparing a character with EOF. Unfortunately, getchar() returns an int. It has to, because it returns any valid character as a non-negative value (an unsigned char converted to int) and EOF as a negative value (usually -1, but nothing guarantees that value).
So, you might write that loop more safely, and clearly, as:
char *end = doc + sizeof(doc) - 1;   /* last slot is reserved for the '\0' */
int c;                               /* int, not char, so EOF can be detected */
while (rp < end && (c = getchar()) != EOF)
    *rp++ = c;
*rp = '\0';
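Wrapped up as a function, the corrected loop might look like the sketch below. The name read_bounded and its signature are hypothetical, not from the question; it reads from a FILE * with getc() rather than getchar() so it can be exercised without a terminal, and it assumes size is at least 1.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: read at most size - 1 bytes from fp into buf and
   null-terminate the result. Returns the number of bytes stored.
   Assumes size >= 1. */
size_t read_bounded(char *buf, size_t size, FILE *fp)
{
    char *rp = buf;
    char *end = buf + size - 1;   /* leave room for the terminating '\0' */
    int c;                        /* int, so a 0xFF byte is not mistaken for EOF */

    while (rp < end && (c = getc(fp)) != EOF)
        *rp++ = c;
    *rp = '\0';
    return (size_t)(rp - buf);
}
```

Because the loop stops as soon as the buffer is full, anything not yet read stays in the stream for a later call, instead of being written past the end of the array.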
With your loop as written, one of two undesirable things happens:
- if char is an unsigned type, then you will never detect EOF, so the loop keeps writing past the end of the buffer.
- if char is a signed type, then you will mistakenly detect EOF when you read a valid character (often ÿ, y-umlaut, LATIN SMALL LETTER Y WITH DIAERESIS, U+00FF).
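The two failure modes can be reproduced without any input by storing values in explicitly unsigned and signed char variables. This is a sketch with hypothetical function names; the signed case relies on two assumptions that hold on common platforms (for example GCC or Clang with glibc) but are not guaranteed by the C standard: converting 0xFF to signed char yields -1, and EOF is -1.

```c
#include <stdio.h>

/* Unsigned case: EOF stored in an unsigned char becomes a value in
   0..UCHAR_MAX (well defined), which promotes to a non-negative int,
   so the comparison with the negative EOF is always false. */
int unsigned_char_sees_eof(void)
{
    unsigned char uc = (unsigned char)EOF;
    return uc == EOF;
}

/* Signed case: the valid byte 0xFF stored in a signed char.
   The conversion is implementation-defined; it gives -1 on the usual
   two's-complement implementations, which then equals EOF if EOF is -1. */
int signed_char_confuses_ff(void)
{
    signed char sc = (signed char)0xFF;
    return sc == EOF;
}
```

On such a platform, unsigned_char_sees_eof() returns 0 (EOF is never seen) and signed_char_confuses_ff() returns 1 (a real ÿ byte looks like end of file).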
Neither is good. The code above avoids both problems without needing to know whether plain char is signed or unsigned.
Conventionally, if you have an empty loop body, you emphasize this by placing the semicolon on a line on its own. Many an infinite loop has been caused by a stray semicolon after a while condition; by placing the semicolon on the next line, you emphasize that it is intentional, not accidental.
So, purely as a matter of layout, instead of:
while ((*(rp++) = getchar()) != EOF);
write:
while ((*(rp++) = getchar()) != EOF)
    ;
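An empty loop body is sometimes exactly what you want. As a sketch (skip_line is a hypothetical helper, not something from the question), here is a loop that discards the rest of an input line, laid out with the semicolon on its own line and using an int for the character so that EOF is still detected:

```c
#include <stdio.h>

/* Hypothetical helper: discard the rest of the current input line.
   Returns '\n' if a newline was consumed, or EOF if input ran out first. */
int skip_line(FILE *fp)
{
    int c;

    while ((c = getc(fp)) != EOF && c != '\n')
        ;   /* deliberately empty: the loop condition does all the work */
    return c;
}
```

Returning c lets the caller tell a completed line apart from end of input, which matters if the last line has no trailing newline.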