confusing getchar using read() under linux

https://stackoverflow.com/questions/12019447

27-06-2021

Question

Why do we write *bufp = buf? Since both are arrays, in my opinion it should be:
```
static char bufp = buf;
```
How does *bufp "know" from which position to start displaying? It is not initialized to zero in any way. After assigning buf to bufp, I'd expect the return line to start with the last entered char.
Is the unsigned char cast used here just to avoid the case of -1 being the input, meaning EOF on most systems?

#include "syscalls.h"
/* getchar: simple buffered version */
int getchar(void)
{
    static char buf[BUFSIZ];
    static char *bufp = buf; /* [1] */
    static int n = 0;
    if (n == 0) {            /* buffer is empty */
        n = read(0, buf, sizeof buf);
        bufp = buf;          /* ? [1] here it is written like in my question so which is true ? */
    }
    return (--n >= 0) ? (unsigned char) *bufp++ : EOF; /* [2] & [3] */
}

Solution

[1] char bufp = buf is incorrect: buf is an array of char, and in most expressions an array name is converted to a pointer to its first element. char bufp would declare a single character, which cannot hold that address. char *bufp, instead, declares a pointer to char; it points to the first character of buf, and the following characters can be reached through it as well.
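
As a minimal sketch of the point above, the following shows a pointer to char reaching every element of an array through the address of the first one. The helper name nth_via_pointer is hypothetical and only serves the illustration.

```c
#include <stddef.h>

/* Hypothetical helper: fetch the i-th character of an array through a pointer.
   The caller must make sure i is within the bounds of the array. */
char nth_via_pointer(const char *buf, size_t i)
{
    const char *bufp = buf;  /* bufp holds the address of buf[0] */
    return *(bufp + i);      /* equivalent to bufp[i] and to buf[i] */
}
```

Calling nth_via_pointer("hello", 1) yields 'e': the pointer starts at the first character and the offset selects the next one.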

[2] Initially bufp points to the buf array, i.e. to its first character, and n is set to 0. bufp, buf and n are all static, meaning they persist after the function returns: each of them is initialized once, before the program starts running, and the initialization is not repeated when the function is called again. Thus they "remember" the state of the buffer:

`n` is the number of characters in the buffer, ready to be returned one by one,

`bufp` points to the next character to be returned (when n > 0),

and `buf` the array just holds the characters in the buffer.

So, to answer your question [2]:

When there is no character available (n == 0), a call to read fills the buffer buf, and bufp is reset to point to the beginning of that array.
Then, as long as the buffered characters have not all been returned one by one (n > 0), *bufp is the next character to be returned; *bufp++ yields that character and advances the bufp pointer by one.
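
The refill-and-advance cycle described above can be tried without a terminal. The sketch below is not the original code: it replaces read(0, ...) with a hypothetical fake_read that serves bytes from a fixed string, shrinks the buffer to 4 bytes so that a refill actually happens, and renames the function my_getchar to avoid clashing with the standard getchar.

```c
#include <stdio.h>   /* EOF */
#include <string.h>  /* strlen, memcpy */

/* Assumed input for the demonstration; stands in for standard input. */
static const char *source = "hello";
static size_t source_pos = 0;

/* Hypothetical stand-in for read(0, dst, cap): copies at most cap bytes
   and returns how many were copied; 0 means end of input. */
static int fake_read(char *dst, int cap)
{
    int left = (int)strlen(source + source_pos);
    int k = left < cap ? left : cap;
    memcpy(dst, source + source_pos, (size_t)k);
    source_pos += (size_t)k;
    return k;
}

int my_getchar(void)
{
    static char buf[4];       /* deliberately tiny to force a refill */
    static char *bufp = buf;  /* initialized once, not on every call */
    static int n = 0;         /* characters still waiting in buf */

    if (n == 0) {             /* buffer is empty: refill and rewind */
        n = fake_read(buf, (int)sizeof buf);
        bufp = buf;
    }
    return (--n >= 0) ? (unsigned char)*bufp++ : EOF;
}
```

With this input, the first four calls drain "hell", the fifth call refills the buffer with "o" and rewinds bufp to buf, and the sixth call finds nothing left and returns EOF.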

[3] The (unsigned char) cast prevents sign extension when the 8-bit character *bufp is converted to the int return type (an int is usually 32 bits, so the 24 most significant bits would otherwise be filled with copies of the sign bit). Thus any character whose code is greater than 127 (which is a negative value if char is signed) is returned as is, e.g. a byte with value 200 is returned as (int)200. It also means that a valid character can never be mistaken for EOF, which is a negative int.
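
The effect of the cast can be observed in isolation. This is a small sketch assuming 8-bit bytes and a compiler that wraps out-of-range values when converting to signed char (the usual behaviour, but formally implementation-defined); the two function names are hypothetical.

```c
/* Returns the byte the way getchar does: through an unsigned char cast,
   so the result is always in the range 0..255 on 8-bit-byte systems. */
int return_with_cast(int byte)
{
    char c = (char)byte;
    return (unsigned char)c;
}

/* Returns the byte through a signed char without the cast:
   values above 127 come back negative because of sign extension. */
int return_without_cast(int byte)
{
    signed char c = (signed char)byte;
    return c;
}
```

For the byte 200, return_with_cast gives 200, while return_without_cast gives a negative number. For the byte 255 the uncast version typically gives -1, which is exactly the value EOF has on most systems.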

OTHER TIPS

buf is an array, so its type is not char but char[BUFSIZ]; in most expressions it is converted to a char * that points to its first element.

char *bufp declares that bufp is also of type char *. Generically, the two approaches below are similar:

char *bufp = buf;   // Declare and assign in a single line

char *bufp;   // Declare
bufp = buf;   // Then assign

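Because bufp is static, only the first form does what is needed in your example: the initializer is applied once, whereas a separate assignment at the top of the function would run on every call and reset the pointer each time.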
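
The difference between a static initializer and a plain assignment can be seen with a counter. This is only an illustration; both function names are hypothetical.

```c
/* The initializer of a static variable is applied once,
   so the value survives from one call to the next. */
int count_calls(void)
{
    static int calls = 0;
    return ++calls;
}

/* A separate assignment is an ordinary statement that runs on every call,
   so the remembered value is thrown away each time. */
int count_calls_reset(void)
{
    static int calls;
    calls = 0;
    return ++calls;
}
```

Successive calls to count_calls return 1, 2, 3 and so on, while count_calls_reset returns 1 every time.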

bufp is a pointer that initially has the same value as buf, i.e. the address of the first element of buf[BUFSIZ]. So bufp "knows" everything that buf knows. You can even write something like bufp[i], as long as i is not out of range.

In summary, buf[BUFSIZ] is an array whose first element has the address buf, or &buf[0], and, as long as bufp has not been advanced, also bufp or &bufp[0].
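
A short sketch of that summary, using a 16-element array as an assumed size instead of BUFSIZ: the addresses compare equal, but the array and the pointer remain different types, which shows up in sizeof. The function names are hypothetical.

```c
#include <stddef.h>

/* sizeof applied to the array gives the size of the whole array. */
size_t array_size(void)
{
    char buf[16];
    return sizeof buf;
}

/* sizeof applied to the pointer gives the size of a pointer, not of the array. */
size_t pointer_size(void)
{
    char buf[16];
    char *bufp = buf;
    return sizeof bufp;
}

/* Returns 1 when all four spellings denote the same address. */
int same_address(void)
{
    char buf[16];
    char *bufp = buf;
    return bufp == buf && bufp == &buf[0] && &bufp[0] == &buf[0];
}
```

Here array_size returns 16, pointer_size returns sizeof(char *) (8 on typical 64-bit systems), and same_address returns 1.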

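EOF is a negative int whose exact value is not necessarily the same on all systems (it is typically -1), hence the cast on the last line makes sure that a character read from the buffer is always returned as a non-negative value and never collides with EOF.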

Licensed under: CC-BY-SA with attribution
