About the unbuffered I/O in UNIX Systems

Question 1

There are several kinds of buffering going on. Input to the program is buffered by the pseudoterminal device's line discipline, which by default hands input to the program one line at a time. On the output side, there is the file-system cache (a buffer in the OS that holds the file's data), plus extra buffering inside the C program when printing to a FILE *. read and write, however, bypass the FILE * buffering and move data more or less directly to and from the file-system cache.

It appears that your stdout buffer is flushed automatically when all output goes to the terminal, but not when it is redirected to a file. I'd therefore recommend adding a call to

fflush(stdout);

after the printf call. This should explicitly flush the buffer (and enforce the ordering of the output that you want).

The important thing is to be aware of when you're using FILE *s, which are C-level structures manipulated by library functions (like fopen), and when you're using the raw file descriptor (which is just an integer, but refers to the underlying operating-system file). The FILE datatype is a wrapper around this lower-level Unix implementation detail. The FILE functions implement an additional layer of buffering so that the lower level can operate on larger blocks, and you can efficiently perform byte-at-a-time processing without making lots and lots of I/O system calls.

Question 2

write() is not buffered. printf()ing to stdout is buffered, but in a way that depends on where the output goes.

If stdout goes to a terminal, it is line buffered; if not, it is fully buffered. In your second example, that means the printf() output is only flushed when the program ends, whereas the output from the calls to write() goes out immediately.

From man stdio:

[...] the standard input and output streams are fully buffered if and only if the streams do not refer to an interactive device.

Output streams that refer to terminal devices are always line buffered by default;

Question 3

First, a solution. Change read and write to fread and fwrite:

#include "apue.h"   /* header from the APUE book; declares err_sys */
#define BUFFSIZE 4096
int main(void)
{
    size_t n;       /* fread and fwrite return size_t, not int */
    char buf[BUFFSIZE];
    while((n = fread(buf, 1, BUFFSIZE, stdin)) > 0)
    {
        printf("n is %zu\n", n);         //this line is added by me for testing
        if(fwrite(buf, 1, n, stdout) != n) {
            // note: if err_sys depends on errno, it may print the wrong error
            err_sys("write error");
        }
    }

    if(ferror(stdin)) {
        // note: if err_sys depends on errno, it may print the wrong error
        err_sys("read error");
    }
    exit(0);
}

Things to note about the code:

Using fread here is optional, because stdin is not read through stdio anywhere else in the program.
fread and fwrite take an element size and a number of elements to determine how much should be transferred. A partial element is not counted in the return value, so element size 1 (not count 1) is what is usually wanted with text.
There are differences in error handling and return values, and whether the stdio functions set errno is not very well defined.

Finally, a short explanation: stdio input and output are buffered. Lower-level file descriptor I/O (open and close, read and write, etc.) is not buffered by the C library and completely bypasses stdio buffering. The two should not be mixed on the same file, because it's easy to get mixed up on buffering details even if you try to do it so that it "should" work. Even if you get it to work on your OS, it may break when compiled for a different OS or different libraries. So just don't do it; instead, use one or the other for the same open file.