Question

I'm looking at using the read() function to read in entire data structures, all of the same type but holding different data, and then place them into a linked list. I can't seem to find any specific information on how to terminate a loop that calls read(fp, &tmp, sizeof(struct foo)) followed by new_node(tmp).

I would like to simply read until EOF, but I don't know how to detect EOF with the read() function. Obviously, I could work around this on the write() side by storing the number of structures in the file before the data, and then stopping the read loop when I reach that number, but that seems a bit clunky and sidesteps the original idea of knowing when the file ends.

FOLLOW-UP:

I appreciate the assistance, and I've implemented what I've seen. Unfortunately, I believe I may be reading in the wrong information. The pertinent code:

struct test_t{
    int data;
    char buf[LEN];
    struct test_t * next;
};

struct test_t * new_node(struct test_t node, struct test_t * tail)
{
    struct test_t * tmp = NULL;

    if(!(tmp = malloc(sizeof(struct test_t))))
        return NULL;

    tmp->data = node.data;
    strcpy(tmp->buf, node.buf);
    tmp->next = NULL;
    if(tail)
        tail->next = tmp;

    return tmp;
}

...

while(read(fd, &tmp, sizeof(struct test_t)) == sizeof(struct test_t)){
    printf("%d, %s\n", tmp.data, tmp.buf);
    tail = new_node(tmp, tail);
    if(head == NULL)
        head = tail;
    printf("%d, %s\n", tail->data, tail->buf);
}

...

fd = open("test.txt", O_WRONLY | O_CREAT, 0666);
iter = head;
while(iter){
    printf("%d\n", write(fd, &iter, sizeof(struct test_t)));
    printf("%d, %s\n", iter->data, iter->buf);
    iter = iter->next;
}

This is the output from the write loop:

112
1, a
112
2, b
112
3, c
112
4, d
112
5, e

The file is saved in binary, but I can make out enough to know that only the tail seems to be written, five times. I'm not sure why that is.

The output for the diagnostic printf's in the read loop is:

23728144, 
23728144, 
23728272, 
23728272, 
23728400, 
23728400, 
23728528, 
23728528, 
23728656, 
23728656,

The output makes me think it's putting the value of the next pointer into the data int. Any idea why: 1) I might be write()ing the same node five times in a row? 2) I am getting gibberish when I read()?

Solution

while (read(fd, &tmp, sizeof(tmp)) == sizeof(tmp))
{
    ...got another one...
}

It is conventional to use FILE *fp; and int fd; (so the name for a file descriptor is fd and not fp).

The read() function returns the number of bytes it read. If there's no more data, it returns 0. For disk files and the like, it will return the requested number of bytes (except at the very end when there might not be that many bytes left to read) or 0 when there's no data left to read (or -1 if there's an error on the device rather than just no more data). For terminals (and sockets, and pipes), it will read as many bytes as are available rather than wait for the requested size (so each read could return a different size). The code shown only reads full-size structures and baulks if it gets a short read, EOF or an error.
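To make those return values concrete, here is a minimal sketch, assuming a regular file on a POSIX system. The helper name collect_read_sizes and the /tmp path template are illustrative assumptions, not part of the original code. It writes a known number of bytes to a temporary file and records what each read() call returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create a temporary file holding `total` bytes, then
 * read it back `chunk` bytes at a time, storing each read() return value in
 * out[]. Stops after the first return value <= 0 or after max_out calls.
 * Returns the number of values recorded, or -1 if setup fails. */
static int collect_read_sizes(size_t total, size_t chunk,
                              ssize_t *out, int max_out)
{
    char path[] = "/tmp/read_demo_XXXXXX";  /* assumed writable location */
    char buf[64];
    int n = 0;

    assert(out != NULL && max_out > 0);
    assert(chunk > 0 && chunk <= sizeof(buf));

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* the file goes away once fd is closed */

    /* Fill the file with `total` bytes of filler data. */
    memset(buf, 'x', sizeof(buf));
    for (size_t left = total; left > 0; ) {
        size_t want = left < sizeof(buf) ? left : sizeof(buf);
        ssize_t w = write(fd, buf, want);
        if (w <= 0) {
            close(fd);
            return -1;
        }
        left -= (size_t)w;
    }
    if (lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }

    /* Record what each read() returns until EOF (0) or an error (-1). */
    while (n < max_out) {
        ssize_t r = read(fd, buf, chunk);
        if (r < 0 && errno == EINTR)
            continue;           /* interrupted before reading: retry */
        out[n++] = r;
        if (r <= 0)
            break;
    }
    close(fd);
    return n;
}
```

For a 10-byte file read 4 bytes at a time, the recorded values are 4, 4, 2 and then 0: two full reads, one short read at the very end, and the EOF indication.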


The code by ensc in his answer covers all practical circumstances, but isn't the way I'd write the equivalent loop. I'd use:

struct foo tmp;
ssize_t nbytes;

while ((nbytes = read(fd, &tmp, sizeof(tmp))) != 0)
{
    if ((size_t)nbytes == sizeof(tmp))
        process(&tmp);
    else if (nbytes < 0 && errno == EINTR)
        continue;
    else if (nbytes > 0)
        err_syserr("Short read of %zu bytes when %zu expected on fd %d\n",
                   (size_t)nbytes, sizeof(tmp), fd);
    else
        err_syserr("Read failure on fd %d\n", fd);
}

The two normal cases — a full length record is read OK and EOF is detected — are handled at the top of the loop; the esoteric cases are handled further down the loop. My err_syserr() function is printf()-like and reports the error given by its arguments, and also the error associated with errno if it is non-zero, and then exits. You can use any equivalent mechanism. I might or might not put the file descriptor number in the error message; it depends on who is going to see the errors. If I knew the file name, I'd certainly include that in the message in preference to the file descriptor.

I don't see any difficulty handling the nbytes == -1 && errno == EINTR case, contrary to comments by @ensc.
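On the follow-up in the question: write(fd, &iter, sizeof(struct test_t)) passes the address of the pointer variable iter rather than the node it points to, so each call writes 112 bytes starting at that local variable. That is a likely explanation for why the int read back looks like a pointer value. Below is a minimal sketch of a round trip that writes the node itself. LEN is set to 100 as an assumption (consistent with the 112-byte size on a typical 64-bit system, but not shown in the question), and the helper names save_list, load_list and free_list are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define LEN 100   /* assumed; the question does not show its value */

struct test_t {
    int data;
    char buf[LEN];
    struct test_t *next;
};

/* Write every node to fd. Note `iter`, not `&iter`: the bytes of the node
 * are written, not the bytes of the pointer variable. Returns 0 on success,
 * -1 on a failed or short write (no retry logic in this sketch). */
static int save_list(int fd, const struct test_t *head)
{
    assert(fd >= 0);
    for (const struct test_t *iter = head; iter != NULL; iter = iter->next) {
        ssize_t n = write(fd, iter, sizeof(*iter));
        if (n != (ssize_t)sizeof(*iter))
            return -1;
    }
    return 0;
}

/* Read records until EOF, an error, or a truncated record, appending each
 * full record to a new list. The `next` value stored in the file is stale,
 * so it is overwritten. Returns the head of whatever was loaded (NULL if
 * nothing was). */
static struct test_t *load_list(int fd)
{
    struct test_t tmp, *head = NULL, *tail = NULL;
    ssize_t n;

    while ((n = read(fd, &tmp, sizeof(tmp))) != 0) {
        if (n < 0 && errno == EINTR)
            continue;
        if (n != (ssize_t)sizeof(tmp))
            break;                      /* error or truncated record */

        struct test_t *node = malloc(sizeof(*node));
        if (node == NULL)
            break;
        *node = tmp;
        node->next = NULL;
        if (tail != NULL)
            tail->next = node;
        else
            head = node;
        tail = node;
    }
    return head;
}

/* Release every node of a list built by load_list(). */
static void free_list(struct test_t *head)
{
    while (head != NULL) {
        struct test_t *next = head->next;
        free(head);
        head = next;
    }
}
```

Writing raw structs this way also stores padding bytes and the in-memory next pointer, so the file is only readable by a program built with the same struct layout.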

OTHER TIPS

read returns the number of bytes read. For a regular file, if the return value is less than the number of bytes you requested, then you know it reached EOF during that read. If it exactly equals the requested number of bytes, then either the file has not reached EOF, or it has and exactly 0 bytes are left, in which case the next call to read() will return 0.

while(read(fd, &tmp, sizeof(tmp)) > 0) {
    ...
}

Ignoring error conditions, I think this is the basic idea:

while (read(fp, &tmp, sizeof(struct foo))==sizeof(struct foo))
    new_node(tmp);

for (;;) {
    struct foo tmp;
    ssize_t l = read(fd, &tmp, sizeof tmp);

    if (l < 0 && errno == EINTR) {
        continue;
    } else if (l < 0) {
        perror("read()");
        abort();
    } else if (l == 0) {
        break;   /* eof condition */
    } else if ((size_t)(l) != sizeof tmp) {
        abort(); /* something odd happened */
    } else {
        handle(&tmp);
    }
}

EDIT:

In my projects I use a generic

bool read_all(int fd, void *dst_, size_t len, bool *is_err)
{
        unsigned char *dst = dst_;

        *is_err = false;

        while (len > 0) {
                ssize_t l = read(fd, dst, len);

                if (l > 0) {
                        dst += l;
                        len -= l;
                } else if (l == 0) {
                        com_err("read_all", 0, "read(): EOF");
                        *is_err = (void *)dst != dst_;
                        break;
                } else if (errno == EINTR) {
                        continue;
                } else {
                        com_err("read_all", errno, "read()");
                        *is_err = true;
                        break;
                }
        }

        return len == 0;
}

function. Because I prefer to state how many elements are to be read, an EOF is handled as an error here. But it would be trivial to add another bool *err argument to the function that is set only in the non-EOF error case. You can use the above as

while (read_all(fd, &tmp, sizeof tmp, &is_err))
    new_node(&tmp);
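
As a variation on the same idea, here is a sketch that reports the outcome through the return value instead of an extra flag, so that a clean EOF between records can be told apart from a truncated record or an I/O error. The enum and the name read_record are hypothetical and not part of the code above, and com_err is left out so the sketch has no library dependency.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

enum read_status {
    READ_OK,         /* exactly len bytes were stored in dst */
    READ_EOF,        /* EOF before any byte of this record */
    READ_TRUNCATED,  /* EOF after only part of the record */
    READ_ERROR       /* read() failed; errno holds the cause */
};

/* Keep calling read() until len bytes have arrived, EOF is seen, or a
 * non-EINTR error occurs. Partial reads (pipes, sockets, terminals) are
 * accumulated rather than treated as failures. */
static enum read_status read_record(int fd, void *dst_, size_t len)
{
    unsigned char *dst = dst_;
    size_t got = 0;

    assert(dst_ != NULL || len == 0);

    while (got < len) {
        ssize_t l = read(fd, dst + got, len - got);

        if (l > 0)
            got += (size_t)l;
        else if (l == 0)
            return got == 0 ? READ_EOF : READ_TRUNCATED;
        else if (errno != EINTR)
            return READ_ERROR;
        /* EINTR: nothing was read, so simply retry */
    }
    return READ_OK;
}
```

A caller could loop while read_record(fd, &tmp, sizeof tmp) returns READ_OK and then inspect the final status: READ_EOF means the input ended on a record boundary, while the other two values indicate a problem.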
Licensed under: CC-BY-SA with attribution