File Handling question on C programming

https://stackoverflow.com/questions/739589

c
filehandle

09-09-2019

Question

I want to read a given input file line by line, process each line (i.e. its words), and then move on to the next line.

So I am using fscanf(fptr,"%s",words) to read a word, and it should stop once it encounters the end of the line.

But I guess this is not possible with fscanf, so please tell me how to do it.

I need to read all the words in a given line (i.e. until the end of the line is encountered), then move on to the next line and repeat the same process.

Solution

Use fgets(). Yes, the link goes to cplusplus.com, but the function comes from C's stdio.h.

You can also use sscanf() to read words from the string, or just strtok() to separate them.

In response to a comment: this behavior of fgets() (leaving the \n in the string) lets you determine whether the actual end of the line was reached. Note that fgets() may read only part of a line from the file if the supplied buffer is not large enough. In your case, just check for a \n at the end and remove it if you don't need it. Something like this:
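As a rough sketch of the sscanf() route: the %n directive reports how many characters a call consumed, so you can step through the string word by word. The helper name print_words and the 63-character word limit are assumptions made for this illustration; a longer word would simply be split into several pieces.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: prints each whitespace-separated word of `line`
   and returns how many words were found. */
int print_words(const char *line)
{
    char word[64];      /* assumed limit: 63 characters per word */
    int consumed = 0;
    int count = 0;

    assert(line != NULL);

    /* %63s skips leading whitespace, then reads at most 63 non-space
       characters; %n stores how many characters this call consumed. */
    while (sscanf(line, "%63s%n", word, &consumed) == 1) {
        printf("Word: %s\n", word);
        line += consumed;   /* continue right after the word just read */
        count++;
    }
    return count;
}
```

Unlike strtok(), this leaves the original string untouched, at the cost of copying each word into a separate buffer.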

// actually you'll get str contents from fgets()
char str[MAX_LEN] = "hello there\n";
size_t len = strlen(str);
if (len && str[len-1] == '\n') {
    str[len-1] = 0;
}

Simple as that.
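Putting the pieces together, a minimal sketch of the whole loop might look like the following. The helper name process_lines and the MAX_LEN value of 256 are assumptions for illustration only. Note that a line longer than the buffer arrives in several chunks, so a word sitting on a chunk boundary would be split in two.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LEN 256   /* assumed buffer size; pick one that fits your input */

/* Hypothetical helper: reads `fp` line by line, prints every word,
   and returns the total number of words seen. */
int process_lines(FILE *fp)
{
    char str[MAX_LEN];
    int words = 0;

    assert(fp != NULL);

    while (fgets(str, sizeof str, fp) != NULL) {
        size_t len = strlen(str);

        /* A trailing '\n' means we really reached the end of the line. */
        if (len && str[len - 1] == '\n') {
            str[len - 1] = '\0';
        }

        /* strtok() modifies str in place, replacing delimiters with '\0'. */
        for (char *w = strtok(str, " \t"); w != NULL; w = strtok(NULL, " \t")) {
            printf("Word: %s\n", w);
            words++;
        }
    }
    return words;
}
```

You would call it with any open stream, for example process_lines(stdin) or a FILE * returned by fopen().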

OTHER TIPS

If you are working on a system with the GNU extensions available, there is a function called getline (man 3 getline; it is also part of POSIX.1-2008) that lets you read a file line by line and allocates extra memory for you if needed. The manpage contains an example, which I modified to split the line using strtok (man 3 strtok).

#define _POSIX_C_SOURCE 200809L   /* exposes getline() and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>               /* strtok() */

int main(void)
{
    FILE *fp;
    char *line = NULL;            /* getline() allocates and grows this */
    size_t len = 0;
    ssize_t nread;

    fp = fopen("/etc/motd", "r");
    if (fp == NULL)
    {
        perror("File open failed");
        return EXIT_FAILURE;
    }

    while ((nread = getline(&line, &len, fp)) != -1) {
        // At this point we have a line held within 'line'
        printf("Line: %s", line);
        const char *delim = " \n";
        char *ptr = strtok(line, delim);

        while (ptr != NULL)
        {
            printf("Word: %s\n", ptr);
            ptr = strtok(NULL, delim);
        }
    }

    free(line);                   /* free(NULL) is a no-op, so no check needed */
    fclose(fp);
    return 0;
}

Given the buffering inherent in all the stdio functions, I would be tempted to read the stream character by character with getc(). A simple finite state machine can identify word boundaries, and line boundaries if needed. An advantage is the complete lack of buffers to overflow, aside from whatever buffer you collect the current word in if your further processing requires it.
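A minimal sketch of such a state machine, assuming all you need is to notice where words and lines begin and end. The struct and function names here are made up for the example; in real code you would replace the counters with whatever per-word processing you need.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical result type for this illustration. */
struct counts {
    long words;
    long lines;
};

/* Reads `fp` one character at a time with getc() and tracks a single
   bit of state: are we currently inside a word or not? */
struct counts count_words_and_lines(FILE *fp)
{
    struct counts c = {0, 0};
    int in_word = 0;
    int ch;               /* int, not char, so EOF can be told apart */

    assert(fp != NULL);

    while ((ch = getc(fp)) != EOF) {
        if (ch == '\n') {
            c.lines++;            /* line boundary */
        }
        if (isspace(ch)) {
            in_word = 0;          /* word boundary (if we were in one) */
        } else if (!in_word) {
            in_word = 1;          /* first character of a new word */
            c.words++;
        }
    }
    return c;
}
```

There is no line buffer at all here, so there is nothing to overflow regardless of how long a line is.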

You might want to do a quick benchmark comparing the time required to read a large file completely with getc() vs. fgets()...

If an outside constraint requires that the file really be read a line at a time (for instance, if you need to handle line-oriented input from a tty), then fgets() probably is your friend, as other answers point out. Even then, the getc() approach may be acceptable as long as the input stream is running in line-buffered mode, which is common for stdin if stdin is on a tty.

Edit: To have control over the buffer on the input stream, you might need to call setbuf() or setvbuf() to force it to a buffered mode. If the input stream ends up unbuffered, then using an explicit buffer of some form will always be faster than getc() on a raw stream.
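For instance, a small wrapper around setvbuf() could look like this. The function name is hypothetical and the buffer size is something you would have to tune yourself; the one firm rule is that setvbuf() has to be called after the stream is opened but before any other operation on it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: asks stdio to fully buffer `fp` using a buffer of
   `size` bytes that the library allocates itself (because we pass NULL).
   Returns 0 on success, nonzero if the request could not be honored. */
int request_full_buffering(FILE *fp, size_t size)
{
    assert(fp != NULL);
    return setvbuf(fp, NULL, _IOFBF, size);
}
```

A typical call would be request_full_buffering(fp, 1 << 16) right after fopen(), followed by the usual getc() or fgets() loop.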

Best performance would probably use a buffer related to your disk I/O, at least two disk blocks in size and probably a lot more than that. Often, even that performance can be beaten by arranging the input to be a memory-mapped file and relying on the kernel's paging to read and fill the buffer as you process the file as if it were one giant string.

Regardless of the choice, if performance is going to matter then you will want to benchmark several approaches and pick the one that works best on your platform. And even then, the simplest expression of your problem may still be the best overall answer if it gets written, debugged and used.

but this is not possible in fscanf,

It is, with a bit of wickedness ;)

Update: More clarification on evilness

but unfortunately a bit wrong. I assume [^\n]%*[^\n] should read [^\n]%*. Moreover, one should note that this approach will strip whitespaces from the lines. – dragonfly

Note that xstr(MAXLINE) [^\n] reads at most MAXLINE characters, which can be anything except the newline character (i.e. \n). The second part of the specifier, i.e. *[^\n], rejects anything (that's why the * character is there) if the line has more than MAXLINE characters, up to but NOT including the newline character. The newline character tells scanf to stop matching. What if we did as dragonfly suggested? The only problem is that scanf will not know where to stop and will keep suppressing assignment until the next newline is hit (which is another match for the first part). Hence you will trail by one line of input when reporting.
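To see what the stringified format does in isolation, here is a small sketch that runs it against an in-memory string with sscanf(). The names DEMO_MAX, LINE_FMT and first_line_prefix are invented for this example, and the limit is set to 8 only so that the truncation is easy to see.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_MAX 8              /* deliberately tiny, for illustration */

/* stringify macros: these work only in pairs, so keep both */
#define str(x) #x
#define xstr(x) str(x)

/* Adjacent string literals are concatenated, so this expands to
   "%8[^\n]%*[^\n]". */
#define LINE_FMT "%" xstr(DEMO_MAX) "[^\n]%*[^\n]"

/* Hypothetical helper: copies at most DEMO_MAX characters of the first
   line of `input` into `out` and returns sscanf()'s result: 1 if some
   characters were stored, 0 if the first line is empty. */
int first_line_prefix(const char *input, char out[DEMO_MAX + 1])
{
    assert(input != NULL && out != NULL);
    return sscanf(input, LINE_FMT, out);
}
```

With the input "hello world\nnext" this stores "hello wo" and returns 1. With an input that starts with a newline it returns 0 and stores nothing, which is worth keeping in mind if your file can contain blank lines.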

What if you wanted to read in a loop? A little modification is required. We need to add a getchar() to consume the unmatched newline. Here's the code:

#include <stdio.h>

#define MAXLINE 255

/* stringify macros: these work only in pairs, so keep both */
#define str(x) #x
#define xstr(x) str(x)

int main(void) {
    char line[ MAXLINE + 1 ];
    int n;
    /*
       Wickedness explained: we read from `stdin` to `line`.
       The format specifier is the only tricky part: We don't
       bite off more than we can chew -- hence the specification
       of maximum number of chars i.e. MAXLINE. However, this
       width has to go into a string, so we stringify it using
       macros. The careful reader will observe that once we have
       read MAXLINE characters we discard the rest up to (but not
       including) the newline; getchar() then eats the newline.
     */
    while ((n = fscanf(stdin, "%" xstr(MAXLINE) "[^\n]%*[^\n]", line)) != EOF) {
        if (n == 0) {
            /* empty line: nothing matched, so `line` was not written */
            line[0] = '\0';
        }
        getchar();  /* consume the newline (harmless at end of file) */
        printf("[line:] %s\n", line);
    }
    return 0;
}

Licensed under: CC-BY-SA with attribution
