Question

I have this code to read a file using mmap and print it using printf. The file has 10 lines and contains the digits 0-9, one per line.

My questions are:
1. Why doesn't my code terminate on EOF? That is, why doesn't it stop at while (data[i]!=EOF)?
2. When I run it with while (data[i]!=EOF), the program always terminates at data[10567], whereas the page size is 4096 bytes. Does 10567 bytes have any significance?

Edit: I am not looking for alternatives such as fscanf or fgets.

Thanks!

Code:

int main(int argc, char *argv[])
{
        FILE *ifp, *ofp;
        int pagesize, fd, i=0;
        char *data;
        struct stat sbuf;

        if ((ifp = fopen("/home/t/workspace/lin", "r"))==NULL)
        {
                fprintf(stderr, "Can't open input file\n");
                exit(1);
        }
        fd  = fileno(ifp);
        if (stat("/home/t/workspace/lin", &sbuf) == -1)
        {
                perror("stat");
                exit(1);
        }
        pagesize = getpagesize();
        printf("page size: %d\n", pagesize);
        printf("file size: %d\n", sbuf.st_size);
        if((data = mmap((caddr_t)0, sbuf.st_size, PROT_READ, MAP_SHARED, fd, 0)) == (caddr_t)(-1))
        {
                perror("mmap");
                exit(1);
        }
        //while (data[i]!=EOF)
        while (i<=sbuf.st_size)
        {
                printf("data[%d]=%c\n", i, data[i]);
                i++;
        }
        return 0;
}

Output:

page size: 4096
file size: 21
data[0]=0
data[1]=

data[2]=1
data[3]=

data[4]=2
data[5]=

data[6]=3
data[7]=

data[8]=4
data[9]=

. . . .

data[18]=9
data[19]=

data[20]=

data[21]=  // truncated my output here, 
           // it goes till data[10567] if I use `while (data[i]!=EOF)`

Solution

EOF is not stored in files. So there's no point comparing a byte from the file with EOF. If you use mmap, as opposed to getchar or equivalent, then you need to stat the file to find out how big it is.

Note that getc, fgetc and getchar return an int. Quoting the manpage (or the POSIX standard), these functions return the next byte "as an unsigned char cast to an int, or EOF on end of file or error." The value of EOF must be such that it cannot be confused with "an unsigned char cast to an int"; typically, it is -1. A (signed) char, however, can be equal to -1, so your test data[i]!=EOF may eventually turn out to be false as you scan through memory beyond the end of the file, if you don't segfault before you hit such a byte.
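The difference between the two comparisons can be sketched as follows. The helper names are hypothetical, and the result of the first one depends on the platform: it assumes nothing about whether plain char is signed, so it only reports a match where char is signed and EOF is -1 (the typical case described above).

```c
#include <assert.h>
#include <stdio.h>

/* The C standard requires EOF to be a negative int constant. */
static_assert(EOF < 0, "EOF must be negative");

/* Hypothetical helper mirroring the question's test `data[i] != EOF`:
 * the byte passes through a plain char before it is compared.
 * If plain char is signed, the byte 0xFF becomes -1 on typical
 * platforms and compares equal to an EOF of -1. */
int char_test_sees_eof(unsigned char byte)
{
    char c = (char)byte;
    return c == EOF;
}

/* What getc() effectively does with a real byte: it is converted from
 * unsigned char to int, so the value is in 0..255 and never equals EOF. */
int int_test_sees_eof(unsigned char byte)
{
    int c = byte;
    return c == EOF;
}
```

With these definitions, int_test_sees_eof(0xFF) is always 0, while char_test_sees_eof(0xFF) is 1 on a platform where plain char is signed and EOF is -1.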

In Unix, text files are not necessarily terminated with NULs either. In short, you should only try to reference bytes you know to be inside the file, based on the file's size.
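A minimal sketch of a size-bounded loop over a mapping might look like this. The function name dump_mapped_file is made up for illustration, and the choice of open/fstat with MAP_PRIVATE is an assumption rather than the only way to do it; the point is that the size reported by fstat is the only loop bound.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps `path` read-only and prints every byte.
 * Returns the number of bytes printed, or -1 on error. */
long dump_mapped_file(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return -1;
    }

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        perror("fstat");
        close(fd);
        return -1;
    }

    if (sb.st_size == 0) {          /* mmap() rejects a zero length */
        close(fd);
        return 0;
    }

    char *data = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return -1;
    }

    /* Strictly less than st_size: data[st_size] is already past the file. */
    for (off_t i = 0; i < sb.st_size; i++)
        printf("data[%ld]=%c\n", (long)i, data[i]);

    munmap(data, (size_t)sb.st_size);
    close(fd);
    return (long)sb.st_size;
}
```

Calling dump_mapped_file("/home/t/workspace/lin") from main would print one line per byte in the file and stop there, without any sentinel test.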

Other tips

Your output looks correct. The only bug I see is that:

   while (i<=sbuf.st_size)

should have <.

There is no EOF marker, such as a Control-Z, stored in the actual data. Standard functions such as getc return EOF when their internal counter, the equivalent of your i, moves past their own equivalent of sbuf.st_size. That is to say, EOF is not a character in the file but a fictitious value generated by getc and/or the OS.
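For comparison only (the question rules out switching away from mmap), here is a rough sketch of how EOF shows up on the stdio side. The helper name count_bytes_with_getc is hypothetical. The return value of getc is kept in an int, so a 0xFF byte in the file does not end the loop; only the end-of-file (or error) condition does.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts the bytes in `path` by calling getc()
 * until it reports EOF. Returns the count, or -1 on error. */
long count_bytes_with_getc(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long n = 0;
    int c;                          /* int, not char, so EOF stays distinct */
    while ((c = getc(fp)) != EOF)
        n++;

    int failed = ferror(fp);        /* EOF is also returned on read errors */
    fclose(fp);
    return failed ? -1 : n;
}
```

For a regular file that is not being modified, the count returned here should match the st_size that stat reports for it.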

The confusion perhaps arises because, if I recall correctly, MS-DOS text files actually contain a ^Z, and if you inadvertently fopen one in binary mode, you can see this unwanted ^Z. Unix does not have this distinction.

With respect to your question:

Does 10567 bytes have any significance ?

I would say no. My guess is that data[10567] happens to be the first byte of memory equal to 0xFF, which is promoted to -1 (assuming your char is signed) and therefore matches EOF.

Licensed under: CC-BY-SA with attribution