Question

In C or C++, the following can be used to return a file size:

const unsigned long long at_beg = (unsigned long long) ftell(filePtr);
fseek(filePtr, 0, SEEK_END);
const unsigned long long at_end = (unsigned long long) ftell(filePtr);
const unsigned long long length_in_bytes = at_end - at_beg;
fprintf(stdout, "file size: %llu\n", length_in_bytes);

Are there development environments, compilers, or OSes which can return the wrong file size from this code, based on padding or other information that is situation-specific? Were there changes in the C or C++ specification around 1999, which would have lead to this code no longer working in certain cases?

For this question, please assume I am adding large file support by compiling with the flags -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1. Thanks.

Was it helpful?

Solution

It won't work on unseekable files like /proc/cpuinfo or /dev/stdin or /dev/tty, or pipe files gotten with popen

And it won't work if that file is written by another process at the same time.

Using the Posix stat function is probably more efficient and more reliable. Of course, this function might not be available on non Posix systems.

OTHER TIPS

The fseek and ftell functions are both defined by the ISO C language standard.

The following is from latest public draft of the 2011 C standard, but the 1990, 1999, and 2011 ISO C standards are all very similar in this area, if not identical.

7.21.9.4:

The ftell function obtains the current value of the file position indicator for the stream pointed to by stream. For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

7.21.9.2:

The fseek function sets the file position indicator for the stream pointed to by stream. If a read or write error occurs, the error indicator for the stream is set and fseek fails.

For a binary stream, the new position, measured in characters from the beginning of the file, is obtained by adding offset to the position specified by whence. The specified position is the beginning of the file if whence is SEEK_SET, the current value of the file position indicator if SEEK_CUR, or end-of-file if SEEK_END. A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END.

For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.

Violating any of the "shall" clauses makes your program's behavior undefined.

So if the file was opened in binary mode, ftell gives you the number of characters from the beginning of the file -- but an fseek relative to the end of the file (SEEK_END) is not necessarily meaningful. This accommodates systems that store binary files in whole blocks and don't keep track of how much was written to the final block.

If the file was opened in text mode, you can seek to the beginning or end of the file with an offset of 0, or you can seek to a position given by an earlier call to ftell; fseek with any other arguments has undefined behavior. This accomodates systems where the number of characters read from a text file doesn't necessarily correspond to the number of bytes in the file. For example, on Windows reading a CR-LF pair ("\r\n") reads only one character, but advances 2 bytes in the file.

In practice, on Unix-like systems text and binary modes behave the same way, and the fseek/ftell method will work. I suspect it will work on Windows (my guess is that ftell will give the byte offset, which may not be the same as the number of times you could call getchar() in text mode).

Note also that ftell() returns a result of type long. On systems where long is 32 bits, this method can't work for files that are 2 GiB or larger.

You might be better off using some system-specific method to get the size of a file. Since the fseek/ftell method is system-specific anyway, such as stat() on Unix-like systems.

On the other hand, fseek and ftell are likely to work as you expect on most systems you're likely to encounter. I'm sure there are systems where it won't work; sorry, but I don't have specifics.

If working on Linux and Windows is good enough, and you're not concerned with large files, then the fseek/ftell method is probably ok. Otherwise, you should consider using a system-specific method to determine the size of a file.

And keep in mind that anything that tells you the size of a file can only tell you its size at that moment. The file's size could change before you access it.

1) Superficially, your code looks "OK" - I don't see any problem with it.

2) No - there isn't any "C or C++ specification" that would affect fseek. There is a Posix specification:

3) If you want "file size", my first choice would probably by "stat()". Here's the Posix specification:

4) If something's "going wrong" with your method, then my first guess would be "large file support".

For example, many OS's had parallel "fseek()" and "fseek64()" APIs.

'Hope that helps .. PSM

POSIX defines the return value from fseek as "measured in bytes from the beginning of the file". Your at_beg will always be zero (assuming this is a newly opened file).

So, assuming that:

  1. the file is seekable
  2. there are no concurrency issues to be concerned about
  3. the file size is representable in the data type used by the fseek/ftell variant you choose

then your code should work on any POSIX-compliant system.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top