Question

Is using fseek to backtrack character fscanf operations reliable?

For example, if I have just read 10 characters with fscanf and want to back up over them, can I just call fseek(infile, -10, SEEK_CUR)?

For most situations it works, but I seem to have problems with the character ^M. Apparently fseek counts it as a character but fscanf doesn't, so in my example a 10-character block containing a ^M would require fseek(infile, -11, SEEK_CUR) instead; fseek(infile, -10, SEEK_CUR) would leave it one character short.

Why is this so?

Edit: I was using fopen in text mode


Solution

You're seeing the difference between a "text" and a "binary" file. When a file is opened in text mode (no 'b' in the fopen second argument), the stdio library may (indeed, must) interpret the contents of the file according to the operating system's conventions for text files. For example, in Windows, a line ends with \r\n, and this gets translated to a single \n by stdio, since that is the C convention. When writing to a text file, a single \n gets output as \r\n.
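To make the translation visible, here is a minimal sketch that counts the characters delivered by the same file in text mode and in binary mode. The helper names `write_sample` and `count_chars` and the file name used below are made up for illustration; the sample contents are an assumption chosen to contain one \r\n pair.

```c
#include <stdio.h>

/* Write the 10 bytes "12345\r\n789" exactly as given.
   Binary mode ("wb") is used so that no line-ending translation occurs. */
int write_sample(const char *path)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    size_t written = fwrite("12345\r\n789", 1, 10, f);
    fclose(f);
    return written == 10 ? 0 : -1;
}

/* Count how many characters fgetc delivers when `path` is opened with `mode`.
   Returns -1 if the file cannot be opened. */
long count_chars(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    long n = 0;
    while (fgetc(f) != EOF)
        n++;
    fclose(f);
    return n;
}
```

After `write_sample("crlf_demo.txt")`, `count_chars("crlf_demo.txt", "rb")` is 10 everywhere. With mode "r", a Windows C runtime would be expected to report 9, because \r\n arrives as a single \n, while a POSIX system reports 10 because text and binary modes are identical there. That one-character gap is the same one that shows up when the seek offset is computed by hand.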

This makes it easier to write portable C programs that handle text files. Some details become complicated, however, and fseeking is one of them. Because of this, the C standard only defines fseek in text files in a few cases: to the very beginning, to the very end, to the current position, and to a previous position that has been retrieved with ftell. In other words, you can't compute a location to seek to for text files. Or you can, but you have to take care of all the platform-specific details yourself.

Alternatively, you can use binary files and do the line-ending transformations yourself. Again, portability suffers.

In your case, if you just want to go back to where you were before the last fscanf, the easiest approach is to call ftell just before the fscanf and later fseek to that saved position with SEEK_SET.
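The ftell approach might look roughly like the sketch below. The function name `peek_block` and the fixed block size of 10 are assumptions taken from the example in the question, not anything from the asker's actual code.

```c
#include <stdio.h>

/* Read a 10-character block with fscanf, then return to where the block began.
   `buf` must have room for at least 11 bytes (10 characters plus '\0').
   Returns 0 on success, -1 on any failure. */
int peek_block(FILE *f, char *buf)
{
    long start = ftell(f);              /* remember the position before reading */
    if (start == -1L)
        return -1;

    if (fscanf(f, "%10c", buf) != 1)    /* %10c reads exactly 10 characters */
        return -1;
    buf[10] = '\0';                     /* %c does not add a terminator */

    /* Seeking to a value previously returned by ftell is one of the
       cases the C standard defines for text streams. */
    return fseek(f, start, SEEK_SET) == 0 ? 0 : -1;
}
```

Calling `peek_block` twice in a row on the same stream should yield the same 10 characters both times, regardless of how many ^M bytes the block spans on disk, because no offset is ever computed by hand.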

OTHER TIPS

This is because fseek works with bytes, whereas fscanf, reading through a text-mode stream, sees the carriage return and line feed (two bytes) as a single character.

fseek has no understanding of the file's contents and just moves the file pointer 10 bytes back.

Depending on the OS, fscanf may interpret newlines differently; on DOS it may even be that fscanf inserts a ^M that does not appear in the file. Check the manual that came with your C compiler.

Just tried this with VS2008 and found that fscanf and fseek treated the CR and LF characters in the same way (as a single character).

So with two files:

0000000: 3132 3334 3558 3738 3930 3132 3334 3536 12345X7890123456

and

0000000: 3132 3334 350d 0a37 3839 3031 3233 3435 12345..789012345

If I read 15 characters, I get to the second '5'. If I then seek back 10 characters, the next character read is the 'X' in the first case and the CRLF in the second.

This seems like a very OS/compiler specific problem.

Did you test the return value of fscanf? Post some code.

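Take a look at ungetc. You may have to run a loop over it, but note that the C standard only guarantees one character of pushback, so pushing back a whole 10-character block this way is not portable.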
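For the single-character case, which is the only one the standard guarantees, a minimal sketch of ungetc looks like this. The helper name `peek_char` is made up for illustration.

```c
#include <stdio.h>

/* Look at the next character of `f` without consuming it.
   Returns the character, or EOF at end of file or on error. */
int peek_char(FILE *f)
{
    int c = fgetc(f);
    if (c != EOF)
        ungetc(c, f);   /* push it back; only one pushed-back character is guaranteed */
    return c;
}
```

Because the pushed-back character is handed back by the next read on the stream, this works the same way in text and binary mode. For anything longer than one character, saving the position with ftell is the safer route.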

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow