Flushing a FileOutputStream in Java: can the actual write be cached (postponed) by the OS itself, and File.length() return a "wrong" value?

StackOverflow https://stackoverflow.com/questions/20530516

Question

First of all, I apologize for the wall of text.

I am writing chunks of data to a file (using a FileOutputStream); from time to time I flush the stream and check the length of the file (File.length()) to see whether a certain threshold has been reached. These log files need to be available upon request.

The code is in a production environment, part of a rather elaborate logging system; I therefore apologize for the lack of code: posting it here is not just impractical but downright impossible. However, the simple description above is accurate.

My concern is that, while everything works as expected in 99.9% of cases (maybe tens of thousands of log files have been reviewed since the system went into use), I still sometimes get files many times larger than my threshold. I suspect the size of the file is not being reported correctly. Before writing any big chunk of data, I flush and check the file size; I was confident that flushing would force the data to be sent to disk.

(No security exceptions are thrown, file permissions are fine, and the files are not in use elsewhere in the code; in other words, I can't say why the size would be reported incorrectly. I don't expect that an antivirus lock would influence the reported size of a file. The chunks of data by themselves would not explain the behavior; they are rather small compared to the threshold size. Of course, reproducing this issue is nearly impossible, and the only approach would be to... log more debugging info.)

The Java docs for OutputStream.flush() seem to suggest that the OS might postpone the actual disk writes, which would explain why I sometimes get the described behavior. I'd be grateful if someone could explain whether I am interpreting this passage correctly:

Flushes this output stream and forces any buffered output bytes to be written out.
The general contract of flush is that calling it is an indication that, if any bytes
previously written have been buffered by the implementation of the output stream, such
bytes should immediately be written to their intended destination.

If the intended destination of this stream is an abstraction provided by the underlying
operating system, for example a file, then flushing the stream guarantees only that
bytes previously written to the stream are passed to the operating system for writing;
it does not guarantee that they are actually written to a physical device such as a
disk drive.

So far, it seems that Vista users had this issue; I can't say whether this is a hard rule.

Is there anything else in the "grand scheme of things" that could cache or buffer data?


Solution

The platform may not update the file's metadata until the file is closed. Windows, for example, does this. So regardless of whether the final writes happen physically or not, File.length() can still return an unexpected value while the file is open.
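
If the reported length cannot be trusted while the file is open, one way around it is to not ask the file system at all and count the bytes in the application instead. The following is only a minimal sketch of that idea, not the asker's code: the class names CountingLogWriter and CountingOutputStream, the temp file, the 1024-byte threshold and the sample chunk are all assumptions made for illustration.

```java
import java.io.FileOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CountingLogWriter {

    /** Hypothetical wrapper that counts every byte handed to the underlying stream. */
    public static final class CountingOutputStream extends FilterOutputStream {
        private long count;

        public CountingOutputStream(OutputStream out, long initialCount) {
            super(out);
            this.count = initialCount;
        }

        @Override
        public void write(int b) throws IOException {
            out.write(b);
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            // Delegate the whole range at once instead of byte by byte.
            out.write(b, off, len);
            count += len;
        }

        public long getCount() {
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        Path logFile = Files.createTempFile("app", ".log"); // hypothetical log location
        logFile.toFile().deleteOnExit();
        long thresholdBytes = 1024;                          // hypothetical threshold
        byte[] chunk = "some log line\n".getBytes(StandardCharsets.UTF_8);

        // Seed the counter with the size on disk, read before the file is opened.
        long initialSize = Files.size(logFile);
        try (CountingOutputStream out = new CountingOutputStream(
                new FileOutputStream(logFile.toFile(), true), initialSize)) {
            // The threshold check uses the in-process counter, not File.length().
            while (out.getCount() + chunk.length <= thresholdBytes) {
                out.write(chunk);
            }
            out.flush();
            System.out.println("bytes counted by this process: " + out.getCount());
        }
    }
}
```

The counter is only accurate if this process is the sole writer of the file, which matches the situation described in the question but is still an assumption.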

Other answers

The OS may buffer writes on its own, and the Javadoc text that you quoted states that, in general, an OutputStream object doesn't try to do anything about OS buffering.

Specifically, Windows does internal write buffering. This is from MSDN:

Typically the WriteFile and WriteFileEx functions write data to an internal buffer that the operating system writes to a disk or communication pipe on a regular basis.

So it seems possible that in your case the update of the file's size is delayed.
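
To illustrate the difference between handing bytes to the OS and asking the OS to write them out, here is a small sketch using FileDescriptor.sync(). The class name FlushAndSync, the method writeDurably and the temp file are hypothetical. Note the limits of what this shows: sync() is about getting data onto the device, and whether it also refreshes the size reported while the file is still open is platform-dependent and not something this example verifies.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class FlushAndSync {

    /** Appends data, then asks the OS to push its own buffers to the device. */
    public static void writeDurably(File file, byte[] data) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file, true)) {
            out.write(data);
            out.flush();        // passes bytes to the OS; does not wait for the disk
            out.getFD().sync(); // blocks until the OS reports the buffers were written
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("sync-demo", ".log"); // hypothetical file
        file.deleteOnExit();
        writeDurably(file, "hello\n".getBytes(StandardCharsets.UTF_8));
        // The stream is closed at this point, so the length is read from a closed file.
        System.out.println("length after close: " + file.length());
    }
}
```

FileChannel.force(true), reachable via FileOutputStream.getChannel(), is a similar call that additionally asks for the file's metadata to be written. Both calls can be slow, so doing this on every log write may not be acceptable in a production logger.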

Licensed under: CC-BY-SA with attribution