Question

I'm writing a program that runs on both Linux and FreeBSD, and I want to make sure that the data is actually written to the file on the physical device when each write() returns, so that my data won't get lost by accident (e.g., a power loss, or the process being interrupted unexpectedly).

According to the open(2) man page, on Linux (versions above 2.6) O_DIRECT is synchronous but may have performance problems; on FreeBSD, O_DIRECT is not guaranteed to be synchronous and may also have problems.

So, on Linux, either O_DIRECT or O_SYNC guarantees synchronous write, but which one has better performance?

On FreeBSD, to guarantee synchronous write, which option has the best performance: (1) O_DIRECT + fsync(), (2) O_DIRECT | O_SYNC, or (3) O_SYNC alone?


Solution

With current hard disks, there is no assurance that a file is actually written to disk even if the disk reports the write as complete to the OS! This is due to the drive's built-in write cache.

On FreeBSD you can disable this by setting the kern.cam.ada.write_cache sysctl to 0. This will degrade write performance significantly. The last time I measured it (WDC WD5001ABYS-01YNA0 hard disk on an ICH-7 chipset, FreeBSD 8.1 AMD64), continuous write performance (measured with dd if=/dev/zero of=/tmp/foo bs=10M count=1000) dropped from 75,000,000 bytes/sec to 12,900,000 bytes/sec.
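In relative terms, that measurement works out to roughly a 5.8× drop in continuous write throughput, or about 83% less (simple arithmetic on the two figures above, not a separate measurement):

```latex
\frac{75{,}000{,}000\ \text{B/s}}{12{,}900{,}000\ \text{B/s}} \approx 5.8,
\qquad
1 - \frac{12{,}900{,}000}{75{,}000{,}000} \approx 0.83
```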

If you want to be absolutely sure that your files are written:

  • Disable write caching with sysctl kern.cam.ada.write_cache=0 followed by camcontrol reset <bus>:<target>:<lun>.
  • Open the file with the O_SYNC option.
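A minimal sketch of the second step in C, assuming a POSIX system; append_sync is a hypothetical helper name, not an existing API. It opens the file with O_SYNC, retries on EINTR and short writes, and checks the result of close() as well. O_SYNC only governs what the kernel does before write() returns; as described above, the drive's own write cache is a separate concern, which is why the first step disables it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes from buf to path through an O_SYNC
 * descriptor. With O_SYNC, each successful write() returns only after the
 * data and the file metadata needed to read it back have been transferred to
 * the underlying hardware (which may still have its own volatile cache).
 * Returns 0 on success, -1 on failure. */
int append_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd == -1)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;             /* interrupted before any data was written */
            int saved = errno;
            close(fd);                /* report write()'s error, not close()'s */
            errno = saved;
            return -1;
        }
        p += n;                       /* short write: carry on with the rest */
        len -= (size_t)n;
    }

    /* close() can report errors too, so its result is returned. */
    return close(fd);
}
```

A return value of -1 from append_sync() means the record should be treated as not written.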

Note:

  • Your write performance (on an HDD) will now absolutely suck.
  • Do not mount the partition with the sync option; that will cause all I/O (including reads) to be done synchronously.
  • Do not use O_DIRECT. It will try to bypass the cache altogether. That will probably also influence reads.

OTHER TIPS

O_DIRECT basically exists solely for Oracle to bypass the kernel's buffer cache layer and do its own caching. It has ill-defined semantics, arbitrary limitations on the size and alignment of reads you can perform, and generally should not be used. O_SYNC is supposed to give you the effects you want, but without an underlying filesystem that's robust against power failure or crashes, it still might not be sufficient for your needs.

So, on Linux, either O_DIRECT or O_SYNC guarantees synchronous write, but which one has better performance?

This statement is not correct: as @roland-smith mentioned, on Linux at least, O_DIRECT does not guarantee that the data has reached non-volatile media. It might happen to give that guarantee in a specific environment (e.g. writing directly to the block device representing a disk behind a battery-backed SCSI controller), but you can't rely on this in the general case (e.g. writing to a file in an ext4 filesystem backed only by a single SATA hard disk) for at least the following reasons:

  • O_DIRECT on a file in a filesystem doesn't guarantee that the metadata needed to retrieve the data after a power-loss crash has been written.
  • The kernel may have sent the I/O to the hardware before the call returned, but the data may still be sitting only in a volatile hardware cache.
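On the metadata point: for a newly created file, the directory entry that names it is also metadata, and it lives in the directory rather than in the file. The Linux fsync(2) man page notes that calling fsync() on a file does not necessarily ensure that the directory entry has reached disk, and that an explicit fsync() on the directory is also needed. A minimal sketch, assuming a POSIX system; fsync_dir and create_durable are hypothetical helper names:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* fsync() a directory so that entries created in it are flushed as well.
 * Returns 0 on success, -1 on failure. */
int fsync_dir(const char *dir_path)
{
    int dfd = open(dir_path, O_RDONLY);
    if (dfd == -1)
        return -1;
    int rc = fsync(dfd);
    if (close(dfd) == -1)
        rc = -1;
    return rc;
}

/* Hypothetical helper: create path (which must live in dir_path), write buf,
 * flush the file's data and metadata with fsync(), then flush the directory
 * entry that names it. For brevity a short write is treated as a failure. */
int create_durable(const char *dir_path, const char *path,
                   const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    ssize_t n = write(fd, buf, len);
    /* fsync() is only attempted if the whole buffer was written. */
    int rc = (n == -1 || (size_t)n != len || fsync(fd) == -1) ? -1 : 0;
    if (close(fd) == -1)
        rc = -1;
    if (rc == -1)
        return -1;

    return fsync_dir(dir_path);
}
```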

In the above scenarios, a sudden power loss would mean your program thought the I/O had succeeded when it had not. These days the Linux open(2) man page says this:

The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary metadata are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT. See NOTES below for further discussion.

In the scenario given, the only way on Linux to guarantee that every write is synchronous is to use O_SYNC (which incurs a speed hit) or to call fsync() after every I/O (which is likely slower, because you make two syscalls). If I were worried about speed, I would forgo O_SYNC and instead write in batches as big as possible, then call fsync() after each batch. Also be aware that if you're worried about data integrity, you have to check the return codes of all fsync() and write() calls (and close(), etc.) for errors.
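A sketch of the batching approach, assuming fixed-size records stored contiguously in memory; append_batched, write_full and batch_size are hypothetical names. It issues one write() per batch (retrying on EINTR and short writes) followed by one fsync(), and checks the result of every write(), fsync() and close():

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes, retrying on EINTR and short writes. */
static int write_full(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical example: append nrec records of rec_len bytes each, with one
 * fsync() per batch of batch_size records instead of O_SYNC on every write.
 * Returns 0 only if every write(), fsync() and close() succeeded. */
int append_batched(const char *path, const char *recs, size_t rec_len,
                   size_t nrec, size_t batch_size)
{
    if (batch_size == 0) {
        errno = EINVAL;
        return -1;
    }

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1)
        return -1;

    for (size_t i = 0; i < nrec; i += batch_size) {
        size_t count = nrec - i < batch_size ? nrec - i : batch_size;
        /* One large write() for the whole batch, then one fsync(). */
        if (write_full(fd, recs + i * rec_len, count * rec_len) == -1 ||
            fsync(fd) == -1) {
            close(fd);
            return -1;
        }
    }
    return close(fd);   /* close() errors count too */
}
```

The trade-off is the batch size: larger batches mean fewer fsync() calls, but also more records at risk if power is lost before that batch's fsync() returns. If the function returns -1, records after the last completed batch should be treated as not written.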

See this answer on "What does O_DIRECT really mean?" for further details and links.

On FreeBSD, to guarantee synchronous write, which option has the best performance: (1) O_DIRECT + fsync() (2) O_DIRECT | O_SYNC or (3) O_SYNC alone?

You're in a similar situation to Linux (so see above) but of the three choices I would think the third one (O_SYNC alone) would be fastest. The FreeBSD open(2) man page says this about O_DIRECT:

O_DIRECT may be used to minimize or eliminate the cache effects of reading and writing. The system will attempt to avoid caching the data you read or write. If it cannot avoid caching the data, it will minimize the impact the data has on the cache. Use of this flag can drastically reduce performance if not used with care.

General note: using O_DIRECT doesn't automatically make all I/O faster. That depends on the workload (I/O size, I/O frequency, whether I/O is sequential or random, how often syncing happens since that can affect request merging, etc.) and on how the I/O is submitted (synchronously vs. asynchronously).
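Since the answer depends on the workload, it is worth measuring on the actual hardware and filesystem. A rough micro-benchmark sketch, assuming a POSIX system with clock_gettime(CLOCK_MONOTONIC); time_writes and its parameters are hypothetical. It returns elapsed wall-clock seconds, or -1.0 on any error:

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical micro-benchmark: write count chunks of chunk bytes to path,
 * opened with extra_flags (e.g. O_SYNC, or 0), calling fsync() after every
 * fsync_every chunks (0 = never) and once more for a final partial batch.
 * Results only describe the machine, filesystem and pattern measured. */
double time_writes(const char *path, int extra_flags, size_t chunk,
                   size_t count, size_t fsync_every)
{
    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1.0;
    memset(buf, 'x', chunk);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0644);
    if (fd == -1) {
        free(buf);
        return -1.0;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int failed = 0;
    for (size_t i = 0; i < count && !failed; i++) {
        /* For brevity a short write is counted as a failure. */
        if (write(fd, buf, chunk) != (ssize_t)chunk)
            failed = 1;
        else if (fsync_every != 0 && (i + 1) % fsync_every == 0 &&
                 fsync(fd) == -1)
            failed = 1;
    }
    /* Flush the final partial batch, if there is one. */
    if (!failed && fsync_every != 0 && count % fsync_every != 0 &&
        fsync(fd) == -1)
        failed = 1;
    if (close(fd) == -1)
        failed = 1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);

    if (failed)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For example, comparing time_writes(path, O_SYNC, 4096, 1000, 0) with time_writes(path, 0, 4096, 1000, 100) contrasts O_SYNC on every write with one fsync() per 100 writes. O_DIRECT could be passed in extra_flags as well, but on Linux it needs _GNU_SOURCE and a buffer aligned to the device's requirements, which this malloc()-based sketch does not guarantee, so open() or write() may fail with EINVAL.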

Licensed under: CC-BY-SA with attribution