Question

We need an app to guarantee, as far as possible, that when it reports a record as persisted, it really was. I understand that the way to do this is fsync(fd). However, for some strange reason, using fsync() appears to speed up the code that writes to disk instead of slowing it down as one would expect.

Some sample test code returns the following results:

no sync() seconds:0.013388   writes per second:0.000001 
   sync() seconds:0.006268   writes per second:0.000002

Below is the code that produces these results:

#include <stdio.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

void withSync() {
    int f = open( "/tmp/t8" , O_RDWR | O_CREAT, 0644 );  /* O_CREAT requires a mode argument */
    lseek (f, 0, SEEK_SET );
    int records = 10*1000;
    clock_t ustart = clock();
    for(int i = 0; i < records; i++) {
        write(f, "012345678901234567890123456789" , 30);
        fsync(f);
    }
    clock_t uend = clock();
    close (f);
    printf("   sync() seconds:%lf   writes per second:%lf\n", ((double)(uend-ustart))/(CLOCKS_PER_SEC), ((double)records)/((double)(uend-ustart))/(CLOCKS_PER_SEC));
}

void withoutSync() {
    int f = open( "/tmp/t10" , O_RDWR | O_CREAT, 0644 );  /* O_CREAT requires a mode argument */
    lseek (f, 0, SEEK_SET );
    int records = 10*1000;
    clock_t ustart = clock();
    for(int i = 0; i < records; i++) {
        write(f, "012345678901234567890123456789" , 30 );
    }
    clock_t uend = clock();
    close (f);
    printf("no sync() seconds:%lf   writes per second:%lf \n", ((double)(uend-ustart))/(CLOCKS_PER_SEC), ((double)records)/((double)(uend-ustart))/(CLOCKS_PER_SEC));
}

int main(int argc, const char * argv[])
{
    withoutSync();
    withSync();
    return 0;
}

Solution

The issue is in the way you're timing the I/O. You want to measure the wall-clock time taken by the record writes, but you are using the C library function clock, which measures the CPU time consumed by your process and not the total time elapsed. Use clock_gettime with a clock selection of CLOCK_MONOTONIC or, ideally, CLOCK_MONOTONIC_RAW (the latter being a Linux extension).

clock does not give you the total time elapsed between the two calls; it gives an estimate of the CPU time your process consumed. Your disk I/O (the calls to write and, above all, fsync) is blocking: while the process is suspended waiting for the device, it accumulates no CPU time, so clock never sees that wait. Hence, you need to measure the difference in wall-clock time, which, as it sounds, is the total time elapsed in the real world, outside the scope of just your test program's process. Indeed, CPU time is not what you are concerned about at all with fsync. Most of the time an I/O operation takes is spent neither in the kernel nor on the CPU, but waiting on the disk controller.
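To see the difference between the two clocks in isolation, here is a minimal sketch. It uses nanosleep as a stand-in for a blocking call such as fsync (that substitution is my assumption, chosen so the result does not depend on your disk), and the helper names elapsed_seconds and measure_blocking_call are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Seconds between two clock_gettime() readings. */
static double elapsed_seconds(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical helper: block for `ms` milliseconds (standing in for a
 * blocking system call) and report what each clock observed. */
static void measure_blocking_call(long ms, double *cpu_s, double *wall_s) {
    struct timespec delay = { ms / 1000, (ms % 1000) * 1000000L };
    struct timespec w0, w1;

    clock_t c0 = clock();                 /* CPU time used by this process */
    clock_gettime(CLOCK_MONOTONIC, &w0);  /* wall-clock time */

    nanosleep(&delay, NULL);              /* process is suspended here */

    clock_gettime(CLOCK_MONOTONIC, &w1);
    clock_t c1 = clock();

    *cpu_s  = (double)(c1 - c0) / CLOCKS_PER_SEC;
    *wall_s = elapsed_seconds(w0, w1);
}
```

Calling measure_blocking_call(100, &cpu, &wall) should leave wall at roughly 0.1 seconds while cpu stays close to zero, which is the same effect that hides the cost of fsync in the question's numbers.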

Additionally, small record sizes are OK as a benchmark. They are a common use case for synchronized I/O (e.g., writing metadata for a transaction log). To get the timing stability of larger record sizes, simply increase the number of loop iterations per timer interval significantly and average (amortize) the result. This accurately models the cost of small blocking records being written and flushed synchronously.
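As a rough sketch of that amortized approach, the function below times the whole loop with CLOCK_MONOTONIC and leaves the averaging to the caller. The name bench_small_writes and its parameters are hypothetical; the 30-byte record is the one from the question.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write `records` 30-byte records to `path`, calling
 * fsync() after each one when do_sync is non-zero. Stores the wall-clock
 * time of the whole loop in *total_s. Returns 0 on success, -1 on error. */
static int bench_small_writes(const char *path, int records, int do_sync,
                              double *total_s) {
    static const char rec[] = "012345678901234567890123456789";
    struct timespec t0, t1;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < records; i++) {
        if (write(fd, rec, 30) != 30 || (do_sync && fsync(fd) != 0)) {
            close(fd);
            return -1;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    *total_s = (double)(t1.tv_sec - t0.tv_sec)
             + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return 0;
}
```

Dividing total_s by records gives the average cost per record, and records / total_s gives writes per second. Run it once with do_sync set to 0 and once with 1, with a record count large enough that the total is well above the timer's noise.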

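Do consider fdatasync for improved performance: it flushes the file's data but can skip metadata updates (such as timestamps) that are not needed to read the data back.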
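A minimal sketch of a "report persisted only after the flush succeeded" write path using fdatasync might look like the following. It assumes a platform that declares fdatasync in unistd.h (Linux does; check yours), and the function names write_record_durable and append_record_durable are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: write all `len` bytes to fd, then flush the data
 * with fdatasync(). Returns 0 only if both steps succeeded. */
static int write_record_durable(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: retry */
            return -1;
        }
        buf += n;               /* handle short writes */
        len -= (size_t)n;
    }
    return fdatasync(fd) == 0 ? 0 : -1;
}

/* Hypothetical convenience wrapper: append one record to `path`. */
static int append_record_durable(const char *path, const char *buf, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    int rc = write_record_durable(fd, buf, len);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

The application would report a record as persisted only when write_record_durable returns 0. In a real log you would keep the descriptor open across records rather than reopening the file each time; the wrapper is only there to keep the example short.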

OTHER TIPS

Much appreciated, thanks for your comments! The comments suggesting increasing the test to a larger number of transactions are correct. With larger numbers of transactions, fsync() does appear to do something. At least on OS X 10.8:

  1. When the write does not increase the file size, fsync() doubles the time it takes to complete the write.
  2. When the write does increase the file size, fsync() is significantly slower.
Licensed under: CC-BY-SA with attribution