Why does using fsync() to flush writes to disk speed up access?

Question 1

The issue is in how you are timing the I/O writes. You want to measure the wall-clock time between record writes, but you are using the C library function clock, which measures the CPU time consumed by your process, not the total time elapsed. Use clock_gettime with CLOCK_MONOTONIC or, ideally, CLOCK_MONOTONIC_RAW (the latter being a Linux extension).

You are not collecting the total time elapsed between calls to clock: you are collecting an estimate of how much time your process spent burning CPU cycles. Your disk I/O (both the write and the fsync calls) is blocking. The kernel handles each of those system calls on your behalf, and for most of their duration your process is simply asleep, waiting on the device; that waiting is not charged to it as CPU time. Hence, you need to measure the difference in wall-clock time, which is the total time elapsed in the real world, not just the time your test program's process was running. Indeed, CPU time is not what you care about with fsync at all. Most of the time an I/O operation takes is spent neither in your code nor on the CPU; it is spent waiting on the disk controller and the device behind it.
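To make the difference concrete, here is a minimal sketch that takes both measurements around a blocking call. It assumes a POSIX system that provides clock_gettime and nanosleep; the function name measure_blocking_call is made up for illustration, and the sleep is only a stand-in for a blocking write/fsync pair.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Seconds elapsed between two timespec readings. */
static double elapsed_sec(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: block for `ms` milliseconds (standing in for a
 * blocking write/fsync) and report the CPU time seen by clock() and the
 * wall-clock time seen by clock_gettime(). Returns 0 on success, -1 on error.
 */
int measure_blocking_call(long ms, double *cpu_sec, double *wall_sec)
{
    struct timespec delay = { ms / 1000, (ms % 1000) * 1000000L };
    struct timespec w0, w1;

    assert(ms >= 0 && cpu_sec != NULL && wall_sec != NULL);

    clock_t c0 = clock();
    if (clock_gettime(CLOCK_MONOTONIC, &w0) != 0)
        return -1;

    /* Sleep; if a signal interrupts us, resume with the remaining time. */
    while (nanosleep(&delay, &delay) == -1) {
        if (errno != EINTR)
            return -1;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &w1) != 0)
        return -1;
    clock_t c1 = clock();

    *cpu_sec = (double)(c1 - c0) / CLOCKS_PER_SEC;  /* time on the CPU */
    *wall_sec = elapsed_sec(w0, w1);                /* time in the real world */
    return 0;
}
```

Calling measure_blocking_call(200, &cpu, &wall) should report a wall-clock time of at least 0.2 seconds and a CPU time close to zero, which is the same gap you get when clock() is wrapped around a blocking fsync.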

Additionally, small record sizes are fine for a benchmark. They are a common use case for synchronized I/O (e.g., writing metadata for a transaction log). To get the timing stability you would see with larger record sizes, simply increase the number of loop iterations per timer interval significantly and average (amortize) the result. This accurately models the cost of small blocking records being written and flushed synchronously.
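A rough sketch of that amortized measurement follows, again assuming POSIX clock_gettime is available. The function name avg_synced_write_sec, the 4096-byte cap on the record size, and the file mode are all choices made for this example, not anything taken from your test program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical benchmark helper: write `iterations` records of
 * `record_size` bytes to `path`, calling fsync() after each one, and
 * return the average wall-clock seconds per write+fsync pair.
 * Returns a negative value on any error.
 */
double avg_synced_write_sec(const char *path, size_t record_size, int iterations)
{
    char record[4096];              /* example cap on the record size */
    struct timespec start, end;

    assert(path != NULL && iterations > 0);
    if (record_size == 0 || record_size > sizeof record)
        return -1.0;
    memset(record, 'x', record_size);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1.0;

    /* Read the timer once around the whole loop, not once per record. */
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0) {
        close(fd);
        return -1.0;
    }
    for (int i = 0; i < iterations; i++) {
        if (write(fd, record, record_size) != (ssize_t)record_size ||
            fsync(fd) != 0) {
            close(fd);
            return -1.0;
        }
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0) {
        close(fd);
        return -1.0;
    }
    close(fd);

    double total = (double)(end.tv_sec - start.tv_sec)
                 + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return total / iterations;      /* amortized cost of one record */
}
```

Timing the loop as a whole keeps timer overhead and per-call jitter out of the result; raise iterations until the average stops moving between runs.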

Do consider fdatasync for improved performance.
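For illustration, a small sketch of a durable append built on fdatasync. It assumes a platform that actually declares fdatasync (Linux does; availability elsewhere varies), and the helper name append_record_durably is invented for this example. fdatasync may skip flushing metadata that is not needed to read the data back, such as timestamps, which is where the saving comes from.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: append `len` bytes from `buf` to `path` and flush
 * the data with fdatasync() before returning.
 * Returns 0 on success, -1 on error.
 */
int append_record_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;

    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    /* write() may be interrupted or write fewer bytes than asked; loop. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Flush file data (and only the metadata needed to retrieve it). */
    if (fdatasync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

When an append grows the file, the new size still has to reach the disk, so the saving over fsync is likely to be largest for writes that overwrite existing, already-allocated blocks.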

Question 2

Much appreciated, thanks for your comments! The comments suggesting that I increase the test to a larger number of transactions are correct. With a larger number of transactions, fsync() does appear to do something. At least on OS X 10.8:

When the write does not increase the file size, fsync() doubles the time it takes to complete the write.
When the write does increase the file size, fsync() is significantly slower.