Cross-platform and cross-process atomic int writes on file

https://stackoverflow.com/questions/2846190

27-09-2019

Question

I'm writing an application that will have to handle many concurrent accesses, by threads as well as by processes, so no mutexes or locks should be applied to it.

To keep the use of locks to a minimum, I'm designing the file to be "append-only": all data is first appended to disk, and then the address pointing to the record it has updated is changed to refer to the new one. So I will need to implement a small lock system only to change this one int so that it refers to the new address. What is the best way to do it?

I was thinking about putting a flag before the address, so that while it is set, readers spin until it is released. But I'm afraid that this isn't atomic at all, is it? For example:

a reader reads the flag, and it is unset
at the same time, a writer sets the flag and changes the value of the int
the reader may read an inconsistent value!

I'm looking for locking techniques, but all I find is either for locking between threads or for locking an entire file, not individual fields. Is it not possible to do this? How do append-only databases handle it?

edit: I was looking at how append-only databases (CouchDB) do it, and it seems they use a dedicated thread just to serialize the writes to the file. Does that mean it isn't possible to make them embeddable, like SQLite, without locking the entire file with file system locks?

Thanks! Cauê

Solution

Be careful about the append semantics of your filesystem - it probably doesn't provide atomic append operations.

One option is to memory map (mmap) your file as shared, then do atomic memory operations like compare-and-swap on the pointer. Your success will depend on whether your OS has such an operation (Linux, OSX do).

A correct (although I'm not sure it is fast) way to accomplish what you want is with rename, which is an atomic file operation on most filesystems as long as the source and destination are on the same filesystem. Keep the most up-to-date data in an official file location. To update the data, write your new data to a temporary file, then rename that temporary file to the official location.
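As a rough illustration of the rename approach, here is a minimal Python sketch. It assumes the "official location" is a small file holding one 8-byte offset. The names write_pointer and read_pointer and the little-endian layout are made up for this example and are not from the question. os.replace is used because it overwrites an existing destination on both POSIX and Windows.

```python
import os
import struct
import tempfile


def write_pointer(path: str, offset: int) -> None:
    """Replace the file at `path` with one that holds `offset` (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file is created in the same directory so that the final
    # rename stays on one filesystem; a cross-filesystem rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ptr-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(struct.pack("<Q", offset))  # assumed layout: one unsigned 64-bit int
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before publishing them
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind if anything failed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def read_pointer(path: str) -> int:
    """Read the offset stored by write_pointer."""
    with open(path, "rb") as f:
        (offset,) = struct.unpack("<Q", f.read(8))
    return offset
```

Note that this only makes each individual update atomic. Two writers that both read the old value and then rename can still overwrite each other's update, so it does not replace a compare-and-swap. On Windows the replace may also fail while another process holds the destination open.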

OTHER TIPS

When I need to do something like this, typically, I write a process that accepts multiple connections from other processes to get data. This logging process can maintain a single file pointer where it is writing all the data without running the risk of multiple writes going to the same place.

Each thread in the logging process just listens for new input and submits it to a queue that a single writer drains, without blocking the process that generated the data. Trying to write out to disk directly in the threads that generate the data to be logged will eventually put you in a position where you need locking operations and suffer whatever performance hit they carry.
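A minimal sketch of that single-writer idea, in Python. To stay short it replaces the connections from other processes with plain threads calling submit. The class name SingleWriterLog and its methods are invented for this example. The point is only that one thread owns the file handle, so no two writes can land in the same place.

```python
import queue
import threading


class SingleWriterLog:
    """Hypothetical append-only log in which a single thread performs all writes."""

    def __init__(self, path: str) -> None:
        self._queue: queue.Queue = queue.Queue()
        self._file = open(path, "ab")
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def submit(self, record: bytes) -> None:
        # Called by listener threads: enqueue the record and return immediately.
        self._queue.put(record)

    def _drain(self) -> None:
        # The only code that touches the file, so writes are serialized
        # in the order the queue hands them over.
        while True:
            record = self._queue.get()
            if record is None:  # sentinel pushed by close()
                break
            self._file.write(record)
        self._file.flush()

    def close(self) -> None:
        # Everything submitted before close() is written before the thread exits.
        self._queue.put(None)
        self._writer.join()
        self._file.close()
```

In a real logging process, each connection handler would call submit with the bytes it received. The queue here is unbounded, so a producer that outpaces the disk will grow memory rather than block.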

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow