I'm building an inverted index algorithm. I'll be storing the index in the form of a hashtable in a file or database. Updating the index involves:

1. Loading it into memory from the file or database
2. Converting it into an object
3. Making changes to the object
4. Converting the object into a string
5. Saving it in the file or database

Now, there are multiple sources that feed data into the index. If 2 or more sources try to feed in data simultaneously, there's a pretty good chance of running into race conditions.

What are the best known techniques to avoid this?


Solution

Normally you wouldn't load the whole index into memory and then write the whole index back to disk later.

One option is to keep the index primarily on disk and modify the file on disk directly. To avoid race conditions, guard every update with some form of lock. If updates are rare, you can simply lock the whole file for writing. If you want finer-grained locking, you need to make more decisions about the structure of the index on disk.
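As a rough illustration of the global write lock, here is a minimal Python sketch. It makes several assumptions that are not part of the question: the index is a JSON file, the platform is Unix-like (it uses `fcntl.flock`, an advisory lock that only works if every writer takes it), and the names `locked` and `add_posting` are hypothetical. For brevity it still rewrites the whole file inside the lock; it shows only the locking, not an efficient on-disk layout.

```python
import fcntl
import json
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def locked(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def add_posting(index_path, term, doc_id):
    """Add doc_id to the posting list of term while holding the global lock."""
    with locked(index_path + ".lock"):
        try:
            with open(index_path) as f:
                index = json.load(f)
        except FileNotFoundError:
            index = {}  # first writer creates the index

        postings = index.setdefault(term, [])
        if doc_id not in postings:
            postings.append(doc_id)

        # Write to a temporary file in the same directory, then rename it over
        # the old index so readers never see a half-written file.
        directory = os.path.dirname(os.path.abspath(index_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as tmp:
            json.dump(index, tmp)
        os.replace(tmp_path, index_path)
```

Because the whole read-modify-write cycle happens inside the lock, two sources that call `add_posting` at the same time are serialized instead of overwriting each other's changes.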

Another option is to keep the index in memory and maintain a "copy" on disk only for recovery purposes. In that case, everyone updating the index ultimately manipulates the same shared in-memory index, and access to it is protected by global or fine-grained operating-system-level locks.
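A minimal Python sketch of the in-memory variant, assuming all sources run as threads of one process and the recovery copy is a JSON file. The class name `SharedIndex` and its methods are hypothetical, and a single `threading.Lock` stands in for the global lock; sources in separate processes would need a cross-process mechanism instead.

```python
import json
import os
import threading


class SharedIndex:
    """In-memory inverted index guarded by one global lock, with a disk snapshot."""

    def __init__(self, snapshot_path):
        self._snapshot_path = snapshot_path
        self._lock = threading.Lock()
        self._postings = {}  # term -> set of document ids
        if os.path.exists(snapshot_path):
            # Recover the last saved state after a restart.
            with open(snapshot_path) as f:
                self._postings = {t: set(ids) for t, ids in json.load(f).items()}

    def add(self, term, doc_id):
        with self._lock:  # only one writer mutates the table at a time
            self._postings.setdefault(term, set()).add(doc_id)

    def lookup(self, term):
        with self._lock:
            return sorted(self._postings.get(term, ()))

    def save(self):
        """Write the recovery copy; holding the lock keeps the snapshot consistent."""
        with self._lock:
            data = {t: sorted(ids) for t, ids in self._postings.items()}
            tmp_path = self._snapshot_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(data, f)
            os.replace(tmp_path, self._snapshot_path)  # atomic swap of the old copy
```

Each source calls `add` on the same `SharedIndex` instance, and `save` can run periodically or on shutdown. Replacing the single lock with one lock per term or per bucket would be the fine-grained version.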

Licensed under: CC-BY-SA with attribution