How to ensure data integrity when using storeFile() in the MongoDB PHP driver?

https://stackoverflow.com/questions/15536095

24-03-2022

Question

I am asking here because I have not found any information anywhere else yet.

The context:
- replica set architecture with 3 servers
- PHP as the main language
- the average picture size is around 2 MB
- usage: storing a lot of galleries in MongoDB using the PHP driver's storeFile() function

I am worried about what happens to stored data during and after a crash of the primary server.
If I store a whole 80 MB file using storeFile(), the client sends the write to the primary of the replica set, which then starts storing the data.
If the primary fails when the write is 60% complete, what happens to the write operation?
I know I will get an exception, which I can catch and then retry the operation. But what about the data that was already written?

Will I be left with a file in the database that contains only part of the original picture and is therefore corrupted?
Or does it work like a transaction, so that if an error occurs, MongoDB rolls back the operation and throws an exception?

Link to my thread on the Google group: google group thread

Solution

The PHP MongoDB driver will attempt to clean up failed writes to GridFS. However, if the primary has crashed, there is no primary available to run the cleanup routines on.

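The chunks that were already written will therefore still be there in the chunks collection (fs.chunks by default), but the metadata (the actual file document in the files collection, fs.files by default) will not, as it is not written until the entire GridFS write has completed successfully. So you will not see a partial, corrupted file; you will only have chunks that no file refers to.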
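Since the file document is only written once the whole upload has succeeded, a retry can simply run the upload again and treat a returned _id as proof of a complete file. The following is a minimal sketch of such a retry loop, not the driver's own mechanism. The helper name `storeWithRetry`, the host names, the replica set name `rs0`, the database name `galleries` and the file path are all hypothetical. Chunks left behind by a failed attempt are not removed by this helper.

```php
<?php
// Hypothetical helper: run an upload callable until it succeeds or the
// attempt limit is reached. $store must return the new file's _id and
// throw an exception on failure.
function storeWithRetry($store, $maxAttempts = 3, $waitSeconds = 0)
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1');
    }
    $lastError = null;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        try {
            // A returned _id means the file document was written,
            // so the whole upload completed.
            return call_user_func($store);
        } catch (Exception $e) {
            $lastError = $e;
            if ($attempt < $maxAttempts && $waitSeconds > 0) {
                // Give the replica set some time to elect a new primary.
                sleep($waitSeconds);
            }
        }
    }
    // Every attempt failed: rethrow the last error to the caller.
    throw $lastError;
}

// Usage with the legacy driver (assumed setup, needs a running replica
// set, so it is shown commented out):
// $client = new MongoClient('mongodb://host1,host2,host3', array('replicaSet' => 'rs0'));
// $grid = $client->selectDB('galleries')->getGridFS();
// $id = storeWithRetry(function () use ($grid) {
//     return $grid->storeFile('/path/to/picture.jpg');
// }, 5, 2);
```

Catching the base `Exception` class is a simplification; in real code you would probably want to retry only on connection or write errors and let other failures propagate immediately.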

In general this is not a problem, as you shouldn't be experiencing a lot of failovers. If, however, it is a real concern that you are storing a few megabytes of data that you can't actually access, then you would need to create some sort of background task or cron job that iterates over the chunks collection and detects orphan chunks, i.e. chunks whose file has no document in the files collection. Be careful though, as you don't want to delete chunks that are currently being created :)
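Such a cleanup task could be sketched roughly as follows. This is only an illustration under several assumptions: the default `fs` prefix is used, `files_id` values are `MongoId` objects generated by the driver, and the timestamp embedded in each chunk's `_id` is close enough to its creation time to serve as a grace period. The function names `findOrphanFileIds` and `removeOrphanChunks` and the one hour grace period are hypothetical.

```php
<?php
// Pure helper: decide which files_id values are orphaned.
// $chunks:       list of array('files_id' => string, 'created' => int unix time)
// $knownFileIds: list of ids (strings) that exist in the files collection
// A files_id is reported only if its newest chunk is older than the grace
// period, so uploads that are still in progress are left alone.
function findOrphanFileIds(array $chunks, array $knownFileIds, $now, $graceSeconds)
{
    $known = array_flip($knownFileIds);
    $newest = array(); // files_id => timestamp of its newest chunk
    foreach ($chunks as $chunk) {
        $id = $chunk['files_id'];
        if (isset($known[$id])) {
            continue; // file document exists, chunk is not an orphan
        }
        if (!isset($newest[$id]) || $chunk['created'] > $newest[$id]) {
            $newest[$id] = $chunk['created'];
        }
    }
    $orphans = array();
    foreach ($newest as $id => $created) {
        if ($now - $created >= $graceSeconds) {
            $orphans[] = (string) $id;
        }
    }
    return $orphans;
}

// Wiring with the legacy driver (defined here, but not called).
function removeOrphanChunks(MongoDB $db, $graceSeconds = 3600)
{
    $files = $db->selectCollection('fs.files');
    $chunksColl = $db->selectCollection('fs.chunks');

    $knownFileIds = array();
    foreach ($files->find(array(), array('_id' => true)) as $file) {
        $knownFileIds[] = (string) $file['_id'];
    }

    // Project only the ids so the binary chunk data is not transferred.
    $chunks = array();
    foreach ($chunksColl->find(array(), array('files_id' => true)) as $chunk) {
        $chunks[] = array(
            'files_id' => (string) $chunk['files_id'],
            'created'  => $chunk['_id']->getTimestamp(),
        );
    }

    $orphans = findOrphanFileIds($chunks, $knownFileIds, time(), $graceSeconds);
    foreach ($orphans as $id) {
        $chunksColl->remove(array('files_id' => new MongoId($id)));
    }
    return count($orphans);
}
```

Keeping the decision logic separate from the driver calls makes it easy to check the grace period behaviour without a database. For a very large chunks collection, loading every id into memory as done here would not scale, and the scan would need to be done in batches.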

Licensed under: CC-BY-SA with attribution
