Question

I have to store many PDF/JPG/PNG files, each at most 10 MB, in a filesystem, and I need to save their metadata in a database.

The SFTP server and the DB may be on different nodes. On the WS, I have a local DB where I can check the login and look up the addresses of the database and the filesystem.

I was wondering: what happens if the DB fails just before I've uploaded the file to the SFTP server? Or, even worse, if the WS fails before inserting the data into the DB? Since I have the constraint of returning the id of the new insert, I can't defer the insert. How is this kind of system usually dealt with?

Solution

Here's a partial, but more practical and likely applicable, answer. If follow-on processes only consider files via their metadata entries, then the file upload and the update to the metadata need not be atomic. This means you can't have a process that processes every file in the SFTP directory. It would instead need to fetch the list of files from the metadata table and process each file in the resulting list. Similarly, you can't have a process that checks for a specific file in the directory; it would instead check for an entry in the metadata table.
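As a minimal sketch of what "going through the metadata" could look like, the snippet below lists and checks files by querying the table instead of the directory. The table name `file_metadata`, its columns, and both function names are assumptions made for illustration; the question does not specify a schema.

```python
import sqlite3
from pathlib import Path


def files_to_process(conn: sqlite3.Connection, storage_dir):
    """Return the paths of recorded files only, in insertion order.

    Anything in the directory without a metadata row is never returned.
    """
    rows = conn.execute("SELECT stored_name FROM file_metadata ORDER BY id")
    return [Path(storage_dir) / name for (name,) in rows]


def file_is_known(conn: sqlite3.Connection, original_name):
    """Check for a file via its metadata entry, not by probing the directory."""
    row = conn.execute(
        "SELECT 1 FROM file_metadata WHERE original_name = ? LIMIT 1",
        (original_name,),
    ).fetchone()
    return row is not None
```

Because neither function ever lists the directory, a file that was uploaded but never recorded is invisible to them.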

In this context, you can simply upload the file and, once the upload has been verified, insert a metadata record. If the process fails at any point before the metadata insert commits, you just end up with a harmless orphan file. This assumes that file names are distinct; however, if you always go through the metadata table, the actual file names in the SFTP directory no longer matter, so you can just tack a UUID to the end of them to guarantee they are distinct.
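The upload-then-insert order can be sketched as follows. This is only an illustration under stated assumptions: a local directory stands in for the SFTP server, SQLite stands in for the real database, and `init_schema`, `store_file`, and the `file_metadata` table are hypothetical names. The UUID is placed at the end of the file stem so that the extension survives, which is one possible reading of "tack a UUID to the end".

```python
import sqlite3
import uuid
from pathlib import Path


def init_schema(conn: sqlite3.Connection):
    """Create the (assumed) metadata table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_metadata ("
        "id INTEGER PRIMARY KEY, original_name TEXT, "
        "stored_name TEXT UNIQUE, size_bytes INTEGER)"
    )


def store_file(conn: sqlite3.Connection, storage_dir, original_name, data: bytes):
    """Upload first, then record metadata; return the id of the new row."""
    src = Path(original_name)
    # A UUID in the stored name keeps two uploads from ever colliding.
    stored_name = f"{src.stem}-{uuid.uuid4().hex}{src.suffix}"
    target = Path(storage_dir) / stored_name

    # Step 1: upload (a local write stands in for the SFTP put).
    target.write_bytes(data)

    # Step 2: verify the upload before making the file visible.
    if target.stat().st_size != len(data):
        raise IOError(f"upload of {stored_name} is incomplete")

    # Step 3: insert the metadata row. A crash before this commit leaves
    # only an orphan file that no metadata-driven reader will see.
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "INSERT INTO file_metadata (original_name, stored_name, size_bytes) "
            "VALUES (?, ?, ?)",
            (original_name, stored_name, len(data)),
        )
    return cur.lastrowid
```

The id is returned only after the commit, so the caller never receives an id for a file that is not fully uploaded.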

You may want to clean out orphans (particularly as "overwriting" a file in this context simply means orphaning the old file). This can be done in a background process that deletes files older than some time horizon, which can likely be at least 24 hours and quite possibly on the order of months.
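A background cleanup of this kind might look like the sketch below. It assumes a local directory in place of the SFTP server and a `file_metadata` table with a `stored_name` column; `delete_orphans` and its parameters are hypothetical names chosen for the example.

```python
import sqlite3
import time
from pathlib import Path


def delete_orphans(conn: sqlite3.Connection, storage_dir, max_age_seconds, now=None):
    """Delete old files that no metadata row references; return their names."""
    now = time.time() if now is None else now
    referenced = {
        row[0] for row in conn.execute("SELECT stored_name FROM file_metadata")
    }
    removed = []
    for path in Path(storage_dir).iterdir():
        if not path.is_file() or path.name in referenced:
            continue
        # Young files may belong to uploads whose metadata insert is still
        # pending, so only files past the time horizon are deleted.
        if now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

The time horizon is what makes the cleanup safe to run concurrently with uploads: it only has to be longer than the slowest plausible gap between finishing an upload and committing its metadata row.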

If the assumptions don't apply, then something like Christophe's approach becomes necessary.

Other tips

You have 3 nodes (WS, SFTP and DB), each performing some part of the operation. The difficulty is that if any of the nodes fails, the others have to roll back the partial work.

You can achieve this, for example, with a 3-phase commit protocol:
- your WS could act as coordinator for the transactions it initiates
- the SFTP and the DB would act as participants

Every node has to manage its part of the transaction so that it can be either committed or rolled back. This usually requires some logging of work in progress, so that the node is able to revert the changes if it has to roll back (even after a crash, when the system is restarted).
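To make the roles concrete, here is a heavily simplified, in-memory sketch of a coordinator driving participants through three phases (vote, pre-commit, commit). The `Participant` class, its method names, and `run_transaction` are invented for this illustration. Timeouts, persistent logs, and crash recovery, which a real protocol needs, are deliberately left out.

```python
class Participant:
    """In-memory stand-in for a node such as the SFTP server or the DB."""

    def __init__(self, name, can_accept=True):
        self.name = name
        self.can_accept = can_accept
        self.state = "idle"
        self.log = []        # work in progress, kept so it can be undone
        self.committed = []  # work that has been made permanent

    def can_commit(self, item):
        # Phase 1: stage the work and vote yes or no.
        if not self.can_accept:
            self.state = "aborted"
            return False
        self.log.append(item)
        self.state = "ready"
        return True

    def pre_commit(self):
        # Phase 2: acknowledge that every participant voted yes.
        self.state = "pre-committed"
        return True

    def do_commit(self):
        # Phase 3: make the staged work permanent.
        self.committed.extend(self.log)
        self.log.clear()
        self.state = "committed"

    def abort(self):
        # Roll back: discard the staged work recorded in the log.
        self.log.clear()
        self.state = "aborted"


def run_transaction(participants, item):
    """Coordinator role (the WS in the text); return True if committed."""
    if not all(p.can_commit(item) for p in participants):
        for p in participants:
            p.abort()
        return False
    if not all(p.pre_commit() for p in participants):
        for p in participants:
            p.abort()
        return False
    for p in participants:
        p.do_commit()
    return True
```

If any participant votes no in the first phase, every participant discards its staged work and nothing becomes permanent anywhere.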

Attention: this scheme assumes that only new files are ever uploaded to the SFTP server. If you had several WS instances and two of them could upload a file with the same filename, the last one could overwrite the work of the first, so that the metadata gets corrupted (two mismatching metadata records for one file) and atomicity is broken.

Licensed under: CC-BY-SA with attribution