The right way to do image manipulation in a serverless app

https://softwareengineering.stackexchange.com/questions/384144

17-02-2021
Question

This question is not specific to images, but that is the issue I'm currently having, so I'm using it as an example.

Basically, I have a kind of pipeline that some data (in this case, an image) has to go through before it can be used in my app (in this case, a web app).

If we split the pipeline into functions, it would look like this:

1. The image is uploaded to storage (cloud storage).
2. The image is converted into multiple formats and sizes.
3. All of the different sizes and formats are registered in the database in relation to the original file.

My question is: what would be the right way of managing this process and keeping track of errors, retries, updating the front end, etc.?

I think I'm approaching this task with the wrong mindset, and so I'm ending up with a bad solution. Maybe functions are not the right tool. Maybe I need to opt out of cloud functions and use some other serverless solution, for example Google's App Engine.

Currently, I have a function with an HTTP endpoint, and the flow is:

1. Receive the file and stream it into Cloud Storage (I do this because I'm trying to avoid a heavy front end).
2. Save the name of the file in the DB with a temporary name and a status (the status is currently "waiting for optimization/cropping").
3. The upload into Cloud Storage triggers another function that starts the optimization and cropping of the image.
4. Update the DB with the status and all of the subfile names.
5. Delete the original file.

The problem is that between 2 and 3 there is a broken link. On the client side, I can't really know what is going on without making multiple requests to the DB to check the status of the image.

So is this the right way? Is there a better way? I have run into similar problems all over my app development, for example with data validation, which made me think that maybe I'm getting it wrong.

Answer

One thing I learned early on when doing these sorts of complicated procedures is to clearly name and record each step taken so far. If something goes wrong (and something will), you want to be able to trace back to the last operation that completed successfully. The more insight and detail you can provide here, the better. If there are parameters such as IDs associated with a step, record them together with the last step performed; it will make debugging that much easier.
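As a rough illustration of recording named steps, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `StepRecord` and `StepLog` names, the step names, and the in-memory list, which stands in for whatever database table or log collection you actually use.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class StepRecord:
    """One entry in the processing trail of a single image."""
    image_id: str
    step: str  # e.g. "uploaded", "converted", "registered" (hypothetical names)
    params: dict = field(default_factory=dict)
    timestamp: float = field(default_factory=time.time)


class StepLog:
    """In-memory stand-in for a DB table or log collection (assumption)."""

    def __init__(self):
        self._records = []

    def record(self, image_id, step, **params):
        """Store the step with its parameters and emit a structured log line."""
        entry = StepRecord(image_id, step, params)
        self._records.append(entry)
        print(json.dumps(asdict(entry)))
        return entry

    def last_step(self, image_id):
        """Return the most recent record for image_id, or None if there is none."""
        for entry in reversed(self._records):
            if entry.image_id == image_id:
                return entry
        return None
```

Calling `log.record("img-1", "converted", sizes=[64, 256])` after each step means that, when something fails, `log.last_step("img-1")` tells you where the image got to and with which parameters.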

The next thing you'll want is a blanket exception handler that logs unexpected errors and then rethrows them, so that they aren't hidden. When you start, every error is an unexpected error. As you find problems and fix them, decide for each one whether it is a dealbreaker or whether processing can continue regardless.

If you can continue, catch that specific error, handle it, and proceed with the next step. If you can't, at least you know the last step performed and all the details passed to that step. Ideally you would log errors like this, but it doesn't always depend on you, so the next best thing is to be able to recreate the individual steps.
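The error-handling approach described above might look roughly like this in Python. It is only a sketch: `RecoverableStepError`, `run_pipeline`, and the logger name are hypothetical, and which errors count as recoverable is something you decide as you discover them.

```python
import logging

logger = logging.getLogger("image_pipeline")


class RecoverableStepError(Exception):
    """Hypothetical error type for problems the pipeline can continue past."""


def run_pipeline(image_id, steps):
    """Run (name, callable) steps in order.

    Returns (completed, skipped): the names of steps that finished and of
    steps that hit a known, recoverable error. Any other exception is
    logged with the last completed step and then re-raised.
    """
    completed = []
    skipped = []
    for name, func in steps:
        try:
            func(image_id)
        except RecoverableStepError as exc:
            # Known problem we decided is not a dealbreaker: note it, move on.
            logger.warning("image=%s step=%s skipped: %s", image_id, name, exc)
            skipped.append(name)
            continue
        except Exception:
            # Blanket catch: log with context, then rethrow so nothing is hidden.
            last = completed[-1] if completed else None
            logger.exception(
                "image=%s failed at step=%s (last completed: %s)",
                image_id, name, last,
            )
            raise
        completed.append(name)
    return completed, skipped
```

The blanket `except Exception` comes last on purpose: specific, understood errors are handled first, and everything else still surfaces to the caller after being logged.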

You can identify every image that hasn't been fully processed: it will have no finished step, and its last processing date will not be within the last hour or so. If you prefer to reduce clutter, you can delete the step-by-step log for an image once its processing has finished properly, though I would encourage you to at least keep an option that prevents this deletion when asked (it may come in handy in production).
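A check for unfinished images could be sketched as follows. The `find_stalled` function, the `"finished"` step name, and the shape of `last_steps` are assumptions for illustration; in practice this would more likely be a query against wherever the step records live.

```python
from datetime import datetime, timedelta, timezone

FINISHED = "finished"  # hypothetical name for the terminal step


def find_stalled(last_steps, now=None, max_age=timedelta(hours=1)):
    """Return sorted ids of images that look stuck.

    last_steps maps image_id -> (step_name, time of that step). An image
    counts as stalled when its last step is not FINISHED and that step is
    older than max_age, so images still being processed are left alone.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        image_id
        for image_id, (step, when) in last_steps.items()
        if step != FINISHED and now - when > max_age
    )
```

Running something like this on a schedule gives you the list of images to retry or inspect, without the client having to report failures itself.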

Hope that gave you some insight.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange