Question

I have a system where users can upload full-resolution images of about 16 megapixels, which result in large files.

The current methodology is:

  1. Receive the upload in an HTTP request.
  2. Within the request, write the original file to the blob store.
  3. Still within the request, make about 10 copies of the file at various resolutions (these are thumbnails at different sizes, some for Hi-DPI (Retina) devices, plus one size for full-sized viewing). I also convert the images to WebP.
  4. Transfer all the results to blob stores in different regions for private CDN purposes. (A sketch of this in-request flow follows the list.)
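For context, here is a minimal sketch of that flow, assuming Magick.NET (the ImageMagick wrapper for .NET) for the processing. The UploadToBlobAsync helper, the class name, and the size list are hypothetical placeholders, and the exact Resize signature may vary between Magick.NET versions:

```csharp
using System.IO;
using System.Threading.Tasks;
using ImageMagick;

public class UploadHandler
{
    // Everything below runs inside the HTTP request, which is the problem.
    // Assumes 'upload' is a seekable stream.
    public async Task HandleUploadAsync(Stream upload, string name)
    {
        await UploadToBlobAsync(upload, name);          // steps 1-2: persist the original

        foreach (var width in new[] { 100, 200, 400, 800, 1600 })
        {
            upload.Position = 0;
            using var image = new MagickImage(upload);
            image.Resize(width, 0);                     // height 0 preserves aspect ratio
            image.Format = MagickFormat.WebP;           // step 3: convert to WebP

            using var output = new MemoryStream();
            image.Write(output);
            output.Position = 0;
            await UploadToBlobAsync(output, $"{name}-{width}.webp");  // step 4 (placeholder)
        }
    }

    // Hypothetical stand-in for the real blob-store client.
    private Task UploadToBlobAsync(Stream content, string blobName) => Task.CompletedTask;
}
```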

Clearly, the issue is that since this is all done within an HTTP request, it consumes vastly more server resources than a typical HTTP request, especially when several users upload images in bulk at the same time. If a user uploads a large image, memory consumption jumps dramatically (I am using ImageMagick.NET for image processing).

Is this architecture more suitable:

  1. Receive the file upload, write it to the blob store, add a notification to a processing queue, and return success to the user (sketched after this list).
  2. A separate worker server receives the notification of the new file and starts all the resizing, processing, and replication.
  3. Set the client-side JavaScript to not load the image previews for a few seconds, or have it retry if the image is not found (meaning the image is still being processed but is likely to show up soon).
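A minimal sketch of step 1, assuming an Azure Storage queue; the queue name, connection string placeholder, and UploadToBlobAsync helper are all assumptions:

```csharp
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Queues;

public class UploadEndpoint
{
    private readonly QueueClient _queue =
        new QueueClient("<storage-connection-string>", "image-processing");

    public async Task ReceiveUploadAsync(Stream upload, string name)
    {
        await UploadToBlobAsync(upload, name);   // write the original to the blob store
        await _queue.SendMessageAsync(name);     // tell the worker which blob to process
        // ...then return success to the user immediately; processing happens elsewhere.
    }

    // Hypothetical stand-in for the real blob-store client.
    private Task UploadToBlobAsync(Stream content, string blobName) => Task.CompletedTask;
}
```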

At least this new method will scale more easily and has more predictable performance. But it seems like a lot of work just to handle something as everyday as photo uploading. Is there a better way?

I know the new method follows the same principle as using an external resizing service, but I want to do this in-house since I am concerned about the privacy of some of these third-party services. It would still mean I would have to adapt the client to deal with missing/unprocessed images.

Solution

Yes, what you're describing is a better way. It sounds more complicated, but it is how the majority of scalable sites handle heavy load: offload it to a queue and let workers process it.

I'd add one correction to your step #2:

A separate worker server monitors the queue and starts all the resizing, processing, and replication when a message appears instructing it to do so.
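For illustration, a minimal polling worker along those lines, again assuming an Azure Storage queue; ProcessImageAsync is a hypothetical stand-in for the resize/convert/replicate pipeline:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Queues;

public class ImageWorker
{
    public async Task RunAsync(QueueClient queue, CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var messages = await queue.ReceiveMessagesAsync(maxMessages: 16, cancellationToken: ct);
            foreach (var message in messages.Value)
            {
                await ProcessImageAsync(message.Body.ToString());  // resize + replicate
                // Delete only after successful processing, so failures get retried.
                await queue.DeleteMessageAsync(message.MessageId, message.PopReceipt, ct);
            }
            if (messages.Value.Length == 0)
                await Task.Delay(TimeSpan.FromSeconds(2), ct);     // back off when idle
        }
    }

    private Task ProcessImageAsync(string blobName) => Task.CompletedTask;  // placeholder
}
```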

OTHER TIPS

Another option would be to use the Azure WebJobs feature. In fact, your scenario seems to be so common (in terms of image processing) that it's listed as one of the typical scenarios on MSDN:

Image processing or other CPU-intensive work. A common feature of web sites is the ability to upload images or videos. Often you want to manipulate the content after it's uploaded, but you don't want to make the user wait while you do that.
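With the WebJobs SDK, the explicit worker loop above collapses into a queue-triggered function. The QueueTrigger attribute is the SDK's real binding; the queue name and the ProcessImageAsync helper are assumptions carried over from the earlier sketches:

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public class Functions
{
    // Invoked by the WebJobs SDK whenever a message appears on the queue.
    public static async Task ProcessQueueMessage([QueueTrigger("image-processing")] string blobName)
    {
        await ProcessImageAsync(blobName);   // resize, convert, replicate
    }

    private static Task ProcessImageAsync(string blobName) => Task.CompletedTask;  // placeholder
}
```

The SDK then takes care of polling, dequeuing, and moving repeatedly failing messages to a poison queue, so you write only the processing logic.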

Whether it's better or not, I'll leave up to you to decide.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow