Question

I'm developing an application that will run in an elastic environment on AWS (EC2 instances with autoscaling). The whole app is being developed in PHP.

The core of the app is safely storing files in an S3 bucket. Since the user doesn't need to know where a file ends up, I thought I could store the file temporarily on the EC2 instance and then move it to S3 asynchronously, using a job queue (Amazon SQS). That would avoid making the user wait twice (once for the upload to EC2 and again for the transfer to S3) and would cope better with S3 problems (they aren't common, but they can happen).

My questions are:

  1. Does this approach sound good, or am I missing something?
  2. When processing a job from the queue, will the worker instance have to connect to the original EC2 instance, retrieve the file from it, and then upload it to S3?
  3. How can I avoid problems with autoscaling? An instance could be terminated before the file is stored in the S3 bucket.

Solution

Ideally, you don't want your main app server tied up during file uploads (first from the client to the app server, and then from the app server to S3).

CORS (Cross-Origin Resource Sharing) makes it possible to avoid precisely this. With a CORS rule on the bucket, you can upload the file to S3 directly from the client side and let Amazon worry about handling multiple uploads from your concurrent users. Your app can then do what it does best without having to worry about the uploads themselves.
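As a rough sketch of what the browser side of such a direct upload could look like, assuming the server has already signed an S3 POST policy (the AWS SDKs offer helpers for this) and handed the resulting URL and form fields to the page: the `PresignedPost` shape, the `FetchLike` type and both function names are made up for illustration, and the bucket is assumed to have a CORS rule that allows `POST` from the app's origin.

```typescript
// Shape of the signed form data. The server is ASSUMED to produce this by
// signing an S3 POST policy; this interface is illustrative, not an AWS type.
interface PresignedPost {
  url: string; // bucket endpoint to POST to
  fields: { [name: string]: string }; // key, policy, signature fields, ...
}

// Minimal fetch-compatible signature so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string; body: FormData }
) => Promise<{ ok: boolean; status: number }>;

// Build the multipart body: every signed field first, the file last,
// because S3 ignores form fields that appear after "file".
function buildUploadForm(post: PresignedPost, file: Blob): FormData {
  const form = new FormData();
  for (const name of Object.keys(post.fields)) {
    form.append(name, post.fields[name]);
  }
  form.append("file", file);
  return form;
}

// POST the file straight to S3; the app server never handles the bytes.
async function uploadToS3(
  post: PresignedPost,
  file: Blob,
  fetchFn: FetchLike = fetch
): Promise<void> {
  const response = await fetchFn(post.url, {
    method: "POST",
    body: buildUploadForm(post, file),
  });
  if (!response.ok) {
    throw new Error("S3 upload failed with HTTP " + response.status);
  }
}
```

In a page this might be called as `uploadToS3(signedPost, fileInput.files[0])`, where `signedPost` comes from whatever endpoint your PHP app exposes for signing. Since only the small signing request touches the app server, the EC2 instances stay out of the data path.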

This SO question discusses the same issue, and there are several customisable plugins, such as Fine Uploader, that wrap this approach with progress bars and the like.

This completely removes the need for any kind of queue. If you need to do some bookkeeping after the upload, you can simply make an AJAX call to your server once the upload completes, passing along the file info. It also addresses any concerns you might have about instances being removed by autoscaling, since the upload happens entirely on the client side.
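The bookkeeping call could be sketched roughly as follows. The `UploadRecord` fields, the `PostJson` type and the endpoint path in the usage note are assumptions made for illustration, not part of any AWS or PHP API; adapt them to whatever your server actually records.

```typescript
// Metadata the server might want for its bookkeeping (illustrative fields).
interface UploadRecord {
  key: string; // S3 object key the file was stored under
  originalName: string; // file name as chosen by the user
  sizeBytes: number;
}

// Minimal fetch-compatible signature so a stub can be injected in tests.
type PostJson = (
  url: string,
  init: { method: string; headers: { [name: string]: string }; body: string }
) => Promise<{ ok: boolean; status: number }>;

// Collect the file info once the S3 upload has succeeded.
function buildUploadRecord(
  key: string,
  file: { name: string; size: number }
): UploadRecord {
  return { key: key, originalName: file.name, sizeBytes: file.size };
}

// Tell the app server about the finished upload with a small JSON request.
async function notifyServer(
  record: UploadRecord,
  endpoint: string,
  fetchFn: PostJson = fetch
): Promise<void> {
  const response = await fetchFn(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(record),
  });
  if (!response.ok) {
    throw new Error("Bookkeeping call failed with HTTP " + response.status);
  }
}
```

A call might look like `notifyServer(buildUploadRecord(key, file), "/uploads/complete")`, with the path being a placeholder. Because this request comes from the client, the server would probably want to confirm that the object really exists in the bucket before recording it.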

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow