Question

Current Scenario

Our application allows users to upload files (to Amazon S3) and manage them through our interface. Currently those users can download the files directly from S3/CloudFront through our application.

Adding a Zip feature

We would like to add a feature to the system that allows users to select multiple files at once and download them as a zip file.

We would like the process to work in real time, so that when the user clicks the button to download a zip file, the system processes the request, compiles the zip, and returns the zip to the end user. While the process is running, the front-end will tell the user their request is processing, and when it is complete, it will prompt them to download the file.

Proposed Strategy

Our application runs on AWS, so my first instinct is to use Amazon SQS (but I am open to other suggestions) to build a queue system to handle these requests. The flow would work like this:

  1. User selects files and requests a zip file to be created.
  2. Front-end sends a request to SQS with the files to be zipped, and the destination (on S3) where the final zip file will be stored.
  3. The queue worker uses long polling to check for new SQS jobs. When a job is found, the worker downloads the files from S3, zips them up, and returns the finished zip file to the specified location on S3.
  4. Now that the queue has finished processing, alert the client that their job is done and their file is ready for download.
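
The worker part of this flow (step 3) could look roughly like the sketch below. It is a minimal illustration, not a definitive implementation: the message shape (`job_id`, `files`, `destination`) and the names `process_job`, `fetch`, and `store` are assumptions made for the example. S3 access is passed in as plain callables so the sketch runs without AWS credentials.

```python
import io
import json
import zipfile
from typing import Callable


def process_job(
    message_body: str,
    fetch: Callable[[str], bytes],
    store: Callable[[str, bytes], None],
) -> str:
    """Handle one queue message: fetch the files, zip them, store the archive.

    Assumed (hypothetical) message body:
    {"job_id": "...", "files": ["key1", "key2"], "destination": "zips/x.zip"}
    """
    job = json.loads(message_body)

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for key in job["files"]:
            # The full object key is used as the name inside the archive,
            # which avoids collisions between files sharing a base name.
            archive.writestr(key, fetch(key))

    # Write the finished archive to the destination named in the message.
    store(job["destination"], buffer.getvalue())

    # The job id is what a completion notification (step 4) would carry.
    return job["job_id"]
```

In a real worker, `fetch` and `store` would wrap the S3 download and upload calls, and `process_job` would be called for each message returned by the long-polling loop. The archive is built in memory here, so very large jobs would need a temporary file instead.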

Best Way to Handle Step 4?

This process appears to be a solid approach, and everything makes sense up until step 4. We are struggling with the best way to notify the end user in real-time that their job has finished and their file is ready for download.

From my research I have seen three ways that people suggest:

  1. Use pub/sub to create a socket for the job, and return the response when complete.

    • From everything I have read, it is not best practice (and sometimes not possible) to create a large number of open pub/sub sockets/channels. In addition, how does SQS notify the open socket that the job is complete? This article regarding Amazon SNS looked promising at first, but it seems to imply that it is best practice to use a single SNS topic for the entire queue, not an SNS topic per job. This seems like a great approach if we simply wanted to send an email to the user, letting them know their job has finished. However, I want to be able to update the user interface inside the web browser.
  2. Continuous polling by the front-end server for the finished zip file on S3.

    • If there are thousands of simultaneous jobs, this would create thousands of requests. This approach doesn't seem very scalable, and the extra S3 API calls could cost money.
  3. Leave an open long-running request

    • If there are many jobs in the queue, the time taken to deliver the response could cause the request to time out. In addition, this puts a lot of strain on the front-end web server and wastes its resources. Again, not very scalable.

My Question

Taking into account the points that I mentioned above, what is the best way to notify the front-end client that their job has finished and their file is ready to download?

A great example of what I am trying to do is YouTube. When you upload a video to YouTube, their queue processes your video, and when it is complete, it notifies you that the video processing has finished. I am trying to replicate the same concept, except that in my scenario, I just want the client to know when the zip file has been prepared and uploaded to S3.

I'm aware of theoretical solutions, so please be specific. I am open to any suggestions, regardless of whether they use SQS or any of the technologies/methodologies I have listed. Pointing to any real-world code examples of this sort of thing would be a big bonus.

Thank you in advance for your great help!

Solution

My suggestion is that your front-end opens a server-sent events connection (https://developer.mozilla.org/en-US/docs/Server-sent_events/Using_server-sent_events) to a web server, and that web server sends the "do job" message to the worker via SQS. When the worker finishes, it puts a message for the web server on a separate queue. The web server polls that queue for batches of messages and, for each message, sends an event down to the corresponding client.

You do not need to use WebSockets, as your client and server will not be chatting back and forth.
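
The web server side of this idea could be sketched as follows. This is only an illustration under stated assumptions: `CompletionDispatcher`, `format_sse`, the `zip-ready` event name, and the message fields are hypothetical names, and the completion queue is represented by plain Python lists so the example runs without AWS. The only fixed part is the `text/event-stream` wire format, where each event is a block of `event:` and `data:` lines ended by a blank line.

```python
import json
import queue
from typing import Dict, List


def format_sse(event: str, data: dict) -> str:
    """Encode one message in the text/event-stream wire format."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class CompletionDispatcher:
    """Routes 'job done' messages to the open event stream waiting on that job."""

    def __init__(self) -> None:
        self._streams: Dict[str, queue.Queue] = {}

    def register(self, job_id: str) -> queue.Queue:
        # Called when a browser opens its event stream for a job. The request
        # handler would block on this queue and write whatever arrives.
        stream: queue.Queue = queue.Queue()
        self._streams[job_id] = stream
        return stream

    def dispatch(self, batch: List[dict]) -> int:
        # Called with each batch of messages polled from the completion queue.
        delivered = 0
        for message in batch:
            stream = self._streams.pop(message["job_id"], None)
            if stream is not None:
                stream.put(format_sse("zip-ready", message))
                delivered += 1
        return delivered
```

A background thread on the web server would poll the completion queue and pass each batch to `dispatch`, so a single poller serves every open connection instead of one poll per job. Messages for jobs with no registered stream are skipped here; a real deployment with several web servers would need a way to route each completion to the server holding that client's connection.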

Licensed under: CC-BY-SA with attribution