Question

I need to do some async job processing in response to a web request, polling periodically until it is complete. I have the whole stack up and running locally, but I can't conceptually understand how to move this over to the EBS worker tier. I'm using Django with Celery and RabbitMQ locally and was able to swap out RabbitMQ for Amazon SQS successfully. However, when I tried to create a worker tier that would operate off the same RDS database as the web app, I was unsuccessful. I'm stuck at the point where I can queue messages but can't read them from the queue. I need those messages to perform an expensive operation on the database and prepare the result for the consumer. Is there some architectural piece I'm missing? How and where can I get a Celery daemon up to process the SQS messages?
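
For context, swapping the broker was just a Celery settings change along these lines (the region, queue prefix, and timeout values below are placeholders from my setup, not anything EBS-specific):

```python
# settings.py -- Celery/kombu SQS transport configuration (illustrative values)
# With a bare "sqs://" URL, Celery picks up credentials from the
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
BROKER_URL = "sqs://"

BROKER_TRANSPORT_OPTIONS = {
    "region": "us-east-1",          # placeholder: queue region
    "queue_name_prefix": "myapp-",  # placeholder: keeps queues namespaced per app
    "visibility_timeout": 3600,     # seconds a message stays hidden while in flight
    "polling_interval": 1,          # seconds between SQS polls
}
```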

Solution

From the Elastic Beanstalk documentation:

When you launch an AWS Elastic Beanstalk environment, you choose an environment tier, platform, and environment type. The environment tier that you choose determines whether AWS Elastic Beanstalk provisions resources to support a web application that handles HTTP(S) requests or a web application that handles background-processing tasks.

AWS Elastic Beanstalk installs a daemon on each Amazon EC2 instance in the Auto Scaling group to process Amazon SQS messages in the worker environment tier. The daemon pulls data off the Amazon SQS queue, inserts it into the message body of an HTTP POST request, and sends it to a user-configurable URL path on the local host. The content type for the message body within an HTTP POST request is application/json by default.

From a developer perspective, the application running on the worker tier is just a plain web service. It will receive calls from the AWS Elastic Beanstalk daemon provisioned for you on the instance.

The requests are sent to the HTTP Path value that you configure. This is done in such a way as to appear to the web application in the worker environment tier that the daemon originated the request. In this way, the daemon serves a similar role to a load balancer in a web server environment tier.

The worker environment tier, after processing the messages in the queue, forwards the messages over the local loopback to a web application at a URL that you designate. This URL is only accessible from the local host. Because you can only access it from the same EC2 instance, no authentication is needed to validate the messages that are delivered to it.

A web application in a worker environment tier should only listen on the local host. When the web application in the worker environment tier returns a 200 OK response to acknowledge that it has received and successfully processed the request, the daemon sends a DeleteMessage call to the SQS queue so that the message will be deleted from the queue. (SQS automatically deletes messages that have been in a queue for longer than the configured RetentionPeriod.) If the application returns any response other than 200 OK or there is no response within the configured InactivityTimeout period, SQS once again makes the message visible in the queue and available for another attempt at processing.
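
In practice, that means the "web application" in the worker tier can be a plain Django view mapped to the HTTP Path you configure for the worker environment. A minimal sketch, assuming the daemon's default application/json content type (the URL path, payload handling, and run_expensive_operation helper below are illustrative, not part of any Elastic Beanstalk API):

```python
# views.py -- illustrative worker-tier endpoint
import json

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt


def run_expensive_operation(payload):
    # Hypothetical stand-in for the real database work.
    pass


@csrf_exempt  # the daemon's POST carries no CSRF token
def process_message(request):
    # The daemon puts the raw SQS message body into the POST body
    # as application/json by default.
    payload = json.loads(request.body)

    try:
        run_expensive_operation(payload)
    except Exception:
        # Any non-200 response tells the daemon the message failed;
        # SQS makes it visible again for another attempt.
        return HttpResponse(status=500)

    # 200 OK tells the daemon to delete the message from the queue.
    return HttpResponse(status=200)
```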

Other tips

I'm currently using a "standard" web tier (configured with a higher number of processes and threads) with my Django app + Celery, and I have an EC2 instance with a custom AMI running RabbitMQ. SQS is not fully supported by Celery; from http://docs.celeryproject.org/en/latest/getting-started/brokers/sqs.html:

The SQS transport is in need of improvements in many areas and there are several open bugs.

Honestly, I never understood what a worker tier is supposed to be or how it should behave, but my current configuration seems to work pretty well (I also use Celery beat to manage periodic tasks; see the sketch below). I daemonized Celery using Supervisor (which is already used by Elastic Beanstalk to manage Apache).
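
For completeness, the beat side is just an ordinary schedule entry. A minimal sketch, where the app name, broker URL, task, and interval are all illustrative:

```python
# celery_app.py -- illustrative Celery app with a beat schedule
from datetime import timedelta

from celery import Celery

app = Celery("myapp", broker="amqp://guest@localhost//")  # placeholder broker URL

app.conf.CELERYBEAT_SCHEDULE = {
    "refresh-report-cache": {
        "task": "myapp.tasks.refresh_report_cache",
        "schedule": timedelta(minutes=15),  # placeholder interval
    },
}


@app.task(name="myapp.tasks.refresh_report_cache")
def refresh_report_cache():
    # Placeholder for the periodic work.
    pass
```

Supervisor then just needs to keep something like `celery -A celery_app worker -B` running (or a separate `celery beat` process) alongside Apache.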

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow