Question

I am currently running into an issue with a project that uses Celery. When tasks are routed to a queue and the number of tasks exceeds the worker's concurrency, the worker goes offline. In this state it is still processing tasks, but it appears offline and cannot be inspected. Pinging the worker or running an inspect active_queues gives the message 'Error: No nodes replied within the time constraint'. The worker comes back online once the number of outstanding tasks is equal to or less than the worker's concurrency.
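
Inspecting the worker from Python shows the same thing while it is in this state (a minimal sketch; the import path myproject.celery is a placeholder for wherever the app object lives):

from myproject.celery import app  # placeholder import path

insp = app.control.inspect(timeout=5)
print(insp.ping())           # None while the worker is saturated
print(insp.active_queues())  # also None -> 'No nodes replied within the time constraint'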

Running the worker with '-l debug' did not show any errors as the worker went offline. The logs continued while the worker was seemingly offline, even though it couldn't be inspected.

Celery is on version 3.1.11 and the broker is RabbitMQ 3.3.0.

My Celery configuration:

from celery.schedules import crontab
from kombu import Exchange, Queue

CELERY_DEFAULT_QUEUE = 'default'
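# Note: '#' acts as a wildcard only on topic exchanges; the direct
# exchanges below match these binding keys literally.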
CELERY_QUEUES = (
    Queue('manager_tasks',
          exchange=Exchange('manager', type='direct'),
          routing_key='manager.#'),
    Queue('high_priority_tasks',
          exchange=Exchange('high', type='direct'),
          routing_key='high.#'),
    Queue('low_priority_tasks',
          exchange=Exchange('low', type='direct'),
          routing_key='low.#'),
)

CELERY_DEFAULT_EXCHANGE = 'tasks'
CELERY_DEFAULT_EXCHANGE_TYPE = 'topic'
CELERY_DEFAULT_ROUTING_KEY = 'task.default'

BROKER_URL = 'amqp://[user]:[password]@[url]:[port]/[vhost]'

CELERY_RESULT_BACKEND = 'djcelery.backends.database:DatabaseBackend'

CELERYBEAT_SCHEDULE = {
    'scan-hosts': {
        'task': 'myapp.task',
        'schedule': crontab(minute='*/1',
                            hour='8-17',
                            day_of_week='mon-fri'),
        'options': {
            'queue': 'manager_tasks',
            'routing_key': 'manager.#'
        },
    },
}

CELERY_TASK_RESULT_EXPIRES = 18000  # 5 hours

CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
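
The behaviour is easy to trigger by queuing more tasks than the worker can run at once (a sketch; myapp.tasks.scan_hosts and the count are placeholders, assuming a worker started with something like -c 4):

from myapp.tasks import scan_hosts  # placeholder task module

for _ in range(50):  # well above the worker's concurrency
    scan_hosts.apply_async(queue='manager_tasks', routing_key='manager.#')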

Solution

From what I found on this, the worker appears to go offline as a way of signalling that it is saturated: once its concurrency and prefetch limits are reached it cannot prefetch any more tasks, so it stops replying to remote-control messages such as ping and inspect. It comes back online once it is able to receive tasks again, and drops back to the 'offline' state whenever the concurrency and prefetch limits are met again.
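
If prefetch saturation is indeed the cause, reducing how far ahead the worker reserves tasks should shorten these 'offline' windows. A sketch using the Celery 3.1 setting names from the configuration above (that this fully removes the symptom is an assumption):

CELERYD_PREFETCH_MULTIPLIER = 1  # each pool process reserves only one task ahead
CELERY_ACKS_LATE = True          # acknowledge tasks after they finish, not when received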
