Question

I have set up Celery + RabbitMQ on a 3-machine cluster. I have also created a task which generates a regular expression based on data from a file and uses that information to parse text. However, I would like the file to be read only once per worker spawn, not on every execution of the task.

from celery import Celery
import re

celery = Celery('tasks', broker='amqp://localhost//')

@celery.task
def add(x, y):
    return x + y


def get_regular_expression():
    with open("text") as fp:
        data = fp.readlines()
    str_re = "|".join([x.split()[2] for x in data])
    return str_re


@celery.task
def analyse_json(tw):
    str_re = get_regular_expression()
    re.match(str_re, tw.text)

In the above code, I would like the file to be opened and read into the string only once per worker; the analyse_json task should then just reuse that string.

Any help will be appreciated,

thanks, Amit


Solution

Put the call to get_regular_expression at the module level:

str_re = get_regular_expression()

@celery.task
def analyse_json(tw):
    re.match(str_re, tw.text)

get_regular_expression will then be called only once per worker process, when the module is first imported, and every invocation of analyse_json will reuse the precomputed str_re.
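If you would rather not do file I/O at import time, a minimal alternative sketch is to cache the result lazily with functools.lru_cache, so the file is read on the first task execution in each worker process and the cached string is reused afterwards. The file path and the three-column file format here are assumptions carried over from the question's code; the read counter exists only to demonstrate the caching:

```python
import functools
import os
import re
import tempfile

# Create a sample "text" file whose third column holds the patterns,
# mimicking the format assumed by the question's get_regular_expression().
path = os.path.join(tempfile.mkdtemp(), "text")
with open(path, "w") as fp:
    fp.write("a b foo\nc d bar\n")

reads = 0  # counts how many times the file is actually opened

@functools.lru_cache(maxsize=1)
def get_regular_expression():
    # Runs at most once per process; later calls return the cached string.
    global reads
    reads += 1
    with open(path) as fp:
        data = fp.readlines()
    return "|".join(line.split()[2] for line in data)

def analyse_json(text):
    return re.match(get_regular_expression(), text)

analyse_json("foo something")
analyse_json("bar something")
print(reads)  # the file was read only once
```

Since each prefork worker child is a separate process, each child keeps its own cache, which matches the "once per worker spawn" requirement.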

Additionally, if you must have only one instance of your worker running at a time (for example, when using CUDA), you have to use the -P solo option:

celery worker --pool solo

Works with celery 4.4.2.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow