Unable to add a method to add_date_job() to store jobs in a SQLAlchemyJobStore job store

https://stackoverflow.com/questions/14822544

09-03-2022
Question

Problem statement:

I am trying to pass a method (Test.start()) to scheduler.add_date_job(), with the scheduler configured to store jobs in a SQLAlchemyJobStore job store. Adding the job to the job store succeeds. But when I try to start the scheduler, obj_to_ref() (in apscheduler/util.py) cannot determine a reference for the given object, because what ref_to_obj() returns for that reference does not match it [in this case, the given object is the bound method Test.start(), i.e. <bound method Test.start of <__main__.Test instance at 0xa119a6c>>].

But the same operation is working fine in the following cases:

- With other job stores (e.g. RAMJobStore, which is the default when no job store is added or configured).
- When scheduler.add_date_job() is called with a plain function (e.g. func in the code below) rather than a method like Test.start(), with SQLAlchemyJobStore still as the job store [for func, the object passed to obj_to_ref() and the object returned by ref_to_obj() are both <function func at 0xb768ed14>]. I have added some debug prints (in apscheduler/util.py) to confirm this.

The code is as follows:

from apscheduler.scheduler import Scheduler as scheduler
from datetime import datetime, timedelta
import time
import logging

logging.basicConfig(filename='/tmp/log', level=logging.DEBUG,
        format='[%(asctime)s]: %(levelname)s : %(message)s')

class Failed(Exception):
    def __str__(self):
        return 'Failed!!'

# APScheduler Configure Options
_g_aps_default_config = {
    'apscheduler.standalone' : True,
    'apscheduler.jobstore.default.class' : 'apscheduler.jobstores.sqlalchemy_store:SQLAlchemyJobStore',
    'apscheduler.jobstore.default.url' : 'mysql://root:root123@localhost/jobstore',
    'apscheduler.jobstore.default.tablename' : 'mytable'
}

class Test:
    def __init__(self, *args, **kwargs):
        self.scheduler = scheduler(_g_aps_default_config)
        self.__running = False
        # Intentionally don't want to start!!
        self.__dont_start = True 
        self.__retry_count = 0
        self.__start_max_retries = 5

    def start(self):
        try:
            # Try to start here! 
            # Intentionally don't want to start for the first 5 times
            if self.__retry_count < self.__start_max_retries:
                self.__retry_count += 1
                raise Failed
            if self.__running:
                raise Failed
            self.__running = True
            print 'started successfully :)'
        except Failed:
            # log the start failure and reschedule the start()
            print 'attempt (#%d): unable to start now.. ' \
                  'so rescheduling to start after 5 seconds' % self.__retry_count
            alarm_time = datetime.now() + timedelta(seconds=5)
            self.scheduler.add_date_job(self.start, alarm_time)
            self.scheduler.start()

def func():
    print 'this is a func and not a method!!!'

if __name__ == '__main__':
    t = Test()
    t.start()
    while True:
        time.sleep(10)
    t.stop()

The stack trace is as follows:

Traceback (most recent call last):
  File "user1.py", line 55, in <module>
    t.start()
  File "user1.py", line 48, in start
    self.scheduler.start()
  File "/usr/lib/python2.7/site-packages/APScheduler-2.1.0-py2.7.egg/apscheduler/scheduler.py", line 109, in start
    self._real_add_job(job, jobstore, False)
  File "/usr/lib/python2.7/site-packages/APScheduler-2.1.0-py2.7.egg/apscheduler/scheduler.py", line 259, in _real_add_job
    store.add_job(job)
  File "/usr/lib/python2.7/site-packages/APScheduler-2.1.0-py2.7.egg/apscheduler/jobstores/sqlalchemy_store.py", line 58, in add_job
    job_dict = job.__getstate__()
  File "/usr/lib/python2.7/site-packages/APScheduler-2.1.0-py2.7.egg/apscheduler/job.py", line 120, in __getstate__
    state['func_ref'] = obj_to_ref(self.func)
  File "/usr/lib/python2.7/site-packages/APScheduler-2.1.0-py2.7.egg/apscheduler/util.py", line 174, in obj_to_ref
    raise ValueError('Cannot determine the reference to %s' % repr(obj))
ValueError: Cannot determine the reference to <bound method Test.start of <__main__.Test instance at 0xa119a6c>>

The debug prints that I added to apscheduler/util.py are as follows:

161 def obj_to_ref(obj):
162     """
163     Returns the path to the given object.
164     """
165     ref = '%s:%s' % (obj.__module__, get_callable_name(obj))
166     print 'obj_to_ref : obj : %s' % obj
167     print 'obj_to_ref : ref : %s' % ref
168     try:
169         obj2 = ref_to_obj(ref)
170         print 'obj_to_ref : obj2 : %s' % obj2
171         if obj != obj2:
172             raise ValueError
173     except Exception:
174         raise ValueError('Cannot determine the reference to %s' % repr(obj))
175 
176     return ref

The debug output for Test.start() is:

obj_to_ref : obj : <bound method Test.start of <__main__.Test instance at 0xa119a6c>>
obj_to_ref : ref : __main__:Test.start
obj_to_ref : obj2 : <unbound method Test.start>

After changing the scheduler.add_date_job() call to pass a function (e.g. func) instead of a method (e.g. Test.start()):

self.scheduler.add_date_job(func, alarm_time)

The debug output for func() is:

obj_to_ref : obj : <function func at 0xb768ed14>
obj_to_ref : ref : __main__:func
obj_to_ref : obj2 : <function func at 0xb768ed14>

Am I doing something wrong here, or is this a bug in the apscheduler/util.py functions with respect to SQLAlchemyJobStore?

Is there any known workaround?

Solution

The main problem is that when you use SQLAlchemyJobStore, or any job store other than RAMJobStore, APScheduler serializes your job with pickle in order to save it to storage. For the job function it saves only a textual reference (module and name) to the function you passed to scheduler.add_date_job(), not the function object itself.

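So in your case the reference it tries to save is __main__:Test.start, which carries no information about the instance. Resolving that reference gives back the unbound method Test.start rather than your bound method, so the check in obj_to_ref() fails and raises the ValueError.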
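The failing round trip can be reproduced without APScheduler. The sketch below is a simplified stand-in for obj_to_ref() and ref_to_obj(), not the library's code: the helper names to_ref, from_ref and round_trips are made up for illustration, the module part of the reference is dropped, and an explicit dict plays the role of the imported module. It is written for Python 3, where Test.start looked up on the class is a plain function rather than an unbound method, but the mismatch with the bound method is the same.

```python
def to_ref(obj):
    """Build a dotted-name reference for obj (hypothetical, simplified helper)."""
    return obj.__qualname__


def from_ref(ref, namespace):
    """Resolve a dotted name against a dict standing in for the module."""
    parts = ref.split('.')
    obj = namespace[parts[0]]
    for part in parts[1:]:
        obj = getattr(obj, part)
    return obj


def round_trips(obj, namespace):
    """True if resolving obj's reference yields an object equal to obj."""
    return from_ref(to_ref(obj), namespace) == obj


class Test:
    def start(self):
        return 'started'


def func():
    return 'plain function'


namespace = {'Test': Test, 'func': func}
t = Test()

method_ref = to_ref(t.start)   # the name only, nothing about the instance t
func_ok = round_trips(func, namespace)
method_ok = round_trips(t.start, namespace)
print(method_ref, method_ok)
print(to_ref(func), func_ok)
```

The name Test.start cannot lead back to the particular instance t, which is why only the plain function survives the round trip.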

So the job function should be a function defined at the top level of a module, not an instance method.

It also means that APScheduler does not save the job function's state between runs. You might need to save and load that state to and from a database inside the method, but this would make things too complicated.
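To give an idea of what that would involve, here is a minimal sketch that keeps the retry counter from the question in a database instead of on the instance. It uses an in-memory SQLite database from the standard library purely for illustration; the job_state table and the function names are assumptions, not anything APScheduler provides.

```python
import sqlite3


def load_retry_count(conn, key):
    """Return the stored retry count for key, or 0 if none is stored yet."""
    row = conn.execute(
        'SELECT retries FROM job_state WHERE key = ?', (key,)).fetchone()
    return row[0] if row else 0


def save_retry_count(conn, key, retries):
    """Insert or overwrite the retry count for key."""
    conn.execute(
        'INSERT OR REPLACE INTO job_state (key, retries) VALUES (?, ?)',
        (key, retries))
    conn.commit()


def try_start(conn, key, max_retries=5):
    """Fail (return False) until max_retries attempts have been recorded."""
    retries = load_retry_count(conn, key)
    if retries < max_retries:
        save_retry_count(conn, key, retries + 1)
        return False
    return True


conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE job_state (key TEXT PRIMARY KEY, retries INTEGER NOT NULL)')

# Six attempts: the first five fail, as in the question's Test.start().
outcomes = [try_start(conn, 'test') for _ in range(6)]
print(outcomes)
```

Every piece of state the method relies on would need the same treatment, which is where the extra complexity comes from.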

A better approach would be to implement a custom trigger class that decides when to run the job. You may still need to load and save state for the trigger, so that it survives stopping and restarting the scheduler process.
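A rough sketch of such a trigger follows. It assumes the APScheduler 2.x convention that a trigger is an object with a get_next_fire_time(start_date) method returning the next run time or None, and that it would be scheduled with something like scheduler.add_job(trigger, func, args, kwargs); check both against the version you have installed. RetryTrigger and its parameters are hypothetical names, and the class is standalone so it runs without APScheduler.

```python
from datetime import datetime, timedelta


class RetryTrigger(object):
    """Hypothetical trigger: fires at first_run, then every interval,
    for at most max_runs fire times in total."""

    def __init__(self, first_run, interval, max_runs):
        self.first_run = first_run
        self.interval = interval
        self.max_runs = max_runs

    def get_next_fire_time(self, start_date):
        """Return the first fire time at or after start_date, or None."""
        if start_date <= self.first_run:
            return self.first_run
        elapsed = start_date - self.first_run
        # Ceiling division: number of whole intervals needed to reach start_date.
        steps = -(-elapsed // self.interval)
        if steps >= self.max_runs:
            return None   # retry budget exhausted, nothing left to schedule
        return self.first_run + steps * self.interval


first = datetime(2022, 1, 1, 0, 0, 0)
trigger = RetryTrigger(first, timedelta(seconds=5), max_runs=3)

next_after_1s = trigger.get_next_fire_time(first + timedelta(seconds=1))
next_after_11s = trigger.get_next_fire_time(first + timedelta(seconds=11))
print(next_after_1s, next_after_11s)
```

Because the trigger derives everything from its constructor arguments, it holds no mutable state and is defined at module level, which should make it easier to pickle along with the job.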

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow