Question

I have a website I want to stay up to date with by scraping some content from it every day. I know the site is updated manually at a certain time, and I've set cron schedules to reflect this, but because the update is manual it can be 10 or even 20 minutes late.

Right now I have a hack-ish cron job that runs every 5 minutes, but I'd like to use the deferred library to do this more precisely. I'm trying to chain deferred tasks: check whether there was an update, defer the same check for a couple of minutes if there was none, and keep deferring until there finally is an update.

I have some code I thought would work, but it only ever defers once, when instead I need to continue deferring until there is an update:

(I am using Python)

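from google.appengine.ext import deferred

class Ripper(object):
    def rip(self):
        # siteHasNotBeenUpdated and updateMySite stand in for the real check and update
        if siteHasNotBeenUpdated:
            # No update yet: run this same method again in 120 seconds
            deferred.defer(self.rip, _countdown=120)
        else:
            updateMySite()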

This is just a simplified excerpt, obviously.
I thought this was simple enough to work, but maybe I've just got it all wrong?

Solution

The example you give should work just fine. Add logging to confirm that deferred.defer is actually being called when you think it is. More information would help, too: how is siteHasNotBeenUpdated set?
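As a rough illustration of the logging suggestion, here is a minimal sketch of a self-rescheduling task that logs every decision it makes. It is not the asker's actual code: site_has_been_updated, update_my_site, schedule, RETRY_SECONDS, and MAX_ATTEMPTS are hypothetical names introduced for the example, and the attempt cap is an added assumption so the chain cannot run forever. A module-level function is used instead of a bound method so that no instance has to be pickled.

```python
import logging

RETRY_SECONDS = 120   # assumed delay between checks, matching the question
MAX_ATTEMPTS = 15     # assumed safety cap; not part of the original code


def site_has_been_updated():
    """Hypothetical placeholder: replace with the real scrape-and-compare check."""
    return False


def update_my_site():
    """Hypothetical placeholder: replace with the real update logic."""
    logging.info("Updating local copy")


def schedule(func, *args, **kwargs):
    """Thin wrapper around deferred.defer.

    The import is done lazily so this module can also be loaded
    outside the App Engine runtime (for example in a unit test).
    """
    from google.appengine.ext import deferred
    return deferred.defer(func, *args, **kwargs)


def rip(attempt=1, check=site_has_been_updated,
        update=update_my_site, defer=schedule):
    """Check for an update; re-defer this function if there is none yet."""
    if check():
        logging.info("Update found on attempt %d", attempt)
        update()
        return "updated"
    if attempt >= MAX_ATTEMPTS:
        logging.warning("No update after %d attempts; giving up", attempt)
        return "gave up"
    logging.info("No update on attempt %d; deferring for %d seconds",
                 attempt, RETRY_SECONDS)
    # Only the attempt counter is passed along; the deferred run
    # uses the default check, update, and defer functions.
    defer(rip, attempt + 1, _countdown=RETRY_SECONDS)
    return "deferred"
```

If the log shows a "deferring" line for attempt 1 but nothing for attempt 2, the deferred task itself is probably failing or the check is returning a stale value, which narrows down where to look.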

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow