What's the best way to schedule and execute repetitive tasks (like scraping a page for information) in Rails?

Question 1

First off: your rufus-scheduler code is in an initializer, which is fine, but an initializer runs once, while the Rails process is booting, and never again. So in the initializer you have no access to a variable such as @interval that you might set later, for instance in a controller.

What are the possible options instead of such a variable?

read it from a config file
read it from a database (but you will have to set up your own connection; as far as I know, ActiveRecord is not started yet in the initializer)

And ... if you change the value, you will have to restart your Rails process for the change to take effect.
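As a rough illustration of the config-file option, here is a minimal sketch. The key name `scrape_interval_seconds`, the helper `interval_from_yaml`, the default of 300 seconds, and the file path in the comment are all made up for the example; nothing here comes from the original code.

```ruby
require "yaml"

# Hypothetical default, used when the config has no usable value.
DEFAULT_INTERVAL = 300

# Hypothetical helper: reads the scraping interval (in seconds) from YAML text.
# Falls back to the default when the key is missing or not a positive integer.
def interval_from_yaml(yaml_text, default = DEFAULT_INTERVAL)
  config = YAML.safe_load(yaml_text) || {}
  value = config.is_a?(Hash) ? config["scrape_interval_seconds"] : nil
  value.is_a?(Integer) && value > 0 ? value : default
end

# In an initializer the text would come from a file, for example
#   File.read(Rails.root.join("config", "scraper.yml"))   # hypothetical path
sample = "scrape_interval_seconds: 120\n"
puts interval_from_yaml(sample)   # prints 120
```

Because the value is read once at boot, editing the file still requires a restart, which is the limitation described above.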

So an alternative approach, where your Rails process itself controls the interval of the scheduled job, is to use a recurring background job. At the end of its run, the job reschedules itself, using the interval that is active at that moment. I would propose fetching the interval from the database. Any background job handler could do this. Check Ruby Toolbox; I vote for Resque or delayed_job.
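The self-rescheduling idea can be sketched in plain Ruby. `IntervalStore` and `InMemoryQueue` are hypothetical in-memory stand-ins for a settings table and for whatever job queue you pick; a real Resque or delayed_job setup would use that library's own enqueue calls instead.

```ruby
# Hypothetical stand-in for a database row holding the current interval.
class IntervalStore
  attr_accessor :seconds

  def initialize(seconds)
    @seconds = seconds
  end
end

# Hypothetical stand-in for a job queue that supports delayed execution.
class InMemoryQueue
  attr_reader :entries

  def initialize
    @entries = []
  end

  # Record a job together with the time it should run at.
  def enqueue_at(run_at, job)
    @entries << [run_at, job]
  end
end

class ScrapeJob
  def initialize(store, queue)
    @store = store
    @queue = queue
  end

  # Do the work, then schedule the next run using whatever interval
  # is stored at that moment.
  def perform(now = Time.now)
    scrape
    @queue.enqueue_at(now + @store.seconds, self)
  end

  def scrape
    # the actual page scraping would go here
  end
end

store = IntervalStore.new(60)
queue = InMemoryQueue.new
job   = ScrapeJob.new(store, queue)

start = Time.at(0)
job.perform(start)     # next run queued 60 seconds after start
store.seconds = 10     # interval changed, e.g. through a controller
job.perform(start)     # next run queued 10 seconds after start
```

The point of the design is that the interval is looked up each time the job finishes, so a changed value takes effect on the next run without restarting anything.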

Question 2

I'm not very familiar with Rufus Scheduler, but it appears that it will be difficult to achieve both of your goals (regular heartbeat, dynamically rescheduled) with it. In order for it to work, you'll have to capture the job_id that it returns, use that job_id to stop the job if a rescheduling event occurs, and then create the new job. Rufus also points out that it's an in-memory application whose jobs will disappear when the process disappears -- reboot the server, restart the application, etc., and you've got to reschedule from scratch.
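The capture, stop, and recreate sequence might look roughly like this. The wrapper only assumes a scheduler object that responds to `every(interval) { ... }` (returning a job id) and `unschedule(job_id)`, which is how I understand rufus-scheduler to behave; check its documentation for your version. `ReschedulableTask` and `FakeScheduler` are names invented for the sketch, and the fake exists only so the example runs without the gem.

```ruby
# Wraps any scheduler that responds to `every` and `unschedule`.
# Both calls are assumptions about the scheduler's interface.
class ReschedulableTask
  attr_reader :job_id

  def initialize(scheduler, &work)
    @scheduler = scheduler
    @work = work
    @job_id = nil
  end

  # Stop the current job (if any) and start a new one with the new interval.
  def reschedule(interval)
    @scheduler.unschedule(@job_id) if @job_id
    @job_id = @scheduler.every(interval, &@work)
  end
end

# Minimal fake scheduler (hypothetical): it only records jobs, it never runs them.
class FakeScheduler
  attr_reader :jobs

  def initialize
    @jobs = {}
    @next_id = 0
  end

  def every(interval, &block)
    id = "job-#{@next_id += 1}"
    @jobs[id] = { interval: interval, block: block }
    id
  end

  def unschedule(id)
    @jobs.delete(id)
  end
end

scheduler = FakeScheduler.new
task = ReschedulableTask.new(scheduler) { puts "scraping" }
task.reschedule("5m")
task.reschedule("30s")    # the 5m job is removed and a 30s job replaces it
puts scheduler.jobs.size  # prints 1
```

Note that the job id lives only in memory, so this does nothing about jobs disappearing when the process restarts.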

I'd consider two things. First, I'd consider creating a model that wraps the screen-scraping that you want to do. At a minimum you'd capture the url and the interval. The model may wrap up the code for processing the html response (basically what's wrapped up in the 2.times block) as instance methods that you trigger based on the URL. You may also capture this in a text column and use eval on it, assuming that only "good guys" get access to this part of the system. This has a couple of advantages: you can quickly expand to scraping other sites and you can sanitize the interval sent back by the user.
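A plain-Ruby sketch of such a model, focusing on sanitizing the interval. In Rails this would more likely be an ActiveRecord class with `url` and `interval` columns and validations; the class name `ScrapeTarget` and the bounds below are assumptions chosen for illustration.

```ruby
# Plain-Ruby stand-in for the model described above.
class ScrapeTarget
  MIN_INTERVAL = 60              # seconds; hypothetical lower bound
  MAX_INTERVAL = 24 * 60 * 60    # seconds; hypothetical upper bound

  attr_reader :url, :interval

  def initialize(url, interval)
    @url = url
    self.interval = interval
  end

  # Coerce user input to an integer and clamp it to the allowed range.
  # Input that cannot be parsed falls back to the minimum.
  def interval=(value)
    @interval = Integer(value).clamp(MIN_INTERVAL, MAX_INTERVAL)
  rescue ArgumentError, TypeError
    @interval = MIN_INTERVAL
  end
end

target = ScrapeTarget.new("http://example.com/page", "5")
puts target.interval   # prints 60, because 5 is below the minimum
```

Clamping rather than rejecting is just one choice; a validation error shown to the user may be preferable in a real form.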

Second, something like Delayed::Job may better suit your needs. Delayed::Job allows you to specify a time for the job's execution which you could fill in by reading the model and converting the interval to a time. The key to this approach is that the job must schedule the next iteration of itself before it exits.

This won't be as rock-steady as something like cron, but it does seem to better address the rescheduling need.
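One reason a self-rescheduling job is less steady than cron is that the chain breaks if a run fails before it queues its successor. A possible mitigation, sketched in plain Ruby, is to reschedule in an `ensure` clause. `ListQueue` and `ResilientScrapeJob` are hypothetical names, and whether swallowing the error like this fits a given job library's own retry behaviour is something to check against its documentation.

```ruby
# Hypothetical queue stand-in: just remembers [run_at, job] pairs.
class ListQueue
  attr_reader :entries

  def initialize
    @entries = []
  end

  def enqueue_at(run_at, job)
    @entries << [run_at, job]
  end
end

class ResilientScrapeJob
  def initialize(queue, interval_seconds, &work)
    @queue = queue
    @interval = interval_seconds
    @work = work
  end

  # Run the work; whether it succeeds or raises, queue the next run.
  def perform(now = Time.now)
    @work.call
  rescue StandardError => e
    warn "scrape failed: #{e.message}"
  ensure
    @queue.enqueue_at(now + @interval, self)
  end
end

queue = ListQueue.new
failing = ResilientScrapeJob.new(queue, 300) { raise "site is down" }
failing.perform(Time.at(0))
puts queue.entries.size   # prints 1: the next run was queued despite the failure
```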

What's the best way to schedule and execute repetitive tasks (like scraping a page for information) in Rails?

I'm looking for a solution which enables:

Options I know about

Current situation

Is it not working?