Question

I have a Rails app that lets the user upload a CSV file containing a list (sometimes 200k rows) of URLs to crawl. In the controller I iterate over each row of this file and call another method that takes the URL and a few parameters; when the crawl method is done, it saves a few variables into a few models. Below is roughly what my controller looks like:

def import
  if request.post? && params[:inputfile].present?
    infile = params[:inputfile].read
    CSV.parse(infile) do |row|
      @crawler = Crawler.new(row[0])
      @crawler.crawl # do the actual crawling using the Mechanize gem and set a few variables on the crawler object
      # when the crawl is done, save a few things into some models
    end
  end
end

I need to move this to the background (so this process doesn't block my entire Rails app) and be able to run the code for each row asynchronously. I was thinking of putting everything in a queue, with a sub-queue for each row, or something like that. Could I use Resque or Sidekiq for this? If so, where should I start?


Solution

Sounds like you've done enough digging to end up pointed in the right direction! I'd factor that work out into a separate background worker system too.

Sidekiq is better-maintained these days, and the multithreading is very useful for your use case, so I'd pick that. Good starting points are the Sidekiq homepage and this Railscast, both of which give you lots of information to hit the ground running.
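To make that concrete, here is a minimal sketch of how it might look with Sidekiq: the controller enqueues one small job per CSV row instead of crawling inline, so the request returns quickly and each row is retried independently if it fails. `CrawlWorker` and the queue name are hypothetical names for illustration; the `Crawler` class and model-saving step are assumed from the question.

```ruby
require 'csv'

# app/workers/crawl_worker.rb
# Hypothetical worker: one job per URL, so 200k rows become 200k
# small, independent, retryable jobs processed by Sidekiq's threads.
class CrawlWorker
  include Sidekiq::Worker
  sidekiq_options queue: :crawler, retry: 3

  def perform(url)
    crawler = Crawler.new(url)
    crawler.crawl # Mechanize-based crawl, as in the question
    # ...save the crawler's results into your models here...
  end
end

# In the controller, enqueue instead of crawling inline.
def import
  if request.post? && params[:inputfile].present?
    CSV.parse(params[:inputfile].read) do |row|
      CrawlWorker.perform_async(row[0])
    end
  end
end
```

Note that `params[:inputfile].read` still loads the whole upload into memory; for very large files you could instead stream it with something like `CSV.foreach(params[:inputfile].tempfile.path)`, assuming a standard Rails uploaded-file object.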

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow