Question

I want to build a service that notifies me when a url returns status 200. I'm currently using a sidekiq worker, if the status == 200, it updates my database (row.available = true), if not, it raises an exception and retries the worker in n seconds, n amount of times.

Though this works, it doesn't feel efficient or scalable (1000's checks would result in 1000's of exceptions, and on certain platforms that's bad news -- JRuby), and I'm sure there is a way I can build an internal service to manage this url monitoring that doesn't rely on sidekiq (perhaps in Go, or another, more suited Ruby gem). However, I have no idea where to begin, and so I'd appreciate some general direction.

Was it helpful?

Solution

Writing and running a simple link checker is easy. Doing that for 1000s of links quickly, without redundancy, and handling dead and slow-responding links without bogging down your entire system gets harder.

I'd use three threads, plus two queues:

  1. A dispatcher thread that only reads from the database. It is responsible for finding and queuing URLs to be checked in to a "to be checked" queue.
  2. A worker thread that consumes from the first queue and pushes results into the "updated URL results" queue.
  3. An updater/consumer thread that takes the result of a thread in #2 and updates the database.

Ruby has some built-in classes to help:

I'd highly recommend Typhoeus and Hydra for use in the middle thread. The documentation for these two classes cover a lot of what you need to do as far as handling multiple threads running in parallel.

I wouldn't write this code as part of a Rails application. There is no value added by Rails to this, nor is it necessary. I would either require Active Record and piggy-back on the existing database.yaml settings and models, or use Rails' "runner" to run the code as an adjunct to the Rails code.

Or, I'd write a small, application-specific, piece of code to run on a different server to avoid bogging down the Rails server. Using something like MySQL or PostgreSQL drivers would let you talk to the same database that Rails uses. In this case I'd use the Sequel gem to act as the ORM, but that's because I prefer it over Active Record.

There are a lot of things to consider as you write this code, including retries of failed URLs, sensing redirections and updating the source URLs to reflect them to avoid wasting time, and not beating up the hosting servers causing you to be banned.

I've written several apps for this purpose over the years and doing it right takes a lot of forethought, so think out your design up front otherwise you could end up with some major rewrites later on.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top