Update:

Read "Indicate to an ajax process that the delayed job has completed" before if you have the same problem. Thanks Gene.


I have a problem with concurrency. I have a controller scraping a few web sites, but each call to my controller needs about 4-5 seconds to respond.

So if I make 2 (or more) calls in a row, the second call has to wait for the first one to finish before starting.

So how can I fix this problem in my controller? Maybe with something like EventMachine?

Update & Example:

application_controller.rb

def func1
    # Simulates a slow task: three timestamped prints, 2 seconds apart
    3.times do
        puts "func1 at: #{Time.now}"
        sleep(2)
    end
end

def func2
    # Same idea, but 1 second apart
    3.times do
        puts "func2 at: #{Time.now}"
        sleep(1)
    end
end

whatever_controller.rb

puts ">>>>>>>> Started At #{Time.now}"
  func1()
  func2()
puts "End at #{Time.now}"

So now I need to request http://myawesome.app/whatever several times at the same time from the same user/browser/etc.

I tried Unicorn on Heroku (and locally) but without success. This is my setup:

Requirements:

  • I need a RESTful solution. This is an API, so I need to respond with JSON

More info: right now I have 2 cloud servers running.

  • Heroku with Unicorn
  • Engine Yard Cloud with Nginx + Passenger

Solution 2

For any controller action with a long response time, the delayed_job gem is a fine way to go. While it is often used for bulk mailing, it works just as well for any long-running task.

Your controller starts the delayed job and responds immediately with a page that has a placeholder - usually a graphic with a progress indicator - and Ajax or a timed reload that updates the page with the full information when it's available. Some information on how to approach this is in this SO article.

Not mentioned in the article: you can use Redis or some other in-memory cache to store the results rather than the main database.
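As a rough sketch of that flow, assuming the delayed_job and redis gems (the Scraper class, the Redis key scheme, and the JSON shape are illustrative, not from the question):

class ScrapesController < ApplicationController
  def create
    job_id = SecureRandom.uuid
    # .delay comes from delayed_job: the call is enqueued instead of run inline
    Scraper.new.delay.run(job_id, params[:url])
    render json: { job_id: job_id, status: "queued" }
  end

  def show
    # The client polls here (or via Ajax) until the result appears in Redis
    result = Redis.new.get("scrape:#{params[:id]}")
    if result
      render json: { status: "done", result: JSON.parse(result) }
    else
      render json: { status: "pending" }
    end
  end
end

class Scraper
  def run(job_id, url)
    html = Net::HTTP.get(URI(url))  # the slow 4-5 second part
    Redis.new.set("scrape:#{job_id}", { length: html.length }.to_json, ex: 600)
  end
end

The create action returns immediately with a job id; the client then polls show until the status flips to done.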

Other tips

You're probably using WEBrick in development mode. WEBrick only handles one request at a time.

You have several options: many Ruby web servers exist that can handle concurrent requests.

Here are a few of them.

Thin

Thin was originally based on Mongrel and uses EventMachine to handle multiple concurrent connections.

Unicorn

Unicorn uses a master process that dispatches requests to web workers; 4 workers means 4 possible concurrent requests.
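For example, a minimal config/unicorn.rb along these lines (the values are illustrative):

# config/unicorn.rb -- a minimal sketch
worker_processes 4   # 4 forked workers = 4 requests served in parallel
timeout 30           # kill a worker stuck longer than 30 seconds
preload_app true     # load the app once in the master, then fork

Start it with bundle exec unicorn -c config/unicorn.rb.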

Puma

Puma is a relatively new Ruby server; its headline feature is that it handles concurrent requests in threads, so make sure your code is thread-safe!
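A minimal config/puma.rb sketch (the values are illustrative):

# config/puma.rb -- a minimal sketch
workers 2       # forked processes (cluster mode)
threads 1, 16   # min/max threads per worker; your code must be thread-safe

Start it with bundle exec puma -C config/puma.rb.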

Passenger

Passenger is a Ruby server that runs inside Nginx or Apache; it's great for both production and development.

Others

These are just a few options; many others exist, but I think these are the most widely used today.

To use any of these servers, check their instructions, generally available in their GitHub READMEs.

The answers above are part of the solution: you need a server environment that can properly dispatch concurrent requests to separate workers; Unicorn or Passenger can both do this by creating workers in separate processes or threads. This allows many workers to sit around waiting without blocking other incoming requests.

If you are building a typical bot whose main job is to get content from other sources, these solutions may be fine. But if what you need is a simple controller that can accept hundreds of concurrent requests, each of which sends independent requests to other servers, you will need to manage threads or processes yourself. Your goal is to have many workers waiting to do a simple job, and one or more masters whose job it is to send requests, then be there to receive the responses. Ruby's Thread class is simple and works well for cases like this with Ruby 2.x or 1.9.3.
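For example, a minimal sketch of that pattern with plain threads (the URLs are placeholders):

require "net/http"
require "uri"

urls = %w[https://example.com https://example.org https://example.net]

# One thread per upstream request; MRI releases the GIL during blocking I/O,
# so the slow sites wait in parallel instead of one after another
threads = urls.map do |url|
  Thread.new { [url, Net::HTTP.get_response(URI(url)).code] }
end

threads.each { |t| p t.value }  # Thread#value joins and returns the block's result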

You would need to provide more detail about what you're trying to do to get help with a more specific solution.

Try something like Unicorn, as it handles concurrency via workers. Something else to consider, if there's a lot of work to be done per request, is to spin up a delayed_job per request.

The one issue with delayed_job is that the response won't be synchronous, meaning it won't be returned directly to the user's browser.

However, you could have the delayed job save its responses to a table in the DB. Then you can query that table for all requests and their related responses.
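A minimal sketch of that polling endpoint, assuming an illustrative ScrapeResult model with request_id and body columns:

class ResultsController < ApplicationController
  def show
    result = ScrapeResult.find_by(request_id: params[:id])
    if result
      render json: { status: "done", body: result.body }
    else
      render json: { status: "pending" }  # the client retries after a short delay
    end
  end
end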

What Ruby version are you using?

Ruby & Webserver

Ruby

If it's a simple application, I would recommend the following: try Rubinius (rbx) or JRuby, as they are better at concurrency. They have a drawback, though: they're not mainline Ruby, so some C extensions won't work. But if it's a simple app you should be fine.

Webserver

Use Puma, or Unicorn if you have the patience to set it up.

If your app is hitting an API service

You indicate that the GIL (Global Interpreter Lock) is killing you when you are scraping other sites (presumably ones that allow scraping). If this is the case, something like Sidekiq or delayed_job should be used, but with caution: these jobs should be idempotent, i.e. safe to run multiple times, because they might be retried. Also, if you hit a website repeatedly you will reach its rate limit pretty quickly, e.g. Twitter limits you to 150 requests per hour. So use background jobs with caution.
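For instance, a minimal Sidekiq worker sketch (the names are illustrative); keeping perform idempotent makes retries harmless:

require "sidekiq"
require "net/http"

class ScrapeWorker
  include Sidekiq::Worker
  sidekiq_options retry: 3  # Sidekiq may run the job more than once

  def perform(url)
    html = Net::HTTP.get(URI(url))
    # ... parse and store under a key derived from the URL, so a retry
    # overwrites the same record instead of duplicating it
  end
end

# Enqueue from the controller:
ScrapeWorker.perform_async("https://example.com")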

If you're the one serving the data

However, reading your question, it sounds like your controller is the API and the lock is caused by users hitting it.

If this is the case, you should use dalli + memcached to serve your data. This way you won't be I/O-bound by SQL lookups, since memcached is memory-based. MEMORY SPEED > I/O SPEED
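A minimal sketch with the dalli gem, assuming memcached is running on localhost:11211 (the key, TTL, and the expensive_scrape_or_sql_lookup helper are illustrative):

require "dalli"

cache = Dalli::Client.new("localhost:11211")

# fetch returns the cached value on a hit; on a miss it runs the block
# and stores the result for 300 seconds
data = cache.fetch("scrape:results", 300) do
  expensive_scrape_or_sql_lookup  # placeholder for the slow work
end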
