Question

Here is a case where parallelism must be brought into the backend server.

I want to query N ELBs, each with 5 different queries, and send the results back to the web client.

The backend is Tornado, and from what I have read many times in the docs, I should be able to run several tasks in parallel by using @gen.Task or gen.coroutine.

However, I must be missing something here, as all my requests (20 in total, 4 ELBs * 5 queries) are processed one after another.

def query_elb(fn, region, elb_name, period, callback):
    callback(fn(region, elb_name, period))

class DashboardELBHandler(RequestHandler):

    @tornado.gen.coroutine
    def get_elb_info(self, region, elb_name, period):
        elbReq = yield gen.Task(query_elb, ELBSumRequest, region, elb_name, period)
        elb2XX = yield gen.Task(query_elb, ELBBackend2XX, region, elb_name, period)
        elb3XX = yield gen.Task(query_elb, ELBBackend3XX, region, elb_name, period)
        elb4XX = yield gen.Task(query_elb, ELBBackend4XX, region, elb_name, period)
        elb5XX = yield gen.Task(query_elb, ELBBackend5XX, region, elb_name, period)

        raise tornado.gen.Return( 
            [
                elbReq,
                elb2XX,
                elb3XX,
                elb4XX,
                elb5XX,
            ]
        )

    @tornado.web.authenticated
    @tornado.web.asynchronous
    @tornado.gen.coroutine
    def post(self):
        ret = []

        period = self.get_argument("period", "5m")

        cloud_deployment = db.foo.bar.baz()
        for region, deployment in cloud_deployment.iteritems():

            elb_name = deployment["elb"][0]
            res = yield self.get_elb_info(region, elb_name, period)
            ret.append(res)

        self.push_json(ret)



def ELBQuery(region, elb_name,  range_name, metric, statistic, unit):
    dimensions = { u"LoadBalancerName": [elb_name] }

    ((start, stop), period) = calc_range(range_name)

    cw = boto.ec2.cloudwatch.connect_to_region(region)
    data_points = cw.get_metric_statistics(period, start, stop,
        metric, "AWS/ELB", statistic, dimensions, unit)

    return data_points

ELBSumRequest   = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "RequestCount", "Sum", "Count")
ELBLatency      = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "Latency", "Average", "Seconds")
ELBBackend2XX   = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "HTTPCode_Backend_2XX", "Sum", "Count")
ELBBackend3XX   = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "HTTPCode_Backend_3XX", "Sum", "Count")
ELBBackend4XX   = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "HTTPCode_Backend_4XX", "Sum", "Count")
ELBBackend5XX   = lambda region, elb_name, range_name : ELBQuery(region, elb_name, range_name,  "HTTPCode_Backend_5XX", "Sum", "Count")

Solution

The problem is that ELBQuery is a blocking function. If it doesn't yield another coroutine somewhere, there is no way for the coroutine scheduler to interleave the calls. (That's the whole point of coroutines—they're cooperative, not preemptive.)
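This cooperative behavior can be illustrated with plain generators, which is all a toy scheduler needs (the names `task` and `round_robin` below are illustrative, not part of Tornado): control changes hands only at `yield` points, so a blocking call inside one task stalls every other task until it returns.

```python
import time

def task(name, log, steps, blocking=0.0):
    """A toy coroutine: do one step of work per iteration, then yield control."""
    for i in range(steps):
        if blocking:
            time.sleep(blocking)  # blocking call: nothing else can run meanwhile
        log.append((name, i))
        yield  # hand control back to the scheduler

def round_robin(tasks):
    """A minimal cooperative scheduler: advance each task one step in turn."""
    tasks = list(tasks)
    while tasks:
        for t in list(tasks):
            try:
                next(t)
            except StopIteration:
                tasks.remove(t)

log = []
round_robin([task("a", log, 2), task("b", log, 2)])
# The steps interleave: ("a", 0), ("b", 0), ("a", 1), ("b", 1).
# If task "a" were created with blocking=5.0, task "b" would still only run
# after each of "a"'s 5-second sleeps finished -- exactly the symptom above.
```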

If the problem is something like the calc_range call, that would probably be easy to deal with—break it up into smaller pieces where each one yields to the next, which gives the scheduler a chance to get in between each piece.

But most likely, it's the boto calls that are blocking, and most of your function's time is spent waiting around for get_metric_statistics to return, while nothing else can run.

So, how do you fix this?

  1. Spin off a thread for each boto task. Tornado makes it pretty easy to transparently wrap a coroutine around a thread or thread-pool task, which magically unblocks everything. But of course there's a cost to using threads too.
  2. Schedule the boto tasks on a thread pool instead of a thread apiece. Similar tradeoffs to #1, especially if you only have a handful of tasks. (But if you could be doing 5 tasks each for 500 different users, you probably want a shared pool.)
  3. Rewrite or monkeypatch boto to use coroutines. This would be the ideal solution… but it's the most work (and the most risk of breaking code you don't understand, and having to maintain it as boto updates, etc.). However, there are people who have at least gotten started on this, like the asyncboto project.
  4. Use greenlets and monkeypatch enough of the library's dependencies to trick it into being async. This sounds hacky, but it may actually be the best solution; see Marrying Boto to Tornado for this.
  5. Use greenlets and monkeypatch the whole stdlib à la gevent to trick boto and tornado into working together without either realizing it. This sounds like a terrible idea; you'd be better off porting your whole app to gevent.
  6. Use a separate process (or even a pool of them) that uses something like gevent.
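As a sketch of option #2, here is how the blocking calls can be offloaded to a shared `concurrent.futures` thread pool. The `blocking_query` function below is a stand-in for a boto call like `get_metric_statistics`, not a real API; newer Tornado versions can also resolve such futures directly when you `yield` them from a coroutine.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_query(metric):
    # Stand-in for a blocking boto call such as get_metric_statistics.
    time.sleep(0.2)
    return "%s-result" % metric

metrics = ["RequestCount", "HTTPCode_Backend_2XX", "HTTPCode_Backend_3XX",
           "HTTPCode_Backend_4XX", "HTTPCode_Backend_5XX"]

# A shared pool (option #2): the five queries run concurrently on worker
# threads instead of serializing in the event loop.
pool = ThreadPoolExecutor(max_workers=5)
start = time.time()
futures = [pool.submit(blocking_query, m) for m in metrics]
results = [f.result() for f in futures]
elapsed = time.time() - start
# Five 0.2s queries overlap, so the total is ~0.2s rather than ~1.0s.
```

Inside a Tornado coroutine you would yield the futures instead of calling `f.result()`, which keeps the event loop free while the threads wait on boto.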

Without knowing more details, I'd suggest looking at #2 and #4 first, but I can't promise they'll turn out to be the best answer for you.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow