Python Twisted Deferred : clarification needed

https://stackoverflow.com/questions/8866517

16-04-2021

Question

I am hoping for some clarification on the best way to handle "first" deferreds, i.e. not just adding callbacks and errbacks to existing Twisted methods that return a deferred, but the best way of creating those original deferreds.

As a concrete example, here are two variations of the same method: it just counts the number of lines in some rather big text files, and is used as the starting point for a chain of deferreds.

Method 1: This one does not feel so good, as the deferred is fired directly by the reactor.callLater method.

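from twisted.internet import defer, reactor

def get_line_count(self):
    deferred = defer.Deferred()

    def count_lines(result):
        # Runs in the reactor thread when the deferred fires.
        try:
            with open(self.print_file_path, "r") as print_file:
                self.line_count = sum(1 for line in print_file)
            return self.line_count
        except Exception:
            # InvalidFile is an application exception defined elsewhere.
            raise InvalidFile()

    deferred.addCallback(count_lines)
    # Fire the deferred one second from now.
    reactor.callLater(1, deferred.callback, None)
    return deferred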

Method 2: Slightly better, as the deferred is actually fired when the result is available.

from twisted.internet import defer, reactor

def get_line_count(self):
    deferred = defer.Deferred()

    def count_lines():
        # Runs in the reactor thread one second after get_line_count is called.
        try:
            with open(self.print_file_path, "r") as print_file:
                self.line_count = sum(1 for line in print_file)
            deferred.callback(self.line_count)
        except Exception:
            # InvalidFile is an application exception defined elsewhere.
            deferred.errback(InvalidFile())

    reactor.callLater(1, count_lines)
    return deferred

Note: You could also point out that both of these are actually synchronous, potentially blocking methods (and perhaps I could use maybeDeferred?). But that is actually one of the aspects I get confused by.

1. For Method 2, if the count_lines method is very slow (counting the lines in some huge files, etc.), will it potentially "block" the whole Twisted app? I have read quite a lot of documentation on how callbacks, errbacks and the reactor behave together (callbacks need to execute quickly, or return deferreds themselves, etc.), but in this case I just don't see it, and I would really appreciate some pointers or examples.
2. Are there some articles or clear explanations that deal with the best approach to creating these "first" deferreds? I have read through these excellent articles, and they have helped a lot with some of the basic understanding, but I still feel like I am missing a piece.
3. For blocking code, would this be a typical case for deferToThread or reactor.spawnProcess? I read through a lot of questions like this one and this article, but I am still not 100% sure how to deal with potentially blocking code, mostly when dealing with file I/O.

Sorry if any of this seems too basic, but I really want to get the hang of using Twisted more thoroughly. (It has been a really powerful tool for all the more network-oriented aspects.) Thank you for your time!

Solution

Yes, you've got it right: you need threads or separate processes to avoid blocking the Twisted event loop. Using Deferreds won't magically make your code non-blocking. For your questions:

1. Yes, you would block the event loop if count_lines is very slow. Deferring it to a thread would solve this.
2. I used Twisted's documentation to learn how Deferreds work, but I guess you've already been through that. The article on database support was informative, since it clearly says that the library is built using threads. This is how you bridge the synchronous–asynchronous gap.
3. If the call is truly blocking, then you need deferToThread. Python itself is kind of single-threaded, meaning that only one thread can execute Python bytecode at a time. However, if the thread you create will block on I/O anyway, then this model works fine: the thread releases the global interpreter lock while it waits, which lets other Python threads run, including the main thread with the Twisted event loop.

It can also be the case that you can use non-blocking I/O in your code. This can be done with the select module, for example. In that case, you don't need a separate thread. Twisted uses this technique internally, so you don't have to think about it if you do normal network I/O. But if you're doing something exotic, then it's good to know how things are built so that you can do the same.
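
To make the select idea concrete, here is a minimal standard-library sketch, independent of Twisted. The recv_if_ready helper is a hypothetical name, and a local socket pair stands in for a real network connection. Note that this approach suits sockets and pipes; regular files on disk always report as ready, so it would not help with the line-counting case.

```python
import select
import socket


def recv_if_ready(sock, timeout):
    """Return pending bytes from sock, or None if nothing arrives in time."""
    # select() waits until the socket is readable or the timeout expires,
    # so the recv() call below should not block.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    return sock.recv(4096)


# A connected socket pair stands in for a real network connection.
left, right = socket.socketpair()

nothing_yet = recv_if_ready(right, 0.1)  # nothing has been sent yet
left.sendall(b"hello")
data = recv_if_ready(right, 1.0)         # data is now waiting

left.close()
right.close()
```

An event loop such as Twisted's reactor does essentially this for many sockets at once, dispatching to callbacks instead of returning values.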

I hope that makes things a bit clearer!

Licensed under: CC-BY-SA with attribution
