Question

I'm using Python to talk to a hardware USB sniffer through the Python API provided by the vendor. I read USB packets from the device in a separate thread, in an infinite loop, and that part works fine. The problem is that my main thread never seems to get scheduled again; the read loop gets all the attention.

The code looks much like this:

from threading import Thread
import time
usb_device = 0

def usb_dump(usb_device):
    while True:
        #time.sleep(0.001)
        packet = ReadUSBDevice(usb_device)
        print "packet pid: %s" % packet.pid

class DumpThread(Thread):
    def run(self):
        usb_dump()

usb_device = OpenUSBDevice()
t = DumpThread()
t.start()
print "Sleep 1"
time.sleep(1)
print "End"
CloseUSBDevice(usb_device)
sys.exit(0)

(I could paste actual code, but since you need the hardware device I figure it won't help much).

I'm expecting this code to start dumping usb packets for about a second before the main thread terminates the entire program. However, all I see is "Sleep 1" and then the usb_dump() procedure runs forever. If I uncomment the "time.sleep(0.001)" statement in the inner loop of the usb_dump() procedure things start working the way I expect, but then the python code becomes unable to keep up with all the packets coming in :-(

The vendor tells me that this is a Python scheduler problem and not their API's fault, and therefore won't help me:

«However, it seems like you are experiencing some nuances when using threading in Python. By putting the time.sleep in the DumpThread thread, you are explicitly signaling to the Python threading system to give up control. Otherwise, it is up the Python interpreter to determine when to switch threads and it usually does that after a certain number of byte code instructions have been executed.»

Can somebody confirm that python is the problem here? Is there another way to make the DumpThread release control? Any other ideas?

Solution

Your vendor would be right if yours were pure Python code; however, C extensions may release the GIL, which allows for actual multithreading.

In particular, time.sleep does release the GIL (you can check this directly in the CPython source code; look at the floatsleep implementation), so your code should not have any problem. As further proof, I also ran a simple test with the USB calls removed, and it works as expected:

from threading import Thread
import time
import sys

usb_device = 0

def usb_dump():
    for i in range(100):
        time.sleep(0.001)
        print "dumping usb"

class DumpThread(Thread):
    def run(self):
        usb_dump()

t = DumpThread()
t.start()
print "Sleep 1"
time.sleep(1)
print "End"
sys.exit(0)

Finally, just a couple of notes on the code you posted:

  • usb_device is not being passed to the thread. You need to pass it as a parameter or (argh!) tell the thread to get it from the global namespace.
  • Instead of forcing sys.exit(), it would be better to signal the thread to stop and then close the USB device. I suspect your code could run into multithreading issues as it is now.
  • If you just need a periodic poll, the threading.Timer class may be a better solution for you.
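The first two notes can be sketched roughly as follows. This is only an illustration: read_packet is a hypothetical stand-in for the vendor's ReadUSBDevice, and the approach assumes the real call returns regularly and releases the GIL while it waits. The device is handed to the thread explicitly, and the main thread asks the reader to stop through a threading.Event instead of calling sys.exit().

```python
import threading
import time


def read_packet(device):
    """Hypothetical stand-in for the vendor's ReadUSBDevice call."""
    time.sleep(0.001)  # pretend to wait for a packet; time.sleep releases the GIL
    return {"pid": 0x2D, "device": device}


class DumpThread(threading.Thread):
    def __init__(self, device):
        threading.Thread.__init__(self)
        self.device = device                     # passed in, not read from a global
        self.stop_requested = threading.Event()  # set by the main thread
        self.packets_seen = 0

    def run(self):
        # Keep reading until the main thread asks us to stop.
        while not self.stop_requested.is_set():
            read_packet(self.device)
            self.packets_seen += 1

    def stop(self):
        self.stop_requested.set()


def dump_for(seconds, device="fake-device"):
    """Run the reader for roughly `seconds`, then shut it down cleanly."""
    t = DumpThread(device)
    t.start()
    time.sleep(seconds)
    t.stop()
    t.join()  # only after this is it safe to close the device
    return t.packets_seen
```

With the real API, the CloseUSBDevice call would go after t.join(), so the reader is never left holding a closed device.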

[Update] About the last point: as mentioned in the comments, I think a Timer would better fit the semantics of your function (a periodic poll) and would automatically avoid issues with the GIL not being released by the vendor code.
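A threading.Timer fires only once, so a periodic poll has to re-arm it after each call. Here is a minimal sketch of that idea; the PeriodicPoller class and its method names are made up for illustration, and poll would be whatever function wraps the vendor's read call.

```python
import threading
import time


class PeriodicPoller(object):
    """Calls poll() roughly every `interval` seconds until stop() is called."""

    def __init__(self, interval, poll):
        self.interval = interval
        self.poll = poll
        self._timer = None
        self._stopped = threading.Event()
        self._lock = threading.Lock()

    def _tick(self):
        if self._stopped.is_set():
            return
        self.poll()
        self._schedule()  # re-arm: a Timer only fires once

    def _schedule(self):
        with self._lock:
            if not self._stopped.is_set():
                self._timer = threading.Timer(self.interval, self._tick)
                self._timer.daemon = True  # do not keep the process alive
                self._timer.start()

    def start(self):
        self._schedule()

    def stop(self):
        with self._lock:
            self._stopped.set()
            if self._timer is not None:
                self._timer.cancel()  # no effect if the timer already fired
```

Note that the period is the interval plus however long poll() takes, and that a poll already in progress when stop() is called will still finish.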

OTHER TIPS

I'm assuming you wrote a Python C module that exposes the ReadUSBDevice function, and that it's intended to block until a USB packet is received, then return it.

The native ReadUSBDevice implementation needs to release the Python GIL while it's waiting for a USB packet, and then reacquire it when it receives one. This allows other Python threads to run while you're executing native code.

http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock

While the GIL is released, you must not touch Python objects or call into the Python C API. Release the GIL, run the blocking function, and re-acquire the GIL once you know you have something to return to Python.

If you don't do this, then no other Python threads can execute while your native blocking is going on. If this is a vendor-supplied Python module, failing to release the GIL during native blocking activity is a bug.

Note that if you're receiving many packets, and actually processing them in Python, then other threads should still run. Multiple threads which are actually running Python code won't run in parallel, but the interpreter will switch between them frequently, giving them all a chance to run. This doesn't work if native code is blocking without releasing the GIL.

edit: I see you mentioned this is a vendor-supplied library. If you don't have source, a quick way to see if they're releasing the GIL: start the ReadUSBDevice thread while no USB activity is happening, so ReadUSBDevice simply sits around waiting for data. If they're releasing the GIL, the other threads should run unimpeded. If they're not, it'll block the whole interpreter. That would be a serious bug.
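A rough way to run that check from Python is sketched below. The function name main_thread_delay is made up for this example, and time.sleep stands in for the vendor call, since it is known to release the GIL. The main thread does a series of short sleeps while the blocking call runs in a background thread; if the call held the GIL, the main thread could not resume until the call returned.

```python
import threading
import time


def main_thread_delay(blocking_call, tick=0.01, ticks=20):
    """Run blocking_call in a background thread and return how long the
    calling thread needs to complete `ticks` short sleeps meanwhile."""
    worker = threading.Thread(target=blocking_call)
    worker.daemon = True  # do not keep the process alive for the worker
    start = time.time()
    worker.start()
    for _ in range(ticks):
        time.sleep(tick)  # each wake-up needs the GIL to continue
    return time.time() - start


# Stand-in blocking call: sleeps for 2 seconds but releases the GIL.
elapsed = main_thread_delay(lambda: time.sleep(2.0))
print("main thread needed %.2f s" % elapsed)
```

To test the vendor library, one would pass something like lambda: ReadUSBDevice(usb_device) while the bus is idle. An elapsed time close to tick * ticks suggests the GIL is being released; an elapsed time that stretches to the length of the blocking call suggests it is not.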

I think the vendor is correct. Assuming this is CPython, there is no true parallel threading of Python code; only one thread can execute Python bytecode at a time. This is because of the global interpreter lock (GIL).

You may be able to achieve an acceptable solution by using the multiprocessing module, which effectively sidesteps the global interpreter lock by spawning true sub-processes, each with its own interpreter.

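Another possibility that may help is to modify the interpreter's thread-switching behaviour, for example with sys.setcheckinterval() (or sys.setswitchinterval() on Python 3.2 and later).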
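As a small sketch of what adjusting the switching behaviour looks like: the helper name request_faster_switching and the chosen values are arbitrary, picked only for illustration. Keep in mind that this only affects threads that are executing Python bytecode; it cannot pre-empt a C call that holds the GIL.

```python
import sys


def request_faster_switching(seconds=0.001, bytecodes=10):
    """Ask CPython to consider switching threads more often.

    Returns the setting that is in effect afterwards.
    """
    if hasattr(sys, "setswitchinterval"):
        # Python 3.2+: interval is a time in seconds (default 0.005).
        sys.setswitchinterval(seconds)
        return sys.getswitchinterval()
    # Python 2: interval is a count of bytecode instructions (default 100).
    sys.setcheckinterval(bytecodes)
    return sys.getcheckinterval()
```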

Licensed under: CC-BY-SA with attribution