I have a function that:

1) reads in an HDF5 dataset as integer ASCII codes

2) converts the ASCII integers to characters with the chr() function

3) joins the characters into a single string

Upon profiling, I found that the vast majority of the calculation is spent on step #2, the conversion of the ASCII integers to characters. I have somewhat optimized this call by using:

    ''.join([chr(x) for x in file[dataSetName].value])
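
For reference, the whole function is roughly this shape (a stripped-down sketch; in the real code it is a method that takes a directory and an index, and `dataSetName` is a placeholder):

    import h5py

    def loadFile(path, dataSetName):
        # 1) read the HDF5 dataset of integer ASCII codes
        with h5py.File(path, 'r') as f:
            codes = f[dataSetName].value   # array of ints
        # 2) convert each code to a character, 3) join into one string
        return ''.join([chr(x) for x in codes])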

As my parsing function seems to be CPU bound (the conversion of integers to characters) and not I/O bound, I expected a more or less linear speedup with the number of cores devoted to parsing. Parsing one file serially takes ~15 seconds; parsing 10 files (on my 12-core machine) with 10 threads takes ~150 seconds. That is, there seems to be no improvement at all.

I have used the following code to launch my threads:

    threads=[]
    timer=[]
    threadNumber=10
    for i,d in enumerate(sortedDirSet):
        timer.append(time.time())
        # self.loadFile(d,i)
        threads.append(Thread(target=self.loadFile, args=(d,i)))
        threads[-1].start()
        if(i%threadNumber==0):
            for i2,t in enumerate(threads):
                t.join()
                print(time.time()-timer[i2])
            timer=[]
            threads=[]

    for t in threads:
        t.join()

Any help would be greatly appreciated.

Solution

Python cannot use multiple cores (due to the GIL) unless you spawn subprocesses (with multiprocessing, for example). Thus you won't get any performance boost from spawning threads for CPU-bound tasks.


Here's an example of a script using multiprocessing and a queue:

    from Queue import Empty  # <-- only needed to catch the exception (Python 3: from queue import Empty)
    from multiprocessing import Process, Queue, cpu_count

    def loadFile(d, i, queue):
        # some other stuff
        queue.put(result)

    if __name__ == "__main__":
        queue = Queue()
        no = cpu_count()
        processes = []

        for i, d in enumerate(sortedDirSet):
            p = Process(target=loadFile, args=(d, i, queue))
            p.start()
            processes.append(p)

            if i % no == 0:
                for p in processes:
                    p.join()
                processes = []

        for p in processes:
            p.join()

        results = []
        while True:
            try:
                # False means "don't wait when empty, throw an exception instead"
                data = queue.get(False)
                results.append(data)
            except Empty:
                break

        # You have all the data, do something with it

The other (more complicated) way would be to use a pipe instead of a queue.

It would also be more efficient to spawn the processes once, then create a job queue and send jobs (via a pipe) to the subprocesses, so you don't have to create a new process for each file. But this would be even more complicated, so let's leave it like that.
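
If you do want to go that route, here is a rough sketch of the persistent-worker idea, using a shared job queue rather than raw pipes (it assumes `loadFile` is a plain function that returns its result, and `sortedDirSet` is your list of directories):

    from multiprocessing import Process, Queue, cpu_count

    def worker(jobs, results):
        # each worker lives for the whole run and pulls jobs until it sees None
        for d, i in iter(jobs.get, None):
            results.put(loadFile(d, i))

    if __name__ == "__main__":
        jobs, results = Queue(), Queue()
        workers = [Process(target=worker, args=(jobs, results)) for _ in range(cpu_count())]
        for w in workers:
            w.start()

        for i, d in enumerate(sortedDirSet):
            jobs.put((d, i))
        for _ in workers:
            jobs.put(None)   # one sentinel per worker so it can shut down

        # drain the results before joining, so a full queue can't block the workers
        data = [results.get() for _ in sortedDirSet]

        for w in workers:
            w.join()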

Other tips

Freakish is correct in his answer: it will be the GIL thwarting your efforts.

If you were to use Python 3, you could do this very nicely using concurrent.futures. I believe PyPy has also backported this feature.
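
For example, something along these lines (a sketch; it assumes `loadFile` is a plain function that returns the parsed string, and `sortedDirSet` is your list of directories):

    from concurrent.futures import ProcessPoolExecutor

    with ProcessPoolExecutor() as executor:
        # one task per file; results come back in submission order
        futures = [executor.submit(loadFile, d, i) for i, d in enumerate(sortedDirSet)]
        results = [f.result() for f in futures]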

Also, you could eke a little more speed out of your code by replacing your list comprehension:

    ''.join([chr(x) for x in file[dataSetName].value])

With a map:

    ''.join(map(chr, file[dataSetName].value))

My tests (on a massive random list) using the above code showed 15.73 s for the list comprehension and 12.44 s for map.
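
If you want to reproduce that comparison, a quick test along these lines will do (the list size here is arbitrary):

    import random, timeit

    codes = [random.randint(32, 126) for _ in range(10**7)]  # stand-in for the dataset

    print(timeit.timeit(lambda: ''.join([chr(x) for x in codes]), number=10))
    print(timeit.timeit(lambda: ''.join(map(chr, codes)), number=10))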
