I have a function that:

1) reads in an HDF5 dataset as integer ASCII codes

2) converts the ASCII integers to characters with the chr() function

3) joins the characters into a single string

Upon profiling, I found that the vast majority of the calculation is spent on step #2, the conversion of the ASCII integers to characters. I have somewhat optimized this call by using:

    ''.join([chr(x) for x in file[dataSetName].value])
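
For reference, the whole function is roughly this shape (a stripped-down sketch; in the real code it is a method that takes a directory and an index, and `dataSetName` is a placeholder):

    import h5py

    def loadFile(path, dataSetName):
        # 1) read the HDF5 dataset of integer ASCII codes
        with h5py.File(path, 'r') as f:
            codes = f[dataSetName].value   # array of ints
        # 2) convert each code to a character, 3) join into one string
        return ''.join([chr(x) for x in codes])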

As my parsing function seems to be CPU bound (the conversion of integers to characters) and not I/O bound, I expected a more or less linear speedup with the number of cores devoted to parsing. Parsing one file serially takes ~15 seconds; parsing 10 files (on my 12-core machine) with 10 threads takes ~150 seconds. That is, there seems to be no improvement at all.

I have used the following code to launch my threads:

    threads=[]
    timer=[]
    threadNumber=10
    for i,d in enumerate(sortedDirSet):
        timer.append(time.time())
        # self.loadFile(d,i)
        threads.append(Thread(target=self.loadFile, args=(d,i)))
        threads[-1].start()
        if(i%threadNumber==0):
            for i2,t in enumerate(threads):
                t.join()
                print(time.time()-timer[i2])
            timer=[]
            threads=[]

    for t in threads:
        t.join()

Any help would be greatly appreciated.

Solution

Python cannot use multiple cores (due to the GIL) unless you spawn subprocesses (with multiprocessing, for example). Thus you won't get any performance boost from spawning threads for CPU-bound tasks.


Here's an example of a script using multiprocessing and a queue:

    from Queue import Empty  # <-- only needed to catch the exception (Python 3: from queue import Empty)
    from multiprocessing import Process, Queue, cpu_count

    def loadFile(d, i, queue):
        # some other stuff
        queue.put(result)

    if __name__ == "__main__":
        queue = Queue()
        no = cpu_count()
        processes = []

        for i, d in enumerate(sortedDirSet):
            p = Process(target=loadFile, args=(d, i, queue))
            p.start()
            processes.append(p)

            if i % no == 0:
                for p in processes:
                    p.join()
                processes = []

        for p in processes:
            p.join()

        results = []
        while True:
            try:
                # False means "don't wait when empty, throw an exception instead"
                data = queue.get(False)
                results.append(data)
            except Empty:
                break

        # You have all the data, do something with it

The other (more complicated) way would be to use a pipe instead of a queue.

It would also be more efficient to spawn the processes once, then create a job queue and send jobs (via a pipe) to the subprocesses, so you don't have to create a new process for each file. But this would be even more complicated, so let's leave it like that.
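
If you do want to go that route, here is a rough sketch of the persistent-worker idea, using a shared job queue rather than raw pipes (it assumes `loadFile` is a plain function that returns its result, and `sortedDirSet` is your list of directories):

    from multiprocessing import Process, Queue, cpu_count

    def worker(jobs, results):
        # each worker lives for the whole run and pulls jobs until it sees None
        for d, i in iter(jobs.get, None):
            results.put(loadFile(d, i))

    if __name__ == "__main__":
        jobs, results = Queue(), Queue()
        workers = [Process(target=worker, args=(jobs, results)) for _ in range(cpu_count())]
        for w in workers:
            w.start()

        for i, d in enumerate(sortedDirSet):
            jobs.put((d, i))
        for _ in workers:
            jobs.put(None)   # one sentinel per worker so it can shut down

        # drain the results before joining, so a full queue can't block the workers
        data = [results.get() for _ in sortedDirSet]

        for w in workers:
            w.join()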

Other tips

Freakish is correct in his answer: it will be the GIL thwarting your efforts.

If you were to use Python 3, you could do this very nicely using concurrent.futures. I believe PyPy has also backported this feature.
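
For example, something along these lines (a sketch; it assumes `loadFile` is a plain function that returns the parsed string, and `sortedDirSet` is your list of directories):

    from concurrent.futures import ProcessPoolExecutor

    with ProcessPoolExecutor() as executor:
        # one task per file; results come back in submission order
        futures = [executor.submit(loadFile, d, i) for i, d in enumerate(sortedDirSet)]
        results = [f.result() for f in futures]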

Also, you could eke a little more speed out of your code by replacing your list comprehension:

    ''.join([chr(x) for x in file[dataSetName].value])

With a map:

    ''.join(map(chr, file[dataSetName].value))

My tests (on a massive random list) using the above code showed 15.73 s for the list comprehension and 12.44 s for map.
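
If you want to reproduce that comparison, a quick test along these lines will do (the list size here is arbitrary):

    import random, timeit

    codes = [random.randint(32, 126) for _ in range(10**7)]  # stand-in for the dataset

    print(timeit.timeit(lambda: ''.join([chr(x) for x in codes]), number=10))
    print(timeit.timeit(lambda: ''.join(map(chr, codes)), number=10))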
