Question

I have written a JPEG decoder structured like this:

for each dataunit{
  decode
  transform
  write to rgb buffer
}

Then I rewrote it with Boost threads like this:

for each dataunit{
  decode
}
for each dataunit{
  transform
}
for each dataunit{
  write to rgb buffer
}

Each of these loops runs on its own thread, with 2 threads running in parallel on a 3-core CPU. But I can't seem to beat the performance of the non-threaded program. Am I missing something?

Do threads hamper the compiler's ability to optimize the program?

Will a non-threaded program still use the 3 cores of my CPU?

Thanks so much for clearing anything up.

Edit: apparently my threads were all accessing the same buffer (though not the same locations in it), and that causes significant CPU cache coherency overhead. Each CPU core has its own cache, which has to sync with the other cores' caches whenever the shared buffer is changed. I retooled my code to split my buffers into 3 and have each thread work on its own buffer. I was hoping this would solve any cache coherency problems, but it hasn't seemed to speed up my program. I still cannot beat the serial program with my parallel one.
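For readers trying the same buffer split, here is a minimal sketch of one way to give each thread a range whose boundaries fall on cache-line multiples, so that no two threads write to the same line. The 64-byte line size, the `chunkFor` name, and the assumption that the buffer itself starts on a cache-line boundary are all illustrative and not taken from my actual decoder. Contiguous chunks only ever share a line at their edges, so the gain from this may well be small, which would match what I saw.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Assumed cache-line size in bytes; 64 is common on x86 but not guaranteed.
constexpr std::size_t kCacheLine = 64;

// Hypothetical helper: returns the [begin, end) byte range that thread `t`
// (0-based) out of `numThreads` should own in a buffer of `totalBytes`.
// Every internal boundary is a multiple of kCacheLine, so two threads never
// write to the same cache line (assuming the buffer base is line-aligned
// and numThreads > 0).
std::pair<std::size_t, std::size_t> chunkFor(std::size_t totalBytes,
                                             std::size_t numThreads,
                                             std::size_t t) {
    // Number of cache lines needed to cover the buffer, rounded up.
    const std::size_t lines = (totalBytes + kCacheLine - 1) / kCacheLine;
    // Lines handed to each thread, rounded up so all lines are covered.
    const std::size_t linesPerThread = (lines + numThreads - 1) / numThreads;
    const std::size_t begin =
        std::min(totalBytes, t * linesPerThread * kCacheLine);
    const std::size_t end =
        std::min(totalBytes, begin + linesPerThread * kCacheLine);
    return {begin, end};
}
```

Each worker would then call `chunkFor(bufferSize, 3, threadIndex)` and touch only the bytes in its own range.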

Edit: I'm embarrassed to say that I was measuring the CPU time of my program and not the wall-clock time. Wall-clock time clearly shows my program is ~50% faster when it is threaded. The CPU time of the threaded program is actually ~7% higher because (I presume) it adds up the work done by all 3 cores, plus the extra overhead of managing the threads.


Solution

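Your design is probably inefficient. First, you keep having to pass the data from thread to thread, so each data unit is touched by a different core at each stage. Second, a pipeline can only run as fast as its slowest stage: if one of these three steps takes significantly more time than the other two, the potential maximum benefit is small.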
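One alternative that avoids both problems is to split the data units between threads rather than splitting the stages, so each thread runs all three steps on its own range. The following is only a sketch under stated assumptions: `DataUnit`, `decode`, `transform`, and `writeRgb` are hypothetical stand-ins that do toy arithmetic, and `std::thread` is used in place of Boost threads. It also assumes each data unit can be decoded independently, which may not hold for the entropy-decoding stage of a real JPEG stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the real stages; they only do toy arithmetic.
struct DataUnit { int coeff; };

int decode(const DataUnit& du) { return du.coeff + 1; }
int transform(int decoded) { return decoded * 2; }
void writeRgb(std::vector<std::uint8_t>& rgb, std::size_t i, int value) {
    rgb[i] = static_cast<std::uint8_t>(value & 0xFF);
}

// Run all three stages on the data units in [begin, end).
void processRange(const std::vector<DataUnit>& units,
                  std::vector<std::uint8_t>& rgb,
                  std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        writeRgb(rgb, i, transform(decode(units[i])));
}

// Split the units into contiguous chunks, one per thread. Each thread
// writes only to its own slice of rgb, so no locking is needed.
std::vector<std::uint8_t> decodeParallel(const std::vector<DataUnit>& units,
                                         unsigned numThreads) {
    std::vector<std::uint8_t> rgb(units.size());
    numThreads = std::max(1u, numThreads);
    const std::size_t chunk = (units.size() + numThreads - 1) / numThreads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        const std::size_t begin = std::min(units.size(), t * chunk);
        const std::size_t end = std::min(units.size(), begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([&units, &rgb, begin, end] {
            processRange(units, rgb, begin, end);
        });
    }
    for (auto& w : workers) w.join();
    return rgb;
}
```

With this layout a data unit stays on one core from decode to write, and a slow stage slows every thread equally instead of stalling the others.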

OTHER TIPS

I'm embarrassed to say that I was measuring the CPU time of my program and not the wall-clock time. Wall-clock time clearly shows my program is ~50% faster when it is threaded. The CPU time of the threaded program is actually ~7% higher because (I presume) it adds up the work done by all 3 cores, plus the extra overhead of managing the threads.
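To make the difference concrete, here is a minimal sketch that takes both measurements around the same piece of work. The `measure`, `burn`, and `burnOnThreads` names are made up for illustration, and the busy loop is just a stand-in for real decoding. On POSIX systems `std::clock` reports CPU time summed over all threads of the process, which is why it can exceed wall-clock time for threaded code; on MSVC `clock` reports elapsed wall-clock time instead, so this comparison would not show the effect there.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>
#include <vector>

struct Timing {
    double wallSeconds;  // elapsed real time
    double cpuSeconds;   // processor time charged to the whole process
};

// Hypothetical helper: runs work() once and reports both clocks.
template <typename Work>
Timing measure(Work work) {
    const auto wallStart = std::chrono::steady_clock::now();
    const std::clock_t cpuStart = std::clock();
    work();
    const std::clock_t cpuEnd = std::clock();
    const auto wallEnd = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(wallEnd - wallStart).count(),
            static_cast<double>(cpuEnd - cpuStart) / CLOCKS_PER_SEC};
}

// Toy CPU-bound loop standing in for real work; volatile keeps the
// compiler from optimizing the loop away.
void burn(unsigned long iterations) {
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < iterations; ++i) sink = sink + i;
}

// Run the same loop on several threads at once and wait for all of them.
void burnOnThreads(unsigned numThreads, unsigned long iterationsEach) {
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t)
        workers.emplace_back(burn, iterationsEach);
    for (auto& w : workers) w.join();
}
```

Comparing `measure([] { burn(n); })` against `measure([] { burnOnThreads(3, n / 3); })` should show the pattern described above on a multi-core POSIX machine: lower `wallSeconds` for the threaded run, with `cpuSeconds` about the same or slightly higher.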

Licensed under: CC-BY-SA with attribution