Threads in C++ not generating speedup on Mandelbrot image processing

Question 1

I'm a little late getting back to this question, but I remember the cause: I was programming on a single-core Raspberry Pi. With only one core, threads cannot run in parallel, so CPU-bound work like this gets no speedup from threading.
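A quick way to catch this before writing any threading code is to ask the standard library how many hardware threads it can see. The sketch below is only an illustration; the helper name `usable_thread_count` is made up for this example, and it assumes C++11 or later.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: clamps a requested thread count to what the machine
// reports. std::thread::hardware_concurrency() is only a hint and may return 0
// when the value cannot be determined, so fall back to 1 in that case.
unsigned usable_thread_count(unsigned requested) {
    assert(requested > 0);
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;
    return requested < hw ? requested : hw;
}
```

On a single-core board this should return 1 no matter how many threads you ask for, which is a strong sign that extra threads will not help.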

Question 2

I think spawning the threads is too expensive. You could try PPL or TBB, which both provide parallel_for and parallel_for_each, and use those to loop over the pixels instead of managing threads yourself. They manage a pool of worker threads internally, so you should see less overhead and better throughput.

Question 3

Solving one problem at a time: why not hardcode the use of 2 threads, then 3? Starting a thread is expensive, but if you start only 2 threads and calculate a fairly large Mandelbrot image, the thread start time will be negligible compared to the computation.

If you don't get close to a 2x and then a 3x speedup, you have other problems that you need to debug and solve separately.
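Here is a minimal sketch of that experiment, not your actual code. The function names (`escape_count`, `render`, `time_render_ms`), the viewed region [-2, 1] x [-1.5, 1.5], and the split into contiguous bands of rows are all assumptions made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Iterations until |z| exceeds 2, capped at max_iter.
int escape_count(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        ++n;
    }
    return n;
}

// Renders iteration counts with a fixed number of threads; each thread gets
// one contiguous band of rows and writes only to its own part of the buffer.
std::vector<int> render(int width, int height, int max_iter, int num_threads) {
    assert(width > 0 && height > 0 && num_threads > 0);
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    auto band = [&](int y0, int y1) {
        for (int y = y0; y < y1; ++y)
            for (int x = 0; x < width; ++x)
                out[static_cast<std::size_t>(y) * width + x] = escape_count(
                    -2.0 + 3.0 * x / width, -1.5 + 3.0 * y / height, max_iter);
    };
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t)
        workers.emplace_back(band, height * t / num_threads,
                             height * (t + 1) / num_threads);
    for (auto& w : workers) w.join();
    return out;
}

// Wall-clock time of one render call, in milliseconds.
double time_render_ms(int width, int height, int max_iter, int num_threads) {
    auto start = std::chrono::steady_clock::now();
    render(width, height, max_iter, num_threads);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Call `time_render_ms` with the same image size and 1, 2, and 3 threads, then compare the numbers. Pick an image large enough that each run takes at least a few hundred milliseconds, so the cost of starting threads is lost in the noise.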

Question 4

Without looking at your code and playing with it, it's hard to pinpoint exactly what the problem is. Here's a guess, though: some portions of the Mandelbrot set image are much more expensive to compute than others. Your code cuts the image into equal slices along the x-axis, but the majority of the work (say 70%) could fall into one slice. In that case the best you can do is roughly a 30% reduction in run time (about a 1.4x speedup), because the other threads finish early and the program still has to wait for the slowest one. For example, if you run with four threads and cut the image into four pieces, the third piece certainly looks more intensive than the rest. Of course, the 70% is just an estimate.
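To check this guess rather than take my word for it, you could measure how much work lands in each slice, and then try handing out work in small pieces instead of one big slice per thread. The sketch below is an illustration under assumptions: the names `mandel_iters`, `work_per_slice`, and `render_dynamic` are made up, the region is taken to be [-2, 1] x [-1.5, 1.5], and total iteration count is used as a stand-in for compute time. The dynamic version hands out rows rather than columns only because rows are contiguous in a row-major buffer; the same idea works for columns.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Iterations until |z| exceeds 2, capped at max_iter.
int mandel_iters(double cr, double ci, int max_iter) {
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;
        zi = 2.0 * zr * zi + ci;
        zr = t;
        ++n;
    }
    return n;
}

// Total iterations inside each of `slices` equal vertical strips: a rough
// proxy for how much work each thread gets under an equal split along x.
std::vector<long long> work_per_slice(int width, int height, int max_iter, int slices) {
    assert(width > 0 && height > 0 && slices > 0);
    std::vector<long long> work(static_cast<std::size_t>(slices), 0);
    for (int x = 0; x < width; ++x) {
        std::size_t s = static_cast<std::size_t>(x) * slices / width;
        for (int y = 0; y < height; ++y)
            work[s] += mandel_iters(-2.0 + 3.0 * x / width,
                                    -1.5 + 3.0 * y / height, max_iter);
    }
    return work;
}

// Dynamic scheduling: each thread repeatedly claims the next unprocessed row
// from a shared atomic counter, so no thread sits idle while rows remain.
std::vector<int> render_dynamic(int width, int height, int max_iter, int num_threads) {
    assert(width > 0 && height > 0 && num_threads > 0);
    std::vector<int> out(static_cast<std::size_t>(width) * height);
    std::atomic<int> next_row{0};
    auto worker = [&] {
        for (int y = next_row.fetch_add(1); y < height; y = next_row.fetch_add(1))
            for (int x = 0; x < width; ++x)
                out[static_cast<std::size_t>(y) * width + x] = mandel_iters(
                    -2.0 + 3.0 * x / width, -1.5 + 3.0 * y / height, max_iter);
    };
    std::vector<std::thread> pool;
    for (int t = 0; t < num_threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return out;
}
```

Print the result of `work_per_slice(width, height, max_iter, 4)` first. If one entry dwarfs the others, the equal split is your bottleneck, and something like `render_dynamic` should scale better because a thread that draws cheap rows simply claims more of them.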