Question

So, I wrote a program that generates a Mandelbrot image. Then I decided to rewrite it so that it uses a specified number of threads to speed it up. This is what I came up with:

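#include <functional>  // std::ref
#include <thread>
#include <vector>

// mandelbrot_range is defined elsewhere in the program.
// Splits the image into threadCount slices along the x-axis and renders each slice on its own thread.
void mandelbrot_all(std::vector<std::vector<int>>& pixels, int X, int Y, int threadCount) {
    std::vector<std::thread> threads;
    threads.reserve(threadCount);
    for (int i = 0; i < threadCount; i++) {
        // Slice i runs from x = i*X/threadCount to x = (i+1)*X/threadCount, over the full height Y.
        threads.emplace_back(mandelbrot_range, std::ref(pixels), i * X / threadCount, 0, (i + 1) * X / threadCount, Y, X);
    }
    // Wait for every slice to finish before returning.
    for (std::thread& t : threads) {
        t.join();
    }
}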

The intention was to split the processing into chunks and process each one separately. When I run the program, it takes a number as an argument, which will be used as the number of threads to be used in the program for that run. Unfortunately, I get similar times for any number of threads.

Is there something about threading in C++ that I'm missing? Do I have to add boilerplate of some kind to make the threads run simultaneously? Or is the way I'm creating threads just silly?

I've tried running this code on a Raspberry Pi and on my quad-core laptop, with the same results.

Any help would be appreciated.


Solution

I'm a little late getting back to this question, but looking back, I remember the solution: I was programming on a single-core Raspberry Pi. One core means no speedup from threading.
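For anyone hitting the same thing, a quick check is to ask the standard library how many hardware threads the machine reports before picking a thread count. This is only a sketch: std::thread::hardware_concurrency() is a real call, but it may return 0 when the value cannot be determined, and the helper names clamp_thread_count and usable_thread_count are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: limits the requested thread count to what the hardware reports.
// `hardware` is expected to come from std::thread::hardware_concurrency(),
// which returns 0 when the count cannot be determined.
int clamp_thread_count(int requested, unsigned hardware) {
    assert(requested >= 1);
    if (hardware == 0) {
        return requested;  // unknown hardware: trust the caller
    }
    return std::min(requested, static_cast<int>(hardware));
}

// Convenience wrapper that queries the current machine.
int usable_thread_count(int requested) {
    return clamp_thread_count(requested, std::thread::hardware_concurrency());
}
```

On a single-core board, usable_thread_count(4) should come back as 1 (assuming the count is reported), which is a strong hint that extra threads will not help there.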

OTHER TIPS

I think spawning the threads is too expensive. You could try PPL or TBB, which both provide parallel_for and parallel_for_each, and use those to loop over the pixels instead of managing threads yourself. They manage the threads internally, so you get less overhead and better throughput.
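To give a feel for the loop-level interface those libraries offer, here is a rough stand-in written with only the standard library. It is not the TBB or PPL API, and the name simple_parallel_for is invented for this sketch. Unlike the real libraries it still spawns its threads on every call, so it shows the shape of the interface rather than the overhead savings.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hand-rolled stand-in for a library parallel_for (NOT the TBB or PPL API).
// Calls body(i) once for every i in [begin, end), using `workers` threads.
// Indices are handed out one at a time through an atomic counter, so a thread
// that finishes a cheap index immediately picks up the next one.
// body must be safe to call from several threads at once.
template <typename Body>
void simple_parallel_for(int begin, int end, int workers, Body body) {
    assert(workers >= 1);
    std::atomic<int> next{begin};
    auto run = [&]() {
        for (int i = next.fetch_add(1); i < end; i = next.fetch_add(1)) {
            body(i);
        }
    };
    std::vector<std::thread> pool;
    pool.reserve(workers - 1);
    for (int w = 1; w < workers; ++w) {
        pool.emplace_back(run);
    }
    run();  // the calling thread works too
    for (std::thread& t : pool) {
        t.join();
    }
}
```

With something like this, the image loop becomes a call such as simple_parallel_for(0, Y, threadCount, renderRow), where renderRow is whatever function fills one row of pixels.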

Solve one problem at a time: why not hardcode 2 threads, then 3, and see what happens? Starting a thread is expensive, but if you start only 2 threads and compute a fairly large Mandelbrot image, the thread start time will be negligible by comparison.

If you don't get close to a 2x and then a 3x speedup, you have other problems that you need to debug and solve separately.
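A small timing harness makes that experiment easy to repeat. The sketch below is one way to do it with std::chrono::steady_clock; the names time_with_threads and speedup, and the work(t, threadCount) callback signature, are assumptions made for this example rather than anything from the original program.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Runs `work` on `threadCount` threads and returns the wall-clock time in seconds.
// `work(t, threadCount)` is a hypothetical callback that processes the t-th share of the job.
double time_with_threads(int threadCount, const std::function<void(int, int)>& work) {
    assert(threadCount >= 1);
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    threads.reserve(threadCount);
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back(work, t, threadCount);
    }
    for (std::thread& th : threads) {
        th.join();
    }
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Speedup of an n-thread run relative to the 1-thread run.
double speedup(double secondsWithOneThread, double secondsWithNThreads) {
    assert(secondsWithNThreads > 0.0);
    return secondsWithOneThread / secondsWithNThreads;
}
```

Time the same large image with 1, 2, and 3 threads and compare the ratios. Values well below 2 and 3 point at something other than thread startup cost.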

Without looking at your code and playing with it, it's hard to pinpoint exactly what the problem is. Here's a guess though: some portions of the Mandelbrot set image are much more expensive to compute than others. Your code cuts the image into equal slices along the x-axis, but the majority of the work (say 70%) could fall into one slice. In that case, the best you can do is cut the run time by about 30%, since the rest of the threads still have to wait for the slowest one to finish. For example, if you run with four threads and cut the image into four pieces, the third piece certainly looks more intensive than the rest. Of course, the 70% is just an estimate.
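One way to test that guess is to count how many iterations each slice actually costs. The sketch below assumes a typical view (real axis from -2 to 1, imaginary axis from -1 to 1) and a plain escape-time loop; the function names and the view are illustrative assumptions, not taken from the question's code.

```cpp
#include <cassert>
#include <complex>
#include <vector>

// Iterations until |z| exceeds 2 for the point c, capped at maxIter.
int escape_iterations(std::complex<double> c, int maxIter) {
    std::complex<double> z = 0.0;
    int n = 0;
    while (n < maxIter && std::norm(z) <= 4.0) {  // std::norm is |z| squared
        z = z * z + c;
        ++n;
    }
    return n;
}

// Total iteration count spent in each of `strips` equal-width vertical strips.
// Assumed view: real axis [-2, 1], imaginary axis [-1, 1], on a width x height grid.
std::vector<long long> work_per_strip(int width, int height, int strips, int maxIter) {
    assert(width > 0 && height > 0 && strips > 0);
    std::vector<long long> work(strips, 0);
    for (int px = 0; px < width; ++px) {
        // Equal-width split (matches i*X/numThreads when strips divides width).
        int strip = px * strips / width;
        for (int py = 0; py < height; ++py) {
            double re = -2.0 + 3.0 * (px + 0.5) / width;
            double im = -1.0 + 2.0 * (py + 0.5) / height;
            work[strip] += escape_iterations({re, im}, maxIter);
        }
    }
    return work;
}
```

Calling work_per_strip(120, 80, 4, 200) and printing the four totals shows how uneven the split is for this view. The third strip, which holds most of the main cardioid, comes out as the most expensive, and the slowest strip sets the run time of the whole threaded version.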

Licensed under: CC-BY-SA with attribution