Question

I have been tinkering with BSP trees for a while now and am also playing with threads. When adding a triangle to a BSP tree, an opportunity arises to create a new thread for the purposes of processing data in parallel.

insert(triangle, bspnode)
{
  ....
  else if(triangle spans bspnode)
  {
    (frontpiece, backpiece) = plane_split(triangle, bspnode)

    insert(frontpiece, bspnode.front)
    insert(backpiece, bspnode.back)
  }
  ....
}

The two insert operations above could be executed by two threads, and since they do not modify the same data, cheap synchronization can be used.

insert(triangle, bspnode)
{
  ....
  else if(triangle spans bspnode)
  {
    (frontpiece, backpiece) = split(triangle, bspnode)

    handle = beginthread(insert(backpiece, bspnode.front))
    insert(frontpiece, bspnode.back)
    if(handle)
    {
      waitforthread(handle)
    }
    else
    {
      insert(backpiece, bspnode.front)
    }
  }
  ....
}

This new method attempts to create a thread to complete the operation in parallel, but should not fail if the thread cannot be created (it will simply revert to the original algorithm).

Is this a sound programming practice, or am I using threads improperly? I have not been able to find any literature on this technique. I like that it tends to use my CPU to its fullest (2 cores), and would theoretically scale to any number of processors available. I don't like that it might be horribly wasteful on CPU and memory.

Was it helpful?

Solution

Threads are great if some part of the processing is waiting on something external (user input, I/O, some other processing) - the thread that's waiting can continue to wait, while a thread that isn't waiting forges on ahead.

However, for processing-intensive tasks, more threads than processors actually creates overhead. It seems like your threads are doing all "CPU work", so I'd stick to one thread per core - test to find the optimal number, though.

The biggest overhead created is from context switching (freezing one thread and loading the execution context of the next one), as well as cache misses when threads are doing tasks with different memory (if your thread can use the CPU cache effectively).

OTHER TIPS

your best bet would be to create a threadpool, and then use it 'transparently' to add nodes.

eg, create 2 threads at program start, have them wait on a semaphore or event. When you have nodes to add, you pop the data onto a queue then trigger the semaphore. This wakes one of the threads which pops the data off the queue and performs the processing. (make sure access to the queue is threadsafe - fully synchronised with a critical section is best).

The overall performance of your app is slower as you have more overhead, in copying data to the queue and running the extra threads, but if you used to run on a single core you will now be running on 2. It works best if the threaded processing is expensive.

Sure, for example, Quicksort can be programmed multithreaded quite easily and get some large performance gains on multi-core systems, and some small performance losses on non-multithreaded. Just remember that you're adding overhead twice now - once for the stack save on the recursion and once on the thread, so if you're doing a large number of recursions then it could overwhelm a system faster than a non-multithreaded approach.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top