Question

I wrote a multi-threaded program which does some CPU-heavy computation with a lot of floating-point operations. More specifically, it's a program which compares animation sequences frame by frame: it compares frame data from animation A with all the frames in animation B, for all frames in animation A. I carry out this intensive operation for different animation pairs in parallel, so the program can be working on the A-B, B-C and C-A pairs at the same time. The program uses QtConcurrent and a "map" function which maps a container of motions onto a comparison function. QtConcurrent manages the thread pool for me; I am working on an Intel quad-core processor, so it spawns four threads.

Now, the problem is that my process is destroying my CPU. Usage is constantly at 100%, and I actually get a Blue Screen of Death ("Page fault in non-paged area") if I run my program on a big enough set of motions. I suspect this is because my computer is overclocked. However, could it be because of the way I coded my program? Some very intensive benchmarking tools I used to test my machine's stability never crashed my PC. Is there any way to control how my program uses the CPU to reduce the load? Or am I perhaps misunderstanding my problem?


Solution

There are some excellent answers here.

I would only add, from the perspective of having done lots of performance tuning: unless each thread has been aggressively optimized, chances are there is plenty of room for cycle reduction.

To make an analogy with a long-distance auto race, there are two ways to try to win:

  1. Make the car go faster
  2. Make fewer stops and side-trips

In my experience, most software as first written is quite far from taking the most direct route, especially as the software gets large.

To find wasted cycles in your program, as Kenneth Cochran said, never guess. If you fix something without having proved that it is a problem, you are investing in a guess.

The popular way to find performance problems is to use profilers.

However, I do this a lot, and my method is this: http://www.wikihow.com/Optimize-Your-Program%27s-Performance

OTHER TIPS

Overclocking PCs can lead to all sorts of strange problems. If you suspect that to be the root cause of your problem, clock the machine back to a reasonable speed and rerun your tests.

It could also be some rather strange memory bug where you corrupt your RAM in a way Windows (I assume that OS, because of the BSOD) cannot recover from (very unlikely, but who knows).

Another possibility I can think of is an error in your threading implementation that brings down Windows.

But at first, I'd look at the overclocking-issue...

The kind of operation you've described is already highly parallelizable, and running more than one job at a time may actually hurt performance. The reason is that any processor's cache is of limited size, and the more you try to do concurrently, the smaller each thread's share of the cache becomes.

You might also look into the option of using your GPU to soak up some of the processing load. Modern GPUs are vastly more efficient at most kinds of video transformation than CPUs of a similar generation.

I suspect that this is because my computer is overclocked.

It's definitely possible. Try setting it to normal speed for a while.

could this be because of the way I coded my program?

A program running in user mode is very unlikely to cause a BSOD.

At a guess, I would say you are running on a 3-core machine (or 4, given the 100% usage), and parallelizing will actively hurt your performance if you use more threads than cores. Create only one thread per CPU core, and whatever you do, never have data accessed by different threads at the same time: the cache-locking algorithms in most multi-core CPUs will absolutely slaughter your performance. In this case, on an N-core CPU processing L-frame animations, I would use thread 1 on frames 0 to L/N, thread 2 on frames L/N to 2*L/N, ..., and thread N on frames (N-1)*L/N to L. Do the different combinations (A-B, B-C, C-A) in sequence so you don't thrash your cache; it should also be simpler to code.

As a side note: real computation like this should be using 100% CPU; it means it's going as fast as it can.

The overclocking is the most likely cause of the instability. With any CPU-intensive algorithm there is going to be some CPU thrashing. The overclocking notwithstanding, I would find a good performance profiler to locate the bottlenecks. Never guess where the problem is: you could spend months optimizing something that has no real effect on performance, or, worse, performance could even decrease.

It's all too easy to blame the hardware. I would suggest you try running your program on a different system and see how that turns out with the same data.

Probably you have a bug.

Look into using SIMD operations. I think you'd want SSE in this case. They're often a better first step than parallelization as they are easier to get correct and provide a pretty hefty boost to most linear algebra types of operations.

Once you have it using SIMD, look into parallelizing. It sounds like you're slamming the CPU as well, so you could perhaps do with some sleeps instead of busy waits, and make sure you're cleaning up or reusing threads properly.

In the absence of the BSOD error code (useful for looking up the cause), it is a bit harder to help you with this one.

You might try physically reseating your memory (take it out and put it back in). I, and some others I know, have worked on a few machines where this was needed. For instance, I was once trying to upgrade OS X on a machine and it kept crashing... finally I popped the memory out, dropped it back in, and everything was fine.

Adding Sleep(1); to the hot loop will cut CPU usage roughly in half. I ran into the same problem while working on a CPU-intensive algorithm.

If your processor has two or more cores, you can open Task Manager, go to the Processes tab, right-click the program's name, click "Set Affinity...", and restrict the program to fewer cores.

It will then take longer to do what you're asking of it, but will cause a significant decrease in CPU usage.

I think a blue screen of death is caused when a kernel memory region gets corrupted, so simply using multithreading to carry out parallel operations should not be the reason for it.

If you are creating multiple threads, each carrying out heavy floating-point operations, then your CPU utilization will certainly reach 100%.

It would be better to add a short sleep in each thread so that other processes get a chance to run. You may also try reducing the priority of the threads.

If you are on a Windows platform, call a sleep function after doing some work, to tell the scheduler to yield the CPU to other processes:

Sleep( 0 );

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow