Question

I'm working on a program that processes many requests, and none of them pushes CPU usage above 50% (I'm currently on a dual-core machine). I therefore created a thread for each request, which made the whole process faster: processing 9 requests takes 02min08s on a single thread, while with 3 threads working simultaneously the time dropped to 01min37s. It still doesn't use 100% of the CPU, though, only around 50%.

How can I make my program use the full capacity of the processors?

EDIT: The application isn't IO- or memory-bound; both stay at reasonable levels all the time.

I think it has something to do with the 'dual core' thing.

There is a locked method invocation that every request uses, but it is really fast, so I don't think it is the problem.

The most CPU-costly part of my code is the call of a DLL via COM (the same external method is called from all threads). This DLL is not memory- or IO-bound either. It is an AI recognition component: I'm running OCR on paychecks, one paycheck per request.

EDIT2

It is very likely that the STA COM method is my problem. I have contacted the component owners to get it solved.


Solution

Do you have significant locking within your application? If the threads are waiting for each other a lot, that could easily explain it.

Other than that (and the other answers given), it's very hard to guess, really. A profiler is your friend...

EDIT: Okay, given the comments below, I think we're onto something:

The most CPU-costly part of my code is the call of a DLL via COM (the same external method is called from all threads).

Is the COM method running in an STA by any chance? If so, it'll only use one thread, serializing calls. I strongly suspect that's the key to it. It's similar to having a lock around that method call (not quite the same, admittedly).
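To make the lock analogy concrete, here is a minimal sketch that does not use COM at all. `Spin` is a hypothetical stand-in for the expensive call, and the iteration count is arbitrary. When every thread has to hold the same lock, the work runs one thread at a time, which on a dual core would be expected to show up as roughly 50% total CPU. An STA behaves similarly but not identically, as noted above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SerializationDemo
{
    static readonly object Gate = new object();

    // Pure CPU work standing in for the expensive call (hypothetical stand-in).
    static long Spin(int iterations)
    {
        long acc = 0;
        for (int i = 0; i < iterations; i++) acc += i % 7;
        return acc;
    }

    // Runs threadCount threads; when serialize is true, every thread must hold
    // the same lock while it works, so only one of them runs at a time.
    public static TimeSpan Run(int threadCount, int iterations, bool serialize)
    {
        var threads = new Thread[threadCount];
        var results = new long[threadCount]; // keeps the computed values alive
        var watch = Stopwatch.StartNew();
        for (int t = 0; t < threadCount; t++)
        {
            int index = t; // local copy so each lambda captures its own slot
            threads[t] = new Thread(() =>
            {
                if (serialize) { lock (Gate) { results[index] = Spin(iterations); } }
                else { results[index] = Spin(iterations); }
            });
            threads[t].Start();
        }
        foreach (var thread in threads) thread.Join();
        return watch.Elapsed;
    }

    static void Main()
    {
        Console.WriteLine("serialized:   " + Run(2, 200000000, true));
        Console.WriteLine("unserialized: " + Run(2, 200000000, false));
    }
}
```

Watching Task Manager while each half runs should make the difference visible; the exact timings depend on the machine.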

OTHER TIPS

The problem is the COM object.

Most COM objects run in the context of a 'single-threaded apartment'. (You may have seen an [STAThread] attribute on the Main method of a .NET application from time to time?)

Effectively this means that all dispatches to that object are handled by a single thread. Throwing more cores at the problem just gives you more resources that can sit around and wait or do other things in .NET.

You might want to take a look at this article from Joe Duffy (the head parallel .NET guy at Microsoft) on the topic.

http://www.bluebytesoftware.com/blog/PermaLink,guid,8c2fed10-75b2-416b-aabc-c18ce8fe2ed4.aspx

In practice if you have to do a bunch of things against a single COM object like this you are hosed, because .NET will just serialize access patterns internally behind your back. If you can create multiple COM objects and use them then you can resolve the issue because each can be created and accessed from a distinct STA thread. This will work until you hit about 100 STA threads, then things will go wonky. For details, see the article.
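A minimal sketch of the "one COM object per STA thread" idea follows. The ProgID `Vendor.OcrEngine`, the method name `Recognize`, and the file names are hypothetical placeholders, not the real component's API. It only helps if the component tolerates several instances in one process, which a later post in this thread reports is not the case for the asker's component.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

static class StaWorkers
{
    // Processes items on workerCount STA threads. Each thread builds its own
    // object via factory and is the only thread that ever touches it.
    public static void Run(IEnumerable<string> items, int workerCount,
                           Func<object> factory, Action<object, string> process)
    {
        var queue = new ConcurrentQueue<string>(items);
        var threads = new List<Thread>();
        for (int i = 0; i < workerCount; i++)
        {
            var thread = new Thread(() =>
            {
                object instance = factory();      // created on this STA thread
                string item;
                while (queue.TryDequeue(out item))
                    process(instance, item);      // called from the creating thread
            });
            thread.SetApartmentState(ApartmentState.STA); // must precede Start()
            thread.Start();
            threads.Add(thread);
        }
        foreach (var thread in threads) thread.Join();
    }

    static void Main()
    {
        // Hypothetical ProgID and method name; substitute the real ones.
        Type comType = Type.GetTypeFromProgID("Vendor.OcrEngine", true);
        Run(new[] { "check1.tif", "check2.tif", "check3.tif" }, 2,
            () => Activator.CreateInstance(comType),
            (ocr, path) => ((dynamic)ocr).Recognize(path));
    }
}
```

The factory and the work are passed in as delegates so the threading part can be tried with any object before wiring in the real component.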

It is probably no longer the processor that is the bottleneck for completing your process. The bottleneck has likely moved to disk access, network access or memory access. You could also have a situation where your threads are competing for locks.

Only you know exactly what your threads are doing, so you need to look at them with the above in mind.

It depends on what your program does. The work carried out by your concurrent requests could be IO-bound (limited by the speed of, e.g., your hard disk) rather than CPU-bound; only in the CPU-bound case would you see your CPU hit 100%.

After the edit, it does sound like COM STA objects might be the culprit.

Do all threads call the same instance of the COM object? Would it be possible to make your worker threads STA threads and create a separate instance of the COM object on each thread? That way it might be possible to avoid the STA bottleneck.

To tell if a COM coclass is STA:

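using System;
using System.Diagnostics;

class Test
{
  [MTAThread] // Main is an MTA thread by default; the attribute just makes it explicit
  static void Main()
  {
    int before = Process.GetCurrentProcess().Threads.Count;
    var o = new COMObjectClass(); // placeholder for your COM interop class
    int after = Process.GetCurrentProcess().Threads.Count;
    // Did a new thread pop into existence when that line was executed?
    // If so, an STA thread was created for the object to live in.
    // The count is only a rough indicator; other threads may start for unrelated reasons.
    Console.WriteLine("Threads before: {0}, after: {1}", before, after);
  }
}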

I think I had a similar problem. I was creating multiple threads in C# that ran C++ code through a COM interface. My dual-core CPU never reached 100%.

After reading this post, I almost gave up. Then I tried calling SetApartmentState(ApartmentState.STA) on my threads before starting them.

After only changing this, the CPU maxed out.

It sounds like your application's performance may not be 'bound' by the amount of CPU resources available. If you're processing requests over the network, the CPU(s) may be waiting for the data to arrive, or for the network device to transfer the data. Alternatively, if you need to look up data to fulfill the request, the CPU may be waiting for the disk.

Are you sure that your tasks require intensive processor activity? Is there any IO processing? This can be the reason for your 50% load.

Test: Try using only 2 threads and set the affinity of each thread to a different core. Then open Task Manager and watch the load on both cores.

This isn't an answer really, but have you checked perfmon to see what resources it is using and have you run profilers on the code to see where it is spending time?
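Short of a full profiler, a rough first measurement is to compare the CPU time the process consumed with the wall-clock time that elapsed. The sketch below is an assumption-laden helper (the names `CpuUsageProbe` and `MeasureBusyCores` are made up for illustration). A result near 1.0 on a dual core means about one core's worth of work, which is the "50%" seen in Task Manager. If the COM component runs out of process, its CPU time is not counted here.

```csharp
using System;
using System.Diagnostics;

static class CpuUsageProbe
{
    // Returns the average number of cores this process kept busy while work ran.
    public static double MeasureBusyCores(Action work)
    {
        Process self = Process.GetCurrentProcess();
        TimeSpan cpuBefore = self.TotalProcessorTime;
        Stopwatch wall = Stopwatch.StartNew();
        work();
        wall.Stop();
        self.Refresh(); // discard any cached process information
        TimeSpan cpuUsed = self.TotalProcessorTime - cpuBefore;
        // Guard against a zero denominator for extremely short runs.
        return cpuUsed.TotalMilliseconds / Math.Max(1.0, wall.Elapsed.TotalMilliseconds);
    }

    static void Main()
    {
        // Stand-in workload: a single-threaded busy loop.
        double busy = MeasureBusyCores(() =>
        {
            long acc = 0;
            for (int i = 0; i < 300000000; i++) acc += i % 3;
            GC.KeepAlive(acc);
        });
        Console.WriteLine("Average busy cores: " + busy.ToString("F2"));
    }
}
```

Wrapping the request-processing loop in `MeasureBusyCores` instead of the busy loop gives a number to compare before and after any threading change.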

How have you determined that IO or other non CPU resources are not the bottleneck?

Can you give a brief description of what the threads are doing?

If your process is running on CPU 0 and spawning all its threads there, the maximum it will ever reach is 50%. See whether you have threads running on both cores or on just one. I would venture to guess that you're isolated to a single core, or that one of your dependent resources is locked to a single core. If usage sits at exactly 50%, a single core is very likely your bottleneck.
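One way to check whether the process is restricted to a single core is to read its affinity mask. This is a small sketch; the helper name `CountAllowedCores` is made up for illustration. `Process.ProcessorAffinity` is a bitmask in which each set bit is a logical processor the process is allowed to run on.

```csharp
using System;
using System.Diagnostics;

static class AffinityCheck
{
    // Counts the set bits in an affinity mask; each bit is one logical core.
    public static int CountAllowedCores(long mask)
    {
        int count = 0;
        for (ulong bits = (ulong)mask; bits != 0; bits >>= 1)
            count += (int)(bits & 1UL);
        return count;
    }

    static void Main()
    {
        long mask = Process.GetCurrentProcess().ProcessorAffinity.ToInt64();
        Console.WriteLine("Environment.ProcessorCount: " + Environment.ProcessorCount);
        Console.WriteLine("Cores this process may use: " + CountAllowedCores(mask));
    }
}
```

If the second number is 1 on a dual-core machine, the process is pinned and no amount of threading will push it past 50%.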

So you solved the problem of using a single COM object and now have an IO problem.

The increased run time for multiple threads is probably because of mixing random IO together, which will slow it all down.

If the data set will fit into RAM, try to see if you can prefetch it into cache. Perhaps just reading the data, or maybe memory mapping it together with a command to make it available.
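The "just read the data" variant can be sketched as below, assuming the input is an ordinary file that fits in RAM. The class and method names are made up for illustration. The file is read once, sequentially, and the bytes are thrown away; the point is only to make it more likely that later reads are served from the operating system's file cache.

```csharp
using System;
using System.IO;

static class CacheWarmer
{
    // Reads the file front to back and discards the bytes. Returns bytes read.
    public static long Prefetch(string path)
    {
        var buffer = new byte[1 << 20]; // 1 MiB chunks
        long total = 0;
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, buffer.Length,
                                           FileOptions.SequentialScan))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                total += read;
        }
        return total;
    }

    static void Main(string[] args)
    {
        if (args.Length == 0) { Console.WriteLine("usage: CacheWarmer <file>"); return; }
        Console.WriteLine("Prefetched " + Prefetch(args[0]) + " bytes");
    }
}
```

Whether the data actually stays cached is up to the operating system, so this is a hint rather than a guarantee.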

This is why SQL databases will often choose sequential table scan over an index scan on queries you wouldn't expect: it can be much faster to read all of it in order than to read it in random chunks.

Maybe I'm misunderstanding something, but you said none of your requests (each in a separate thread) reaches 100% CPU.

What operating system are you using?

I seem to vaguely recall that in old versions of Windows (e.g., early XP and 2000), CPU utilization was reported as a fraction of the total of both processors, so a single thread couldn't get past 50% unless it was the idle process.

One more note: have you tried launching your code outside Visual Studio (regardless of release/debug settings)?

The problem is the COM object. It is STA, and I can't have two instances running concurrently in the same process either. When I create a second instance of the COM class, the other one becomes unusable.

I've contacted the component developers, and they're thinking about what they can do for me.

Thank you all ;)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow