質問

How to make my i7 processor reach 100% usage with this code? Is there something special that happens in the XmlDocument? is just because of the context change? and if so why putting more threads wouldnt make the the processor use its full power? what would be the fatest way to parse several strings at a time?

EDIT:

Maybe this code will make it more clear, no matter what number of threads it uses 30% of the processor:

    private void Form1_Load(object sender, EventArgs e)
    {
        Action action = () =>
        {
            while (true)
            {
                XmlDocument xmlDocument = new XmlDocument();

                xmlDocument.LoadXml("<html><body><div>1111</div><div>222</div></body></html>");
                var nodes = xmlDocument.SelectNodes("//div");
            }
        };

        Parallel.For(0, 16, i => action());
    }
役に立ちましたか?

解決

In your code sample (and you would see this with a profiler) you are wasting a LOT time waiting for available resources to run those threads. Because you are constantly requesting more and more Parallel.For (which is a non-blocking call) - your process is spending significant time waiting for threads to finish and then the next thread to be selected (an ever growing amount of such threads all requesting time to run).

Consider this output from the profiler:

The RED color is synchronization! Look how much work is going on by the kernel to let my app run so many threads! Note, if you had a single core processor, you'd definitely see 100%

enter image description here

You're going to have the best time reading this xml by splitting the string and parsing them separately (post-load from I/O of course). You may not see 100% cpu usage, but that's going to be the best option. Play with different partition sizes of the string (i.e. substring sizes).

For an amazing read on parallel patterns, I recommend this paper by Stephen Toub: http://download.microsoft.com/download/3/4/D/34D13993-2132-4E04-AE48-53D3150057BD/Patterns_of_Parallel_Programming_CSharp.pdf

EDIT I did some searching for a smart way to read xml in multiple threads. My best advice is this:

  1. Split your xml files into smaller files if you can.
  2. Use one thread per xml file.
  3. If 1&2 aren't sufficient for you perf needs, consider not loading it as xml completely, but partitioning the string (splitting it), and parsing a bit by hand (not to an XmlDocument). I would only do this if 1 and 2 are good enough for your needs. Each partition (substring) would run on its own thread. Remember too that "more threds" != "more cpu usage", at least not for your app. As we see in the profiler example, too many threads costs a lot of overhead. Keep it simple.

他のヒント

Is this the actual code you are running, or are you loading the xml from a file or other URL? If this is the actual code, then it's probably finishing too fast and the CLR doesn't have time to optimize the threadcount, but when you put the infinite loop it guarantees you'll max out the CPUs.

If you are loading XML from real sources, then threads can be waiting for IO responses and that won't consume any CPU while that's happening. To speed that case up you can preload all the XML using lots of threads (like 20+) into memory, and then use 8 threads to do the XML parsing afterwards.

The processor is the fastest component on a modern PC. Bottlenecks often come in the form of RAM or Hard Drives. In the first case, you are continuously creating a variable with the potential to eat up a lot of memory. Thus, its intuitive that RAM becomes the bottleneck as cache quickly runs dry.

In the second case you are not creating any variables (I'm sure .NET is doing plenty in the background, albeit in a highly optimized way). So, its intuitive that all the work stays at the CPU.

How the OS handles memory, interrupts etc is impossible to fully define. You can use tools that help define these situations, but last time I checked there's not even a Memory analyzer for .NET code. So that's why I say take the answer with a grain of salt.

The Task Parallel Library distributes the Actions, so you lose a bit of control when it comes to process utilization. For the most part that's a good thing because we don't have to worry about creating too many threads, making our threads too big, etc. If you want to explicitly create threads then the following code should push your processor to the max:

Parallel.For(0, 16, index => new Thread(() =>
                {
                    while (true)
                        new Thread(() =>
                            {
                                XmlDocument xmlDocument = new XmlDocument();
                                xmlDocument.LoadXml("<html><body><div>1111</div><div>222</div></body></html>");
                                var nodes = xmlDocument.SelectNodes("//div");
                            }).Start();
                }).Start());

I'm not saying I recommend this approach, just showing a working example of the code pushing my processor to the max (AMD FX-6200). I was seeing about 30% using the Task Parallel Library, too.

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top