Question

My dictionary is

Dictionary<string, string> d = new Dictionary<string, string>();

I'm iterating through an XML file (very large) and saving key/value pairs in a dictionary.

The following snippet is very slow and I want to make it faster. It takes more than an hour to complete, by which time ctr has reached 3332130.

if (d.ContainsKey(dKey))
{
    dValue = d[dKey];
    d[dKey] = dValue + "," + ctr;
}
else
    d.Add(dKey, ctr.ToString());

ctr++;

Solution

3332130 entries is a lot to keep in memory; if you can, avoid holding such a large collection in memory at once.

That said, let's try to optimize this.

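Dictionary<string, StringBuilder> d = new Dictionary<string, StringBuilder>();
StringBuilder builder;
if (d.TryGetValue(dKey, out builder))
{
    // Key already present: append to the existing builder in place.
    builder.Append(",");
    builder.Append(ctr);
}
else
{
    // First occurrence of this key: start a new builder with the counter.
    d.Add(dKey, new StringBuilder(ctr.ToString()));
}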
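  1. String concatenation in a tight loop is very slow, because every concatenation copies the whole string built so far. Use StringBuilder instead.
  2. Use TryGetValue, which finds the value in a single lookup, so you no longer need ContainsKey followed by dValue = d[dKey];.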

I believe this should increase performance significantly.
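
As a fuller sketch of the same idea, the loop can be wrapped in a method and the builders turned back into plain strings once, at the end. The names `PositionIndex`, `Build`, and the `keys` parameter are hypothetical and used only for illustration; in the real code the keys would come from the XML reader, and the counter is assumed here to start at 0.

```csharp
using System.Collections.Generic;
using System.Text;

public static class PositionIndex
{
    // Hypothetical helper: maps each key to a comma-separated list of the
    // counter values at which that key was seen (counter assumed to start at 0).
    public static Dictionary<string, string> Build(IEnumerable<string> keys)
    {
        var builders = new Dictionary<string, StringBuilder>();
        int ctr = 0;

        foreach (string dKey in keys)
        {
            // One lookup per key; appending mutates the builder in place,
            // so nothing has to be written back into the dictionary.
            if (builders.TryGetValue(dKey, out StringBuilder builder))
            {
                builder.Append(',').Append(ctr);
            }
            else
            {
                builders.Add(dKey, new StringBuilder(ctr.ToString()));
            }
            ctr++;
        }

        // Materialize each builder into a string exactly once.
        var result = new Dictionary<string, string>(builders.Count);
        foreach (KeyValuePair<string, StringBuilder> pair in builders)
        {
            result.Add(pair.Key, pair.Value.ToString());
        }
        return result;
    }
}
```

For example, `PositionIndex.Build(new[] { "a", "b", "a" })` would map "a" to "0,2" and "b" to "1".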

Other tips

Repeatedly concatenating onto large strings, where the number of concatenations is not known at compile time, is inherently wasteful. If you end up concatenating a lot of values together, and they are not particularly small, that could easily be the source of your problem.

If so, it would have nothing at all to do with the dictionary. You should consider using a StringBuilder, or building up a collection of separate strings that you can join using string.Join when you have all of the strings you'll need for that value.
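
The string.Join variant could look roughly like the sketch below: collect the counter values for each key in a list, and join each list only once all values are known. `JoinedPositionIndex`, `Build`, and `keys` are hypothetical names chosen for this example, and the counter is assumed to start at 0.

```csharp
using System.Collections.Generic;

public static class JoinedPositionIndex
{
    // Hypothetical helper: gathers counter values per key in a List<int>
    // and builds each comma-separated string with a single string.Join call.
    public static Dictionary<string, string> Build(IEnumerable<string> keys)
    {
        var positions = new Dictionary<string, List<int>>();
        int ctr = 0;

        foreach (string dKey in keys)
        {
            if (!positions.TryGetValue(dKey, out List<int> list))
            {
                list = new List<int>();
                positions.Add(dKey, list);
            }
            list.Add(ctr);
            ctr++;
        }

        var result = new Dictionary<string, string>(positions.Count);
        foreach (KeyValuePair<string, List<int>> pair in positions)
        {
            result.Add(pair.Key, string.Join(",", pair.Value));
        }
        return result;
    }
}
```

Keeping the values as integers until the end also leaves the option of skipping the joined string entirely if the consumer can work with the list.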

You may want to consider using StringBuilders instead of strings:

var d = new Dictionary<string, StringBuilder>();

And append the values like this:

if (d.ContainsKey(dKey))
{
    d[dKey].Append("," + ctr);
}
else
    d.Add(dKey, new StringBuilder(ctr.ToString()));
++ctr;

But I suspect that the bottleneck is in fact somewhere else.

In addition to the string concatenation improvements, you can also split your XML into several data sets and populate a ConcurrentDictionary from them in parallel. Depending on your data and the framework you are using, performance could improve severalfold.
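
A minimal sketch of the parallel idea, under several assumptions: the XML has already been split into chunks, each entry already carries its global counter value (worker threads cannot share a running counter and keep its order), and the names `ParallelPositionIndex`, `Build`, and `chunks` are hypothetical. StringBuilder is not thread-safe, so this version collects values in a ConcurrentBag and sorts them before joining. Whether it is actually faster depends on the data and on how expensive the splitting step is.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelPositionIndex
{
    // chunks: hypothetical pre-split data sets; each entry pairs a key with
    // the counter value it had in the original sequential order.
    public static Dictionary<string, string> Build(
        IEnumerable<IReadOnlyList<KeyValuePair<string, int>>> chunks)
    {
        var positions = new ConcurrentDictionary<string, ConcurrentBag<int>>();

        Parallel.ForEach(chunks, chunk =>
        {
            foreach (KeyValuePair<string, int> entry in chunk)
            {
                // GetOrAdd returns the bag that is stored for the key, even if
                // several threads race to create one for the same key.
                positions.GetOrAdd(entry.Key, _ => new ConcurrentBag<int>())
                         .Add(entry.Value);
            }
        });

        // A ConcurrentBag has no defined order, so sort before joining.
        return positions.ToDictionary(
            pair => pair.Key,
            pair => string.Join(",", pair.Value.OrderBy(p => p)));
    }
}
```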


Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow