So out of curiosity and idle boredom, I was fooling around with benchmarking Shlemiel the painter's algorithm. I started with a blank string, created another one of 1000 blank spaces, and started adding one to the other, using plain old inefficient string concatenation, timing how long it took each time.
string s1 = "";
string s2 = "";
while (s2.Length < 1000)
{
s2 += " ";
}
while (true)
{
Stopwatch sw = Stopwatch.StartNew();
s1 += s2;
sw.Stop();
Console.WriteLine(" {0}| {1}", s1.Length, sw.ElapsedMilliseconds);
}
As expected, the longer the string got, the longer it took to concatenate (it was a much smaller impact than I expected, but that's another question for another day). What was surprising, however, was consistent spikes in the time it took. Every sixth concatenation took roughly two to three times as long as the five preceding concatenations.
Length | Time (ms)
-----------------------
32250000 | 117
32251000 | 44
32252000 | 31
32253000 | 30
32254000 | 30
32255000 | 32
32256000 | 129
32257000 | 35
32258000 | 43
32259000 | 34
32260000 | 30
32261000 | 29
32262000 | 107
32263000 | 47
32264000 | 29
32265000 | 30
32266000 | 31
32267000 | 29
32268000 | 110
32269000 | 46
32270000 | 31
32271000 | 30
32272000 | 30
32273000 | 30
32274000 | 113
These samples are from once the string started getting significantly large, but the pattern holds from the start. Largely the first thousand or so samples are too small to notice the pattern, but around the 1.8k mark it's recognizable.
My first assumption was that behind the scenes, the characters were being stored in some sort of ArrayList/vector type deal, which doubles in size once it's full, but as I thought about it more, that doesn't fit - if that were the case, spike would come in exponential periods, rather than linear.
So, in short: what the hell is going on here?