Inverse String.Replace - Faster way of doing it?
-
05-07-2019 - |
Question
I have a method to replace every character except those I specify. For example,
ReplaceNot("test. stop; or, not", ".;/\\".ToCharArray(), '*');
would return
"****.*****;***,****".
Now, this is not an instance of premature optimization. I call this method quite a few times during a network operation. I found that on longer strings, it is causing some latency, and removing it helped a bit. Any help to speed this up would be appreciated.
public static string ReplaceNot(this string original, char[] pattern, char replacement)
{
int index = 0;
int old = -1;
StringBuilder sb = new StringBuilder(original.Length);
while ((index = original.IndexOfAny(pattern, index)) > -1)
{
sb.Append(new string(replacement, index - old - 1));
sb.Append(original[index]);
old = index++;
}
if (original.Length - old > 1)
{
sb.Append(new string(replacement, original.Length - (old + 1)));
}
return sb.ToString();
}
Final #'s. I also added a test case for a 3K character string, ran at 100K times instead of 1M to see how well each of these scales. The only surprise was that the regular expression 'scaled better' than the others, but it is no help since it is very slow to begin with:
User Short * 1M Long * 100K Scale John 319 2125 6.66 Luke 360 2659 7.39 Guffa 409 2827 6.91 Mine 447 3372 7.54 DirkGently 1094 9134 8.35 Michael 1591 12785 8.04 Peter 21106 94386 4.47
Update: I made the creation of the regular expression for Peter's version a static variable, and set it to RegexOptions.Compiled to be fair:
User Short * 1M Long * 100K Scale Peter 8997 74715 8.30
Pastebin link to my testing code, please correct me if it is wrong: http://pastebin.com/f64f260ee
Solution
Alright, on a ~60KB string, this will perform about 40% faster than your version:
public static string ReplaceNot(this string original, char[] pattern, char replacement)
{
int index = 0;
StringBuilder sb = new StringBuilder(new string(replacement, original.Length));
while ((index = original.IndexOfAny(pattern, index)) > -1)
{
sb[index] = original[index++];
}
return sb.ToString();
}
The trick is to initialize a new string with all replacement characters, since most of them will be replaced.
OTHER TIPS
Can't you use Regex.Replace like so:
Regex regex = new Regex(@"[^.;/\\]");
string s = regex.Replace("test. stop; or, not", "*");
I don't know if this will be any faster, but it avoids newing up strings just so they can be appended to the string builder, which may help:
public static string ReplaceNot(this string original, char[] pattern, char replacement)
{
StringBuilder sb = new StringBuilder(original.Length);
foreach (char ch in original) {
if (Array.IndexOf( pattern, ch) >= 0) {
sb.Append( ch);
}
else {
sb.Append( replacement);
}
}
return sb.ToString();
}
If the number of chars in pattern
will be of any size (which I'm guessing it generally won't), it might pay to sort it and perform an Array.BinarySearch()
instead of the Array.indexOf()
.
For such a simple transformation, I'd bet that it'll have no problem being faster than a regex, too.
Also, since your set of characters in pattern
are likely to usually come from a string anyway (at least that's been my general experience with this type of API), why don't you have the method signature be:
public static string ReplaceNot(this string original, string pattern, char replacement)
or better yet, have an overload where pattern
can be a char[]
or string
?
Here's another version for you. My tests suggest that its performance is pretty good.
public static string ReplaceNot(
this string original, char[] pattern, char replacement)
{
char[] buffer = new char[original.Length];
for (int i = 0; i < buffer.Length; i++)
{
bool replace = true;
for (int j = 0; j < pattern.Length; j++)
{
if (original[i] == pattern[j])
{
replace = false;
break;
}
}
buffer[i] = replace ? replacement : original[i];
}
return new string(buffer);
}
The StringBuilder has an overload that takes a character and a count, so you don't have to create intermediate strings to add to the StringBuilder. I get about 20% improvement by replacing this:
sb.Append(new string(replacement, index - old - 1));
with:
sb.Append(replacement, index - old - 1);
and this:
sb.Append(new string(replacement, original.Length - (old + 1)));
with:
sb.Append(replacement, original.Length - (old + 1));
(I tested the code that you said was about four times faster, and I find it about 15 times slower...)
It's going to be O(n). You seem to be replacing all alphabets and whitespaces by *
, why not just test if the current character is an alphabet/whitespace and replace it?