Is there a case-insensitive alternative to string.Replace?

https://stackoverflow.com/questions/244531

05-07-2019

Question

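I need to search a string and replace all occurrences of %FirstName% and %PolicyAmount% with values pulled from a database. The problem is that the capitalization of FirstName varies, which prevents me from using the String.Replace() method. I've seen web pages on the subject that suggest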

Regex.Replace(strInput, strToken, strReplaceWith, RegexOptions.IgnoreCase);

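But for some reason, when I try to replace %PolicyAmount% with $0, the replacement never takes place. I assume this has something to do with the dollar sign being a reserved character in regex.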

Is there another method I can use that doesn't involve sanitizing the input to deal with regex special characters?

Solution

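From MSDN:
$0 - "Substitutes the last substring matched by group number number (decimal)."

In .NET regular expressions, group 0 is always the entire match, so a replacement of $0 puts the matched token straight back. For a literal $ you need to escape it as $$:

string value = Regex.Replace("%PolicyAmount%", "%PolicyAmount%", @"$$0", RegexOptions.IgnoreCase);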
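
To make the difference concrete, here is a minimal sketch comparing the escaped and unescaped replacement strings. The class and method names are illustrative assumptions, not part of the answer; only Regex.Replace and RegexOptions.IgnoreCase come from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, named here only for illustration.
public static class DollarEscapeDemo
{
    // "$$" in a replacement pattern emits one literal '$'; the following "0" is plain text.
    public static string InsertLiteralDollarZero(string input)
    {
        return Regex.Replace(input, "%PolicyAmount%", "$$0", RegexOptions.IgnoreCase);
    }

    // An unescaped "$0" substitutes the whole match, so the token is put back unchanged.
    public static string InsertWholeMatch(string input)
    {
        return Regex.Replace(input, "%PolicyAmount%", "$0", RegexOptions.IgnoreCase);
    }
}
```

With the input "Amount: %policyamount%", the first method yields "Amount: $0", while the second returns the input unchanged, which is why the replacement in the question appeared never to happen.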

Other answers

It seems like string.Replace should have an overload that takes a StringComparison argument. Since it doesn't, you could try something like this:

public static string ReplaceString(string str, string oldValue, string newValue, StringComparison comparison)
{
    StringBuilder sb = new StringBuilder();

    int previousIndex = 0;
    int index = str.IndexOf(oldValue, comparison);
    while (index != -1)
    {
        sb.Append(str.Substring(previousIndex, index - previousIndex));
        sb.Append(newValue);
        index += oldValue.Length;

        previousIndex = index;
        index = str.IndexOf(oldValue, index, comparison);
    }
    sb.Append(str.Substring(previousIndex));

    return sb.ToString();
}
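
For readers on newer runtimes: .NET Core 2.0 and later (including .NET 5+) do ship a string.Replace(string, string, StringComparison) overload, although .NET Framework does not. The sketch below assumes such a runtime; the wrapper class and method names are illustrative only.

```csharp
using System;

// Hypothetical wrapper, shown only to illustrate the built-in overload.
public static class BuiltInReplaceDemo
{
    public static string ReplaceIgnoreCase(string input, string oldValue, string newValue)
    {
        // Assumes .NET Core 2.0+ / .NET 5+; this overload is not available on .NET Framework.
        // newValue is inserted verbatim, so "$0" needs no escaping here.
        return input.Replace(oldValue, newValue, StringComparison.OrdinalIgnoreCase);
    }
}
```

On older frameworks, a hand-written loop like the one above remains a reasonable fallback.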

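This is a somewhat confusing group of answers, in part because the title of the question is actually much broader than the specific question being asked. After reading through, I'm not sure any single answer is a few edits away from absorbing all the good stuff here, so I figured I'd try to sum up.

Here's an extension method that I think avoids the pitfalls mentioned here and provides the most broadly applicable solution.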

public static string ReplaceCaseInsensitiveFind(this string str, string findMe,
    string newValue)
{
    return Regex.Replace(str,
        Regex.Escape(findMe),
        Regex.Replace(newValue, "\\$[0-9]+", @"$$$0"),
        RegexOptions.IgnoreCase);
}

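So...

This is an extension method @MarkRobinson
This doesn't try to skip Regex @Helge (you really have to go byte by byte if you want to string-sniff like this outside of Regex)
Passes @MichaelLiu's excellent test case, "œ".ReplaceCaseInsensitiveFind("oe", ""), though he may have had slightly different behavior in mind.

Unfortunately, @HA's comment that you have to Escape all three isn't correct. The initial value and newValue don't need it.

Note: You do, however, have to escape $s in the new value you're inserting if they're part of what would look like a "captured value" marker. Hence the three dollar signs in the Regex.Replace inside the Regex.Replace [sic]. Without that, something like this breaks...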

"This is HIS fork, hIs spoon, hissssssss knife.".ReplaceCaseInsensitiveFind("his", @"he$0r")

Here's the error:

An unhandled exception of type 'System.ArgumentException' occurred in System.dll

Additional information: parsing "The\hisr\ is\ he\HISr\ fork,\ he\hIsr\ spoon,\ he\hisrsssssss\ knife\." - Unrecognized escape sequence \h.

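Tell you what, I know folks who are comfortable with Regex feel that using it avoids errors, but I'm often still partial to byte-sniffing strings (though only after having read Spolsky on encodings) to be absolutely sure you're getting what you intended for important use cases. It reminds me a little of Crockford on "insecure regular expressions". Too often we write regexes that allow what we want (if we're lucky) but unintentionally allow more in (e.g., is $10 really a valid "capture value" string in my newValue regex above?) because we weren't thoughtful enough. Both methods have value, and both encourage different types of unintentional errors. It's often easy to underestimate complexity.

That weird $ escaping (and the fact that Regex.Escape didn't escape captured-value patterns like $0, as I would have expected for replacement values) drove me mad for a while. Programming Is Hard (c) 1842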
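
One way to sidestep replacement-pattern escaping entirely, which is a different technique from the one in this answer, is the Regex.Replace overload that takes a MatchEvaluator: the delegate's return value is inserted verbatim, so $0, $1 and friends are never interpreted. A minimal sketch, with an illustrative class and method name:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
public static class LiteralReplaceDemo
{
    public static string ReplaceLiteral(string input, string find, string newValue)
    {
        // Regex.Escape neutralizes metacharacters in the search text;
        // the MatchEvaluator lambda returns newValue as-is, with no "$n" substitution.
        return Regex.Replace(input, Regex.Escape(find), match => newValue, RegexOptions.IgnoreCase);
    }
}
```

For example, LiteralReplaceDemo.ReplaceLiteral("This is HIS fork", "his", "he$0r") gives "The$0r is he$0r fork", with the dollar sign preserved.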

Here's an extension method. Not sure where I found it.

public static class StringExtensions
{
    public static string Replace(this string originalString, string oldValue, string newValue, StringComparison comparisonType)
    {
        int startIndex = 0;
        while (true)
        {
            startIndex = originalString.IndexOf(oldValue, startIndex, comparisonType);
            if (startIndex == -1)
                break;

            originalString = originalString.Substring(0, startIndex) + newValue + originalString.Substring(startIndex + oldValue.Length);

            startIndex += newValue.Length;
        }

        return originalString;
    }

}

It seems the easiest method is simply to use the Replace method that ships with .Net and has been around since .Net 1.0:

// res is assumed to be an existing string variable holding the input text.
res = Microsoft.VisualBasic.Strings.Replace(res,
                                   "%PolicyAmount%",
                                   "$0",
                                   Compare: Microsoft.VisualBasic.CompareMethod.Text);

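In order to use this method, you have to add a reference to the Microsoft.VisualBasic assembly. This assembly is a standard part of the .Net runtime; it is not an extra download, nor is it marked as obsolete.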

    /// <summary>
    /// A case insensitive replace function.
    /// </summary>
    /// <param name="originalString">The string to examine.(HayStack)</param>
    /// <param name="oldValue">The value to replace.(Needle)</param>
    /// <param name="newValue">The new value to be inserted</param>
    /// <returns>A string</returns>
    public static string CaseInsenstiveReplace(string originalString, string oldValue, string newValue)
    {
        Regex regEx = new Regex(oldValue,
           RegexOptions.IgnoreCase | RegexOptions.Multiline);
        return regEx.Replace(originalString, newValue);
    }

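Inspired by cfeduke's answer, I made this function, which uses IndexOf to find the old value in the string and then replaces it with the new value. I used this in an SSIS script processing millions of rows, and the regex method was much slower than this.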

public static string ReplaceCaseInsensitive(this string str, string oldValue, string newValue)
{
    int prevPos = 0;
    string retval = str;
    // find the first occurrence of oldValue
    int pos = retval.IndexOf(oldValue, StringComparison.InvariantCultureIgnoreCase);

    while (pos > -1)
    {
        // remove oldValue from the string
        retval = retval.Remove(pos, oldValue.Length);

        // insert newValue in its place
        retval = retval.Insert(pos, newValue);

        // check if oldValue is found further down
        prevPos = pos + newValue.Length;
        pos = retval.IndexOf(oldValue, prevPos, StringComparison.InvariantCultureIgnoreCase);
    }

    return retval;
}

Expanding on C. Dragon 76's popular answer by making his code into an extension that overloads the default Replace method.

public static class StringExtensions
{
    public static string Replace(this string str, string oldValue, string newValue, StringComparison comparison)
    {
        StringBuilder sb = new StringBuilder();

        int previousIndex = 0;
        int index = str.IndexOf(oldValue, comparison);
        while (index != -1)
        {
            sb.Append(str.Substring(previousIndex, index - previousIndex));
            sb.Append(newValue);
            index += oldValue.Length;

            previousIndex = index;
            index = str.IndexOf(oldValue, index, comparison);
        }
        sb.Append(str.Substring(previousIndex));
        return sb.ToString();
     }
}

Based on Jeff Reddy's answer, with some optimizations and validation:

public static string Replace(string str, string oldValue, string newValue, StringComparison comparison)
{
    if (oldValue == null)
        throw new ArgumentNullException("oldValue");
    if (oldValue.Length == 0)
        throw new ArgumentException("String cannot be of zero length.", "oldValue");

    StringBuilder sb = null;

    int startIndex = 0;
    int foundIndex = str.IndexOf(oldValue, comparison);
    while (foundIndex != -1)
    {
        if (sb == null)
            sb = new StringBuilder(str.Length + (newValue != null ? Math.Max(0, 5 * (newValue.Length - oldValue.Length)) : 0));
        sb.Append(str, startIndex, foundIndex - startIndex);
        sb.Append(newValue);

        startIndex = foundIndex + oldValue.Length;
        foundIndex = str.IndexOf(oldValue, startIndex, comparison);
    }

    if (startIndex == 0)
        return str;
    sb.Append(str, startIndex, str.Length - startIndex);
    return sb.ToString();
}

Similar to C. Dragon's version, but for when you only need a single replacement:

int n = myText.IndexOf(oldValue, System.StringComparison.InvariantCultureIgnoreCase);
if (n >= 0)
{
    myText = myText.Substring(0, n)
        + newValue
        + myText.Substring(n + oldValue.Length);
}

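Here's another option for executing Regex replacements, since not many people seem to notice that the matches contain their location within the string: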

    public static string ReplaceCaseInsensative( this string s, string oldValue, string newValue ) {
        var sb = new StringBuilder(s);
        int offset = oldValue.Length - newValue.Length;
        int matchNo = 0;
        foreach (Match match in Regex.Matches(s, Regex.Escape(oldValue), RegexOptions.IgnoreCase))
        {
            sb.Remove(match.Index - (offset * matchNo), match.Length).Insert(match.Index - (offset * matchNo), newValue);
            matchNo++;
        }
        return sb.ToString();
    }

Regex.Replace(strInput, strToken.Replace("$", "[$]"), strReplaceWith, RegexOptions.IgnoreCase);

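The regular expression method should work. However, what you can also do is lowercase the string from the database, lowercase the %variables% you have, and then locate the positions and lengths in the lowercased string from the database. Remember, positions in a string don't change just because it has been lowercased.

Then, using a loop that goes in reverse (it's easier; if you don't, you will have to keep a running count of where later positions have moved to), remove the %variables% from your non-lowercased database string by their position and length, and insert the replacement values.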
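
The approach described above can be sketched roughly as follows. This is one possible reading of the description, not the answerer's own code: the class and method names are illustrative, and it assumes ToLowerInvariant maps character by character so that positions in the lowercased copy line up with the original.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, named here only for illustration.
public static class ReverseReplaceDemo
{
    public static string ReplaceIgnoreCase(string source, string token, string replacement)
    {
        if (string.IsNullOrEmpty(token)) return source;

        // Search in lowercased copies; the positions found apply to the original string.
        string lowerSource = source.ToLowerInvariant();
        string lowerToken = token.ToLowerInvariant();

        var positions = new List<int>();
        int index = lowerSource.IndexOf(lowerToken, StringComparison.Ordinal);
        while (index != -1)
        {
            positions.Add(index);
            index = lowerSource.IndexOf(lowerToken, index + lowerToken.Length, StringComparison.Ordinal);
        }

        // Walk backwards so that earlier positions stay valid after each edit.
        string result = source;
        for (int i = positions.Count - 1; i >= 0; i--)
        {
            result = result.Remove(positions[i], token.Length).Insert(positions[i], replacement);
        }
        return result;
    }
}
```

Because the replacement is inserted with plain string operations, a value such as "$0" needs no escaping here.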

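(Since everyone is taking a shot at this.) Here's my version (with null checks, and correct input and replacement escaping), inspired by other versions from around the internet: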

using System;
using System.Text.RegularExpressions;

public static class MyExtensions {
    public static string ReplaceIgnoreCase(this string search, string find, string replace) {
        // Escape the search text for the pattern, and double every '$' so the
        // replacement is inserted literally instead of being read as a group reference.
        return Regex.Replace(search ?? "", Regex.Escape(find ?? ""), (replace ?? "").Replace("$", "$$"), RegexOptions.IgnoreCase);
    }
}

Usage:

var result = "This is a test".ReplaceIgnoreCase("IS", "was");

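Let me make my case, and then you can tear me to shreds if you like.

Regex is not the answer to this problem: relatively speaking, it is too slow and memory hungry.

StringBuilder is much better than string mangling.

Since this will be an extension method to supplement string.Replace, I believe it's important to match how that method works. Throwing exceptions for the same argument problems is therefore important, as is returning the original string if no replacement was made.

I believe that having a StringComparison parameter is not a good idea. I did try it, but the test case originally mentioned by michael-liu showed a problem: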

[TestCase("œ", "oe", "", StringComparison.InvariantCultureIgnoreCase, Result = "")]

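While IndexOf will match, there is a mismatch between the length of the match in the source string (1) and oldValue.Length (2). This showed up as an IndexOutOfRange in some of the other solutions when oldValue.Length was added to the current match position, and I could not find a way around it. Regex fails to match this case anyway, so I took the pragmatic route of using only StringComparison.OrdinalIgnoreCase in my solution.

My code is similar to the other answers, but my twist is that I look for a match before going to the trouble of creating a StringBuilder. If none is found, a potentially large allocation is avoided. The code then becomes a do {...} while rather than a while {...}.

I have done some extensive testing against the other answers, and this came out fractionally faster and used slightly less memory.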

    public static string ReplaceCaseInsensitive(this string str, string oldValue, string newValue)
    {
        if (str == null) throw new ArgumentNullException(nameof(str));
        if (oldValue == null) throw new ArgumentNullException(nameof(oldValue));
        if (oldValue.Length == 0) throw new ArgumentException("String cannot be of zero length.", nameof(oldValue));

        var position = str.IndexOf(oldValue, 0, StringComparison.OrdinalIgnoreCase);
        if (position == -1) return str;

        var sb = new StringBuilder(str.Length);

        var lastPosition = 0;

        do
        {
            sb.Append(str, lastPosition, position - lastPosition);

            sb.Append(newValue);

        } while ((position = str.IndexOf(oldValue, lastPosition = position + oldValue.Length, StringComparison.OrdinalIgnoreCase)) != -1);

        sb.Append(str, lastPosition, str.Length - lastPosition);

        return sb.ToString();
    }

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow