String.Join vs. StringBuilder: which is faster?

https://stackoverflow.com/questions/585860

06-09-2019

Question

In a previous question about formatting a double[][] to CSV format, it was suggested that using StringBuilder would be faster than String.Join. Is this true?

Solution

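Short answer: it depends.

Long answer: if you already have an array of strings to concatenate together (with a delimiter), String.Join is the fastest way of doing it.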

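String.Join can look through all of the strings to work out the exact length it needs, then go through them again and copy all the data. This means there is no extra copying involved. The only downside is that it has to go through the strings twice, which means potentially blowing the memory cache more times than necessary.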
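The measure-then-copy idea described above can be sketched in managed code. This is only an illustration, not the framework's actual implementation: the class name TwoPassJoinSketch is made up here, the sketch assumes no element is null, and it pays for one final copy from char[] to string that the built-in method presumably avoids.

```csharp
using System;

static class TwoPassJoinSketch
{
    // Hypothetical helper for illustration; assumes separator and all values are non-null.
    public static string Join(string separator, string[] values)
    {
        if (values.Length == 0) return string.Empty;

        // Pass 1: work out the exact length of the result.
        int total = separator.Length * (values.Length - 1);
        foreach (string value in values)
        {
            total += value.Length;
        }

        // Pass 2: copy every piece exactly once into a buffer of that size.
        char[] buffer = new char[total];
        int position = 0;
        for (int i = 0; i < values.Length; i++)
        {
            if (i > 0)
            {
                separator.CopyTo(0, buffer, position, separator.Length);
                position += separator.Length;
            }
            values[i].CopyTo(0, buffer, position, values[i].Length);
            position += values[i].Length;
        }

        // Extra copy that exists only because this sketch goes through a char[].
        return new string(buffer);
    }
}
```

Because the buffer is sized up front, nothing is ever reallocated; the price is reading every input string twice.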

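If you don't have the strings as an array beforehand, it's probably faster to use StringBuilder - but there will be situations where it isn't. If using a StringBuilder means doing lots and lots of copies, then building an array and calling String.Join may well be faster.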
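When the strings arrive as a sequence rather than an array, the StringBuilder route might look like the following minimal sketch. StreamingJoinSketch is a hypothetical name, and null handling is left out on purpose.

```csharp
using System.Collections.Generic;
using System.Text;

static class StreamingJoinSketch
{
    // Hypothetical helper: appends each value as it is produced,
    // so no intermediate string[] has to be built first.
    public static string Join(string separator, IEnumerable<string> values)
    {
        StringBuilder sb = new StringBuilder();
        bool first = true;
        foreach (string value in values)
        {
            if (!first)
            {
                sb.Append(separator);
            }
            sb.Append(value);
            first = false;
        }
        return sb.ToString();
    }
}
```

Calling StreamingJoinSketch.Join(",", someSequence) enumerates the sequence only once. Framework versions from .NET 4 onward also ship a String.Join overload that accepts IEnumerable<string>, which may make a helper like this unnecessary.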

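EDIT: This is in terms of a single call to String.Join versus a bunch of calls to StringBuilder.Append. In the original question, we had two different levels of String.Join calls, so each of the nested calls would create an intermediate string. In other words, it's even more complex and harder to guess about. I would be surprised to see either way "win" significantly (in complexity terms) with typical data.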

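EDIT: When I'm at home, I'll write up a benchmark that is as painful as possible for StringBuilder. Basically, if you have an array where each element is about twice the size of the previous one, and you get it just right, you should be able to force a copy for every append (of elements, not of the delimiter, although that needs to be taken into account too). At that point it's nearly as bad as simple string concatenation - but String.Join will have no problems.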
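The kind of input described above could be generated along these lines. This is a rough sketch, not the benchmark the answer promises: DoublingInputSketch and its methods are made-up names, the element sizes are not tuned to any particular StringBuilder growth policy, and whether each Append really forces a buffer copy depends on the framework version.

```csharp
using System;
using System.Text;

static class DoublingInputSketch
{
    // Builds an array in which element i has length 2^i (1, 2, 4, ...).
    // Keep count small: the last element alone holds 2^(count-1) characters.
    public static string[] Build(int count)
    {
        string[] values = new string[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = new string('x', 1 << i);
        }
        return values;
    }

    // Appends every element to a default-capacity builder, so the buffer
    // has to keep growing as ever larger elements arrive.
    public static string WithBuilder(string[] values)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            if (i > 0)
            {
                sb.Append(',');
            }
            sb.Append(values[i]);
        }
        return sb.ToString();
    }
}
```

Timing WithBuilder(values) against String.Join(",", values) on the same array would be the comparison the answer has in mind; both produce the same string.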


Other answers

Here's my test rig, using int[][] for simplicity; results first:

Join: 9420ms (chk: 210710000
OneBuilder: 9021ms (chk: 210710000

(update for double results:)

Join: 11635ms (chk: 210710000
OneBuilder: 11385ms (chk: 210710000

(update re 2048 * 64 * 150)

Join: 11620ms (chk: 206409600
OneBuilder: 11132ms (chk: 206409600

and with OptimizeForTesting enabled:

Join: 11180ms (chk: 206409600
OneBuilder: 10784ms (chk: 206409600

So StringBuilder is faster, but not massively so; the rig (run at the console, in release mode, etc.):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Collect()
        {
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
        }
        static void Main(string[] args)
        {
            const int ROWS = 500, COLS = 20, LOOPS = 2000;
            int[][] data = new int[ROWS][];
            Random rand = new Random(123456);
            for (int row = 0; row < ROWS; row++)
            {
                int[] cells = new int[COLS];
                for (int col = 0; col < COLS; col++)
                {
                    cells[col] = rand.Next();
                }
                data[row] = cells;
            }
            Collect();
            int chksum = 0;
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += Join(data).Length;
            }
            watch.Stop();
            Console.WriteLine("Join: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Collect();
            chksum = 0;
            watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += OneBuilder(data).Length;
            }
            watch.Stop();
            Console.WriteLine("OneBuilder: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Console.WriteLine("done");
            Console.ReadLine();
        }
        public static string Join(int[][] array)
        {
            return String.Join(Environment.NewLine,
                    Array.ConvertAll(array,
                      row => String.Join(",",
                        Array.ConvertAll(row, x => x.ToString()))));
        }
        public static string OneBuilder(IEnumerable<int[]> source)
        {
            StringBuilder sb = new StringBuilder();
            bool firstRow = true;
            foreach (var row in source)
            {
                if (firstRow)
                {
                    firstRow = false;
                }
                else
                {
                    sb.AppendLine();
                }
                if (row.Length > 0)
                {
                    sb.Append(row[0]);
                    for (int i = 1; i < row.Length; i++)
                    {
                        sb.Append(',').Append(row[i]);
                    }
                }
            }
            return sb.ToString();
        }
    }
}

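I don't think so. Looking through Reflector, the implementation of String.Join looks very optimized. It also has the added benefit of knowing the total size of the string to be created in advance, so it doesn't need any reallocation.

I have created two test methods to compare them: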

public static string TestStringJoin(double[][] array)
{
    return String.Join(Environment.NewLine,
        Array.ConvertAll(array,
            row => String.Join(",",
                       Array.ConvertAll(row, x => x.ToString()))));
}

public static string TestStringBuilder(double[][] source)
{
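    // based on Marc Gravell's code
    // Note: unlike TestStringJoin, this version appends no newline between rows,
    // so the two methods do not produce identical output.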

    StringBuilder sb = new StringBuilder();
    foreach (var row in source)
    {
        if (row.Length > 0)
        {
            sb.Append(row[0]);
            for (int i = 1; i < row.Length; i++)
            {
                sb.Append(',').Append(row[i]);
            }
        }
    }
    return sb.ToString();
}

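I ran each method 50 times, passing in an array of size [2048][64]. I did this for two arrays; one filled with zeros and another filled with random values. I got the following results on my machine (P4 3.0 GHz, single core, no HT, running in release mode from CMD):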

// with zeros:
TestStringJoin    took 00:00:02.2755280
TestStringBuilder took 00:00:02.3536041

// with random values:
TestStringJoin    took 00:00:05.6412147
TestStringBuilder took 00:00:05.8394650

Increasing the size of the array to [2048][512], while decreasing the number of iterations to 10, got me the following results:

// with zeros:
TestStringJoin    took 00:00:03.7146628
TestStringBuilder took 00:00:03.8886978

// with random values:
TestStringJoin    took 00:00:09.4991765
TestStringBuilder took 00:00:09.3033365

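The results are repeatable (almost; with small fluctuations caused by different random values). Apparently String.Join is a little faster most of the time (although by a very small margin).

This is the code I used for testing: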

// Requires: using System; using System.Diagnostics; using System.Threading;
const int Iterations = 50;
const int Rows = 2048;
const int Cols = 64; // 512

static void Main()
{
    OptimizeForTesting(); // set process priority to RealTime

    // test 1: zeros
    double[][] array = new double[Rows][];
    for (int i = 0; i < array.Length; ++i)
        array[i] = new double[Cols];

    CompareMethods(array);

    // test 2: random values
    Random random = new Random();
    double[] template = new double[Cols];
    for (int i = 0; i < template.Length; ++i)
        template[i] = random.NextDouble();

    for (int i = 0; i < array.Length; ++i)
        array[i] = template;

    CompareMethods(array);
}

static void CompareMethods(double[][] array)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; ++i)
        TestStringJoin(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringJoin    took " + stopwatch.Elapsed);

    stopwatch.Reset(); stopwatch.Start();
    for (int i = 0; i < Iterations; ++i)
        TestStringBuilder(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringBuilder took " + stopwatch.Elapsed);

}

static void OptimizeForTesting()
{
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    Process currentProcess = Process.GetCurrentProcess();
    currentProcess.PriorityClass = ProcessPriorityClass.RealTime;
    if (Environment.ProcessorCount > 1) {
        // use last core only
        currentProcess.ProcessorAffinity
            = new IntPtr(1 << (Environment.ProcessorCount - 1));
    }
}

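Unless the 1% difference turns into something significant in terms of the time the entire program takes to run, this looks like micro-optimization. I'd write the code that is the most readable/understandable and not worry about the 1% performance difference.

Atwood had a post kind of related to this about a month ago: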

http://www.codinghorror.com/blog/archives/001218.html

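Yes. If you do more than a couple of joins, StringBuilder will be a lot faster.

When you do a string.join, the runtime has to:

Allocate memory for the resulting string
Copy the contents of the first string to the beginning of the output string
Copy the contents of the second string to the end of the output string.

If you do two joins, it has to copy the data twice, and so on.

StringBuilder allocates one buffer with space to spare, so data can be appended without having to copy the original string. As there is space left over in the buffer, the appended string can be written into the buffer directly. Then it just has to copy the entire string once, at the end.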
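Reading this answer as being about gluing pieces together one step at a time, the contrast it draws could be sketched as follows. RepeatedAppendSketch is a hypothetical name and the code makes no timing claim; it only shows where the repeated copying would come from.

```csharp
using System.Text;

static class RepeatedAppendSketch
{
    // Each step allocates a new string and copies everything accumulated so far.
    public static string WithConcatenation(string[] parts)
    {
        string result = string.Empty;
        foreach (string part in parts)
        {
            result = result + part;
        }
        return result;
    }

    // Appends into the builder's internal buffer; the complete string is
    // materialized once, in ToString().
    public static string WithBuilder(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string part in parts)
        {
            sb.Append(part);
        }
        return sb.ToString();
    }
}
```

Both methods return the same string; the difference is how often already-written characters are copied again. A single String.Join call over all the parts is a different case from the step-by-step concatenation shown here.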

Licensed under: CC-BY-SA with attribution
