Removing trailing nulls from a byte array in C#

https://stackoverflow.com/questions/240258

c#
bytearray

04-07-2019

Question

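OK, I am reading dat files into a byte array. For some reason, the people who generate these files put about half a megabyte of useless null bytes at the end of each file. Does anybody know a quick way to trim these off the end?

My first thought was to start at the end of the array and iterate backwards until I found something other than a null, then copy everything up to that point, but I wonder whether there is a better way.

To answer some questions: Are you sure the 0 bytes are definitely in the file, rather than there being a bug in the file-reading code? Yes, I am certain of that.

Can you definitely trim all trailing 0s? Yes.

Can there be any 0s in the rest of the file? Yes, there can be 0s in other places, so no, I can't start at the beginning and stop at the first 0.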

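Solution

Given the extra questions now answered, it sounds like you are fundamentally doing the right thing. In particular, you have to touch every byte of the file from the last 0 onwards, to check that it only has 0s.

Now, whether you have to copy everything or not depends on what you then do with the data.

You could perhaps remember the index and keep it with the data or the filename.
You could copy the data into a new byte array.
If you want to "fix" the file, you could call FileStream.SetLength to truncate the file.

The "you have to read every byte between the truncation point and the end of the file" is the critical part though.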
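
As a rough sketch of the third option, and not code from the answer itself: the hypothetical helper below (the names TrailingZeroTruncator and TruncateTrailingZeros and the 64 KB default block size are illustrative assumptions) reads the file backwards in blocks, looks for the last non-zero byte, and then calls FileStream.SetLength. It assumes the file is seekable and may be modified in place.

```csharp
using System;
using System.IO;

public static class TrailingZeroTruncator
{
    // Hypothetical helper: truncates the file after its last non-zero byte
    // and returns the new length. The block size is an arbitrary choice.
    public static long TruncateTrailingZeros(string path, int blockSize = 64 * 1024)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            var buffer = new byte[blockSize];
            long end = stream.Length; // every byte at or after 'end' is known to be zero

            while (end > 0)
            {
                int toRead = (int)Math.Min(blockSize, end);
                long start = end - toRead;
                stream.Seek(start, SeekOrigin.Begin);

                // Read may return fewer bytes than requested, so fill the block in a loop.
                int read = 0;
                while (read < toRead)
                {
                    int n = stream.Read(buffer, read, toRead - read);
                    if (n == 0) throw new EndOfStreamException();
                    read += n;
                }

                // Scan this block backwards for a non-zero byte.
                int i = toRead - 1;
                while (i >= 0 && buffer[i] == 0) i--;

                if (i >= 0)
                {
                    end = start + i + 1; // last non-zero byte is in this block
                    break;
                }
                end = start; // the whole block was zero, continue with the previous one
            }

            stream.SetLength(end);
            return end;
        }
    }
}
```

Only the zero tail plus at most one extra block is read, which matches the point above: every byte between the truncation point and the end of the file still has to be inspected.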

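Other answers

I agree with Jon. The critical bit is that you must "touch" every byte from the last one back to the first non-zero byte. Something like this: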

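byte[] foo;
// populate foo
int i = foo.Length - 1;
// walk back from the end; the bounds check covers an empty or all-zero array
while (i >= 0 && foo[i] == 0)
    --i;
// now foo[i] is the last non-zero byte (i == -1 if there is none)
byte[] bar = new byte[i + 1];
Array.Copy(foo, bar, i + 1);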

I'm pretty sure that is about as efficient as you are going to be able to make it.

@Factor Mystic,

I think there is a shorter way:
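
For comparison, the same backward scan can be written with Array.FindLastIndex. This is only a sketch: the helper name TrimTrailingZeros is made up for illustration, and the per-byte delegate call may well make it slower than the hand-written loop.

```csharp
using System;

public static class ByteArrayTrim
{
    // Hypothetical helper: returns a copy of 'source' without its trailing zero bytes.
    public static byte[] TrimTrailingZeros(byte[] source)
    {
        // Index of the last non-zero byte, or -1 if the array is empty or all zeros.
        int last = Array.FindLastIndex(source, b => b != 0);

        var trimmed = new byte[last + 1];
        Array.Copy(source, trimmed, last + 1);
        return trimmed;
    }
}
```

Because FindLastIndex returns -1 when nothing matches, an all-zero input simply yields an empty array.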

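// requires: using System.Linq;
// Note: for every element this rescans the rest of the array, so the cost grows quadratically.
var data = new byte[] { 0x01, 0x02, 0x00, 0x03, 0x04, 0x00, 0x00, 0x00, 0x00 };
var new_data = data.TakeWhile((v, index) => data.Skip(index).Any(w => w != 0x00)).ToArray();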

How about this:

[Test]
public void Test()
{
   var chars = new [] {'a', 'b', '\0', 'c', '\0', '\0'};

   File.WriteAllBytes("test.dat", Encoding.ASCII.GetBytes(chars));

   var content = File.ReadAllText("test.dat");

   Assert.AreEqual(6, content.Length); // includes the null bytes at the end

   content = content.TrimEnd('\0'); // TrimEnd rather than Trim, so only trailing nulls are removed

   Assert.AreEqual(4, content.Length); // no more null bytes at the end
                                       // but still has the one in the middle
}

Assuming 0 = null, that is probably your best bet. As a minor tweak, you might want to use Buffer.BlockCopy when you finally copy the useful data.

Test this:
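
A minimal sketch of that tweak, assuming the same backward scan as in the other answers (the class and method names are illustrative only):

```csharp
using System;

public static class BlockCopyTrim
{
    // Hypothetical helper: scans back over trailing zeros, then copies the
    // useful prefix with Buffer.BlockCopy instead of Array.Copy.
    public static byte[] TrimTrailingZeros(byte[] source)
    {
        int length = source.Length;
        while (length > 0 && source[length - 1] == 0)
            length--;

        var result = new byte[length];
        // Arguments: source array, source offset, destination array, destination offset, byte count.
        Buffer.BlockCopy(source, 0, result, 0, length);
        return result;
    }
}
```

Buffer.BlockCopy counts in bytes rather than elements, which makes no difference for a byte[]. Whether it is measurably faster than Array.Copy here would need to be benchmarked.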

    private byte[] trimByte(byte[] input)
    {
        if (input.Length > 0)
        {
            int byteCounter = input.Length - 1;
            // Stop at index -1 if the array contains nothing but zeros.
            while (byteCounter >= 0 && input[byteCounter] == 0x00)
            {
                byteCounter--;
            }
            byte[] rv = new byte[(byteCounter + 1)];
            for (int byteCounter1 = 0; byteCounter1 < (byteCounter + 1); byteCounter1++)
            {
                rv[byteCounter1] = input[byteCounter1];
            }
            return rv;
        }
        // Empty input: nothing to trim.
        return input;
    }

There is always a LINQ answer:

byte[] data = new byte[] { 0x01, 0x02, 0x00, 0x03, 0x04, 0x00, 0x00, 0x00, 0x00 };
bool data_found = false;
byte[] new_data = data.Reverse().SkipWhile(point =>
{
  if (data_found) return false;
  if (point == 0x00) return true; else { data_found = true; return false; }
}).Reverse().ToArray();

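You could just count the number of zeros at the end of the array and use that count instead of .Length when you iterate over the array later on. You could encapsulate this however you like. The main point is that you don't need to copy the data into a new structure. If the arrays are big, that may be worth it.

If null bytes can be valid values in the file, do you know that the last byte of the real data cannot be null? If so, iterating backwards and looking for the first non-null entry is probably best. If not, there is no way to tell where the file actually ends.

If you know more about the data format, for example that there can be no run of null bytes longer than two bytes (or some similar constraint), then you may be able to do a binary search for the "transition point". This should be much faster than a linear search (assuming you can read in the whole file).

The basic idea (using my earlier assumption about no consecutive null bytes) would be: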
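
One possible way to encapsulate this, sketched with ArraySegment<byte> so that no copy is made. The helper names are hypothetical; the segment simply wraps the original array together with the shorter length.

```csharp
using System;

public static class TrailingZeroView
{
    // Hypothetical helper: number of bytes left once trailing zeros are ignored.
    public static int TrimmedLength(byte[] data)
    {
        int length = data.Length;
        while (length > 0 && data[length - 1] == 0)
            length--;
        return length;
    }

    // Wraps the useful prefix without copying; the segment shares 'data' as its backing store.
    public static ArraySegment<byte> WithoutTrailingZeros(byte[] data)
    {
        return new ArraySegment<byte>(data, 0, TrimmedLength(data));
    }
}
```

Callers can then loop up to segment.Count instead of data.Length. Since the segment is a view, later changes to the underlying array are visible through it.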

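// Assumes the real data never contains two consecutive zero bytes and does not
// end in a zero byte. Then "this byte and the next one are both zero" is false
// for every index inside the data and true for every index inside the padding,
// so an ordinary binary search can find the transition point.
static int FindTrimmedLength(byte[] data)
{
    int lo = 0;             // every index below lo is known to be real data
    int hi = data.Length;   // every index from hi onwards is known to be padding
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        bool nextIsZeroOrEnd = mid + 1 >= data.Length || data[mid + 1] == 0;
        if (data[mid] == 0 && nextIsZeroOrEnd)
            hi = mid;       // mid is in the padding, go left
        else
            lo = mid + 1;   // mid is still data, go right
    }
    return lo;              // number of bytes before the trailing zeros
}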

In my case the LINQ approach never finished ^))) It is very slow on byte arrays!

Guys, why not use the Array.Copy() method?

    /// <summary>
    /// Gets array of bytes from memory stream.
    /// </summary>
    /// <param name="stream">Memory stream.</param>
    public static byte[] GetAllBytes(this MemoryStream stream)
    {
        byte[] result = new byte[stream.Length];
        Array.Copy(stream.GetBuffer(), result, stream.Length);

        return result;
    }

Licensed under: CC-BY-SA with attribution
