Question

This is C# related. We have a case where we need to copy the entire source stream into a destination stream except for the last 16 bytes.

EDIT: The streams can be up to 40 GB, so we can't allocate a single byte[] for the whole stream (e.g. .ToArray()).

Looking at the MSDN documentation, it seems that we can reliably detect the end of the stream only when Read returns 0. A return value greater than 0 but smaller than the requested count can mean that the remaining bytes are "not currently available" (what does that really mean?).
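A short read does not mean the end of the stream: it only means that fewer bytes were ready at that moment (a network stream, for example, returns whatever has arrived so far), so the only reliable end-of-stream signal is a return value of 0. Below is a minimal sketch of a helper that keeps calling Read until the requested count is reached or Read returns 0. `StreamUtil` and `ReadFull` are hypothetical names, not .NET APIs; .NET 7 and later also provide `Stream.ReadAtLeast` and `Stream.ReadExactly` for this purpose.

```csharp
using System.IO;

public static class StreamUtil
{
    // Hypothetical helper: keeps calling Read until `count` bytes have been
    // stored in `buffer` (starting at `offset`) or Read returns 0.
    // Returns the number of bytes actually read, so a result smaller than
    // `count` really does mean the stream has ended.
    public static int ReadFull(Stream stream, byte[] buffer, int offset, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = stream.Read(buffer, offset + total, count - total);
            if (n == 0)
                break; // 0 is the only reliable end-of-stream signal
            total += n;
        }
        return total;
    }
}
```

With this helper, any approach that relies on "a full buffer unless we are at the end" becomes safe for network streams too.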

Currently the code copies every byte, as follows. inStream and outStream are generic: they can be memory, disk, or network streams (and a few other kinds).

public static void StreamCopy(Stream inStream, Stream outStream)
{
    var buffer = new byte[8*1024];
    var last16Bytes = new byte[16];
    int bytesRead;
    while ((bytesRead = inStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        outStream.Write(buffer, 0, bytesRead);
    }
    // Issues:
    // 1. We already wrote the last 16 bytes into 
    //    outStream (possibly over the n/w)
    // 2. last16Bytes = ? (inStream may not necessarily support rewinding)
}

What is a reliable way to ensure that all but the last 16 bytes are copied? I could use Position and Length on inStream, but there is a gotcha on MSDN that says

If a class derived from Stream does not support seeking, calls to Length, SetLength, Position, and Seek throw a NotSupportedException.
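For the special case where the source reports `CanSeek == true`, `Length` and `Position` are safe to use and the number of bytes to copy is known up front. Below is a minimal sketch, assuming the source's `Length` does not change during the copy. `SeekableCopy.CopyAllButLast` is a hypothetical name, and non-seekable sources still need one of the buffering approaches from the answers.

```csharp
using System;
using System.IO;

public static class SeekableCopy
{
    // Copies everything from the current position of `source` except the last
    // `omit` bytes. Only valid when source.CanSeek is true; assumes Length does
    // not change while copying. Returns the number of bytes copied.
    public static long CopyAllButLast(Stream source, Stream destination, int omit)
    {
        if (!source.CanSeek)
            throw new NotSupportedException("Source stream must support seeking.");

        long remaining = Math.Max(0, source.Length - source.Position - omit);
        long copied = 0;
        var buffer = new byte[81920];
        while (copied < remaining)
        {
            int want = (int)Math.Min(buffer.Length, remaining - copied);
            int n = source.Read(buffer, 0, want);
            if (n == 0)
                break; // stream ended earlier than Length suggested
            destination.Write(buffer, 0, n);
            copied += n;
        }
        return copied;
    }
}
```

After the call, `source` is positioned at the first omitted byte, so the trailing bytes can still be read from it.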


Solution 3

Using a circular buffer sounds great, but there is no circular buffer class in .NET, which means extra code anyway. I ended up with the following algorithm, a sort of map-and-copy, which I think is simple. The variable names are longer than usual so that they are self-descriptive.

Data flows through the buffers as follows:

[outStream] <== [tailBuf] <== [mainBuf] <== [inStream]

public byte[] CopyStreamExtractLastBytes(Stream inStream, Stream outStream,
                                         int extractByteCount)
{
    //var mainBuf = new byte[1024*4]; // 4K buffer ok for network too
    var mainBuf = new byte[4651]; // nearby prime for testing

    int mainBufValidCount;
    var tailBuf = new byte[extractByteCount];
    int tailBufValidCount = 0;

    while ((mainBufValidCount = inStream.Read(mainBuf, 0, mainBuf.Length)) > 0)
    {
        // Map: how many passthru and tail bytes live in tailBuf and mainBuf.
        // Everything before the last extractByteCount bytes seen so far is passthru
        int totalPassthruCount = Math.Max(0, tailBufValidCount + 
                                    mainBufValidCount - extractByteCount);
        int tailBufPassthruCount = Math.Min(tailBufValidCount, totalPassthruCount);
        int tailBufTailCount = tailBufValidCount - tailBufPassthruCount;
        int mainBufPassthruCount = totalPassthruCount - tailBufPassthruCount;
        int mainBufResidualCount = mainBufValidCount - mainBufPassthruCount;

        // Copy: Passthru must be flushed per FIFO order (tailBuf then mainBuf)
        outStream.Write(tailBuf, 0, tailBufPassthruCount);
        outStream.Write(mainBuf, 0, mainBufPassthruCount);

        // Copy: Now reassemble/compact tail into tailBuf, in place
        // (Array.Copy handles overlapping source and destination correctly)
        Array.Copy(tailBuf, tailBufPassthruCount, tailBuf, 0, 
                      tailBufTailCount);
        Array.Copy(mainBuf, mainBufPassthruCount, tailBuf, 
                      tailBufTailCount, mainBufResidualCount);
        tailBufValidCount = tailBufTailCount + mainBufResidualCount;
    }
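    // If the input had fewer than extractByteCount bytes, only part of
    // tailBuf is valid, so trim it before returning
    Array.Resize(ref tailBuf, tailBufValidCount);
    return tailBuf;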
}

OTHER TIPS

  1. Read between 1 and n bytes from the input stream.[1]

  2. Append the bytes to a circular buffer.[2]

  3. Write the first max(0, b - 16) bytes from the circular buffer to the output stream, where b is the number of bytes in the circular buffer.

  4. Remove the bytes that you just wrote from the circular buffer.

  5. Go to step 1.
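The steps above can be sketched with a small fixed-capacity ring buffer, since .NET has no built-in one. `ByteRing` and `RingCopy` are hypothetical names, and the 4096-byte default chunk size is an arbitrary choice. Because at most `keep` bytes stay in the ring between reads, a capacity of `chunkSize + keep` never overflows. This version copies each chunk into the ring, so it pays the extra copy that footnote 2 mentions.

```csharp
using System;
using System.IO;

// Hypothetical fixed-capacity circular byte buffer (not a .NET class).
public sealed class ByteRing
{
    private readonly byte[] _data;
    private int _head; // index of the oldest buffered byte

    public int Count { get; private set; }

    public ByteRing(int capacity)
    {
        _data = new byte[capacity];
    }

    // Step 2: append bytes after the newest one, wrapping past the array end.
    public void Append(byte[] src, int offset, int count)
    {
        if (count > _data.Length - Count)
            throw new InvalidOperationException("Ring buffer overflow.");
        int tail = (_head + Count) % _data.Length;
        int first = Math.Min(count, _data.Length - tail);         // up to the array end
        Array.Copy(src, offset, _data, tail, first);
        Array.Copy(src, offset + first, _data, 0, count - first); // wrapped remainder
        Count += count;
    }

    // Steps 3 and 4: write the oldest `count` bytes to `dest`, then drop them.
    public void WriteOldest(Stream dest, int count)
    {
        int first = Math.Min(count, _data.Length - _head);
        dest.Write(_data, _head, first);
        dest.Write(_data, 0, count - first);
        _head = (_head + count) % _data.Length;
        Count -= count;
    }

    // Copies out the bytes still buffered, oldest first.
    public byte[] ToArray()
    {
        var result = new byte[Count];
        int first = Math.Min(Count, _data.Length - _head);
        Array.Copy(_data, _head, result, 0, first);
        Array.Copy(_data, 0, result, first, Count - first);
        return result;
    }
}

public static class RingCopy
{
    // Copies all but the last `keep` bytes of `input` to `output` and returns
    // the held-back bytes (fewer than `keep` if the input is shorter).
    public static byte[] CopyAllButLast(Stream input, Stream output, int keep, int chunkSize = 4096)
    {
        var chunk = new byte[chunkSize];
        var ring = new ByteRing(chunkSize + keep);
        int n;
        while ((n = input.Read(chunk, 0, chunk.Length)) > 0)          // step 1
        {
            ring.Append(chunk, 0, n);                                  // step 2
            ring.WriteOldest(output, Math.Max(0, ring.Count - keep)); // steps 3-4
        }
        return ring.ToArray();
    }
}
```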

[1] This is what the Read method does: if you call int n = Read(buffer, 0, 500), it reads between 1 and 500 bytes into buffer and returns the number of bytes read. If Read returns 0, you have reached the end of the stream.

[2] For maximum performance, you can read the bytes directly from the input stream into the circular buffer. This is a bit tricky, because you have to deal with the wraparound within the array underlying the buffer.
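A sketch of the direct-read variant from footnote 2 follows, under the same assumptions (hypothetical `DirectRingCopy` name, arbitrary chunk size, `chunkSize >= 1`). Each Read only fills the contiguous free region after the tail, so no single Read has to straddle the end of the array; data that would wrap is simply read into index 0 onward on the next iteration.

```csharp
using System;
using System.IO;

public static class DirectRingCopy
{
    // Reads straight from `input` into a circular array (no intermediate chunk
    // buffer), writes everything except the newest `keep` bytes, and returns
    // the held-back bytes. Assumes chunkSize >= 1.
    public static byte[] CopyAllButLast(Stream input, Stream output, int keep, int chunkSize = 4096)
    {
        var ring = new byte[chunkSize + keep];
        int head = 0;  // index of the oldest buffered byte
        int count = 0; // buffered bytes; at most `keep` before each read

        while (true)
        {
            // Read only into the contiguous free region after the tail. Since
            // count <= keep here, this region always has at least 1 byte.
            int tail = (head + count) % ring.Length;
            int contiguousFree = Math.Min(ring.Length - count, ring.Length - tail);
            int n = input.Read(ring, tail, contiguousFree);
            if (n == 0)
                break; // end of stream
            count += n;

            // Write the oldest (count - keep) bytes, in up to two segments.
            int toWrite = Math.Max(0, count - keep);
            int first = Math.Min(toWrite, ring.Length - head);
            output.Write(ring, head, first);
            output.Write(ring, 0, toWrite - first);
            head = (head + toWrite) % ring.Length;
            count -= toWrite;
        }

        // Unwrap the retained bytes, oldest first.
        var held = new byte[count];
        int firstPart = Math.Min(count, ring.Length - head);
        Array.Copy(ring, head, held, 0, firstPart);
        Array.Copy(ring, 0, held, firstPart, count - firstPart);
        return held;
    }
}
```

The trade-off is that reads near the end of the array can be shorter than `chunkSize`, in exchange for skipping the intermediate chunk buffer and its extra copy.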

The following solution is fast and tested; I hope it's useful. It uses the double-buffering idea you already had in mind. EDIT: simplified the loop by removing the conditional that separated the first iteration from the rest.

public static void StreamCopy(Stream inStream, Stream outStream) {
   // Define the size of the chunk to copy during each iteration (1 KiB)
   const int blockSize = 1024;
   const int bytesToOmit = 16;

   const int buffSize = blockSize + bytesToOmit;

   // Generate working buffers
   byte[] buffer1 = new byte[buffSize];
   byte[] buffer2 = new byte[buffSize];

   // Initialize first iteration
   byte[] curBuffer = buffer1;
   byte[] prevBuffer = null;

   int bytesRead;

   // Attempt to fully fill the buffer. ReadFull keeps calling Read, because
   // a single Read may return fewer bytes than requested (e.g. on a network
   // stream) even though the stream has not ended
   bytesRead = ReadFull(inStream, curBuffer, buffSize);
   if( bytesRead == buffSize ) {
      // We successfully retrieved a whole buffer; output only [blockSize]
      // bytes and hold back the last [bytesToOmit] bytes in case they
      // turn out to be the last ones in the stream
      outStream.Write(curBuffer, 0, blockSize);
   } else {
      // The stream ended before the buffer was full
      int bytesToWrite = bytesRead - bytesToOmit;
      if( bytesToWrite > 0 ) {
         outStream.Write(curBuffer, 0, bytesToWrite);
      }
      // There's no more data to process
      return;
   }

   curBuffer = buffer2;
   prevBuffer = buffer1;

   while( true ) {
      // Attempt again to fully fill the buffer
      bytesRead = ReadFull(inStream, curBuffer, buffSize);
      if( bytesRead == buffSize ) {
         // We retrieved the whole buffer: output first the last 16
         // bytes of the previous buffer, then just [blockSize]
         // bytes from the current buffer
         outStream.Write(prevBuffer, blockSize, bytesToOmit);
         outStream.Write(curBuffer, 0, blockSize);
      } else {
         // The stream ended before the buffer was full
         if( bytesRead <= bytesToOmit ) {
            // The bytes to output come solely from the previous buffer
            outStream.Write(prevBuffer, blockSize, bytesRead);
         } else {
            // The bytes to output come from the previous buffer and
            // the current buffer
            outStream.Write(prevBuffer, blockSize, bytesToOmit);
            outStream.Write(curBuffer, 0, bytesRead - bytesToOmit);
         }
         break;
      }
      // swap buffers for next iteration
      byte[] swap = prevBuffer;
      prevBuffer = curBuffer;
      curBuffer = swap;
   }
}

// Calls Read until [count] bytes are in the buffer or the stream ends,
// and returns the number of bytes actually read
static int ReadFull(Stream stream, byte[] buffer, int count) {
   int total = 0;
   while( total < count ) {
      int n = stream.Read(buffer, total, count - total);
      if( n == 0 ) {
         break; // end of stream
      }
      total += n;
   }
   return total;
}

static void Assert(Stream inStream, Stream outStream) {
   // Routine that tests the copy worked as expected
   // (bytesToOmit is local to StreamCopy, so it is redeclared here)
   const int bytesToOmit = 16;
   inStream.Seek(0, SeekOrigin.Begin);
   outStream.Seek(0, SeekOrigin.Begin);
   Debug.Assert(outStream.Length == Math.Max(inStream.Length - bytesToOmit, 0));
   for( int i = 0; i < outStream.Length; i++ ) {
      int byte1 = inStream.ReadByte();
      int byte2 = outStream.ReadByte();
      Debug.Assert(byte1 == byte2);
   }
}

A much easier solution to code, though slower because it works at the byte level, is to use an intermediate queue between the input stream and the output stream. The process first reads and enqueues 16 bytes from the input stream. It then iterates over the remaining input bytes: for each one, it reads a single byte from the input stream, enqueues it, dequeues the oldest byte, and writes the dequeued byte to the output stream, until all input bytes are processed. The unwanted 16 bytes remain in the intermediate queue.
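The queue-based approach can be sketched in a few lines with `Queue<byte>`. Instead of a separate phase that primes the queue with 16 bytes, this sketch only dequeues once the queue holds more than `omit` bytes, which has the same effect. `QueueCopy` is a hypothetical name; the per-byte `ReadByte`/`WriteByte` calls are what make it slow on large streams.

```csharp
using System.Collections.Generic;
using System.IO;

public static class QueueCopy
{
    // Byte-at-a-time copy of all but the last `omit` bytes.
    // Returns the bytes left in the queue, i.e. the omitted trailer.
    public static byte[] CopyAllButLast(Stream input, Stream output, int omit)
    {
        var queue = new Queue<byte>(omit + 1);
        int b;
        while ((b = input.ReadByte()) != -1) // -1 signals end of stream
        {
            queue.Enqueue((byte)b);
            // Once more than `omit` bytes are queued, the oldest one can
            // no longer be part of the trailer, so it is safe to write
            if (queue.Count > omit)
                output.WriteByte(queue.Dequeue());
        }
        return queue.ToArray(); // oldest first
    }
}
```

The bytes left in the queue at the end are exactly the omitted trailer, returned oldest first.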

Hope this helps!

=)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow