Question

I'm profiling some C# code. The method below is one of the most expensive ones. For the purpose of this question, assume that micro-optimization is the right thing to do. Is there an approach to improve performance of this method?

Changing the input parameter p to ulong[] would create a macro inefficiency.

static ulong Fetch64(byte[] p, int ofs = 0)
{
    unchecked
    {
        ulong result = p[0 + ofs] + 
            ((ulong) p[1 + ofs] <<  8) + 
            ((ulong) p[2 + ofs] << 16) + 
            ((ulong) p[3 + ofs] << 24) + 
            ((ulong) p[4 + ofs] << 32) + 
            ((ulong) p[5 + ofs] << 40) + 
            ((ulong) p[6 + ofs] << 48) + 
            ((ulong) p[7 + ofs] << 56);
        return result;
    }
}

Solution

Why not use BitConverter? I've got to believe that Microsoft has spent some time tuning that code. Plus it deals with endian issues: it reads in the machine's native byte order, so it matches your method on little-endian hardware.

Here's how BitConverter turns a byte[] into a long/ulong (ToUInt64 calls ToInt64 and casts the signed result to unsigned):

[SecuritySafeCritical]
public static unsafe long ToInt64(byte[] value, int startIndex)
{
  if (value == null)
  {
    ThrowHelper.ThrowArgumentNullException(ExceptionArgument.value);
  }
  if (((uint) startIndex) >= value.Length)
  {
    ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.startIndex, ExceptionResource.ArgumentOutOfRange_Index);
  }
  if (startIndex > (value.Length - 8))
  {
    ThrowHelper.ThrowArgumentException(ExceptionResource.Arg_ArrayPlusOffTooSmall);
  }
  fixed (byte* numRef = &(value[startIndex]))
  {
    if ((startIndex % 8) == 0)
    {
      return *(((long*) numRef));
    }
    if (IsLittleEndian)
    {
      int num  = ((numRef[0] | (numRef[1] << 8)) | (numRef[2] << 0x10)) | (numRef[3] << 0x18);
      int num2 = ((numRef[4] | (numRef[5] << 8)) | (numRef[6] << 0x10)) | (numRef[7] << 0x18);
      return ((long) ((uint) num)) | (((long) num2) << 0x20);
    }
    int num3 = (((numRef[0] << 0x18) | (numRef[1] << 0x10)) | (numRef[2] << 8)) | numRef[3];
    int num4 = (((numRef[4] << 0x18) | (numRef[5] << 0x10)) | (numRef[6] << 8)) | numRef[7];
    return ((long) ((uint) num4)) | (((long) num3) << 0x20);
  }
}

I suspect the conversion is done one 32-bit word at a time for the sake of 32-bit efficiency. A 32-bit CPU has no 64-bit registers, so dealing with 64-bit ints is a lot more expensive.

If you know for sure you're targeting 64-bit hardware, it might be faster to do the conversion in one fell swoop.
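
As one possible way to do the read in a single step without unsafe code, here is a minimal sketch that assumes a newer runtime than the .NET 4.0 code shown above (.NET Core 2.1 or later, or the System.Memory package), where BinaryPrimitives and AsSpan are available. The class name FetchSpan is made up for illustration, and whether this beats the shift-and-add version is something to measure rather than assume.

```csharp
using System;
using System.Buffers.Binary;

static class FetchSpan
{
    // Reads 8 bytes starting at ofs as a little-endian ulong, the same
    // byte order the question's Fetch64 produces on any platform.
    // Throws ArgumentOutOfRangeException if fewer than 8 bytes remain.
    public static ulong Fetch64(byte[] p, int ofs = 0)
    {
        return BinaryPrimitives.ReadUInt64LittleEndian(p.AsSpan(ofs));
    }
}
```

Because the byte order is stated explicitly, the result does not depend on the endianness of the machine.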

OTHER TIPS

Try using a for loop instead of the unrolled expression. You may be able to save time on array bounds checks.
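
A loop-based version might look like the sketch below. The names Fetch and Fetch64Loop are invented for illustration. It returns the same little-endian value as the original method, but whether the JIT actually removes any bounds checks here is not guaranteed, so profile both versions.

```csharp
using System;

static class Fetch
{
    // Loop-based variant of the question's Fetch64 with the same
    // little-endian result: p[ofs] ends up as the lowest byte.
    public static ulong Fetch64Loop(byte[] p, int ofs = 0)
    {
        ulong result = 0;
        // Start at the most significant byte and work down, so each
        // iteration is a single shift followed by an OR.
        for (int i = 7; i >= 0; i--)
        {
            result = (result << 8) | p[ofs + i];
        }
        return result;
    }
}
```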

Try BitConverter.ToUInt64 - http://msdn.microsoft.com/en-us/library/system.bitconverter.touint64.aspx if it is what you are looking for.
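
As a rough sketch of a drop-in replacement, the wrapper below calls BitConverter.ToUInt64 and swaps the bytes on big-endian machines, because BitConverter reads in native byte order while the original method is always little-endian. The names FetchBitConverter and ReverseBytes are hypothetical helpers, not part of the framework.

```csharp
using System;

static class FetchBitConverter
{
    public static ulong Fetch64(byte[] p, int ofs = 0)
    {
        // BitConverter validates the arguments and reads in native byte order.
        ulong value = BitConverter.ToUInt64(p, ofs);

        // On a big-endian machine, swap to get the little-endian
        // interpretation that the original method produces.
        return BitConverter.IsLittleEndian ? value : ReverseBytes(value);
    }

    // Reverses the order of the 8 bytes in v.
    static ulong ReverseBytes(ulong v)
    {
        v = (v >> 32) | (v << 32);
        v = ((v & 0xFFFF0000FFFF0000UL) >> 16) | ((v & 0x0000FFFF0000FFFFUL) << 16);
        v = ((v & 0xFF00FF00FF00FF00UL) >> 8) | ((v & 0x00FF00FF00FF00FFUL) << 8);
        return v;
    }
}
```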

For reference, Microsoft's .NET 4.0 BitConverter.ToInt64 (Shared Source Initiative at http://referencesource.microsoft.com/netframework.aspx):

    // Converts an array of bytes into a long.
    [System.Security.SecuritySafeCritical]  // auto-generated 
    public static unsafe long ToInt64 (byte[] value, int startIndex) {
        if( value == null)  {
            ThrowHelper.ThrowArgumentNullException(ExceptionArgument.value);
        } 

        if ((uint) startIndex >= value.Length) { 
            ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.startIndex, ExceptionResource.ArgumentOutOfRange_Index); 
        }

        if (startIndex > value.Length -8) {
            ThrowHelper.ThrowArgumentException(ExceptionResource.Arg_ArrayPlusOffTooSmall);
        }

        fixed( byte * pbyte = &value[startIndex]) {
            if( startIndex % 8 == 0) { // data is aligned 
                return *((long *) pbyte); 
            }
            else { 
                if( IsLittleEndian) {
                    int i1 = (*pbyte) | (*(pbyte + 1) << 8)  | (*(pbyte + 2) << 16) | (*(pbyte + 3) << 24);
                    int i2  = (*(pbyte+4)) | (*(pbyte + 5) << 8)  | (*(pbyte + 6) << 16) | (*(pbyte + 7) << 24);
                    return (uint)i1 | ((long)i2 << 32); 
                }
                else { 
                    int i1 = (*pbyte << 24) | (*(pbyte + 1) << 16)  | (*(pbyte + 2) << 8) | (*(pbyte + 3)); 
                    int i2  = (*(pbyte+4) << 24) | (*(pbyte + 5) << 16)  | (*(pbyte + 6) << 8) | (*(pbyte + 7));
                    return (uint)i2 | ((long)i1 << 32); 
                }
            }
        }
    } 

Why not go unsafe?

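// Requires compiling with /unsafe. There is no bounds check here, so the
// caller must guarantee that ofs + 8 <= p.Length.
unsafe static ulong Fetch64(byte[] p, int ofs = 0)
{
  fixed (byte* bp = p)
  {
    // Reads 8 bytes in the machine's native byte order, which matches the
    // original method only on little-endian hardware.
    return *((ulong*)(bp + ofs));
  }
}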
Licensed under: CC-BY-SA with attribution