Question

I have written the following code, which parses a binary file:

Param(
    [Parameter(Mandatory=$True)]
    [string]$inputFilePath
)

function GetLengthFrom2Byte
{
    Param(
        [Parameter(Mandatory=$True)]
        [byte[]]$bytes
    )

    if ($bytes.Length -ne 2)
    {
        throw "Invalid input parameter"
    }
    else
    {
        $lenByte = New-Object Byte[](2)

        $lenByte[0] = $bytes[1]
        $lenByte[1] = $bytes[0]

        return [BitConverter]::ToUInt16($lenByte, 0)
    }
}

try
{
    if (!(Test-Path($inputFilePath)))
    {
        Throw ("Invalid input file: <{0}>" -f $inputFilePath)
    }

    $inputStream = New-Object IO.FileStream($inputFilePath, [IO.FileMode]::Open,  [IO.FileAccess]::Read, [IO.FileShare]::Read)
    $inputBinaryReader = New-Object IO.BinaryReader($inputStream)

    while ($inputBinaryReader.PeekChar() -ne -1)
    {
        $AfpHeader = $inputBinaryReader.ReadByte()

        if ($AfpHeader -ne 0x5A)
        {
            Throw ("Invalid AFP structure. Byte 0x5A not found at offset: <{0}>" -f $inputBinaryReader.BaseStream.Position)
        }

        $AfpLength = $inputBinaryReader.ReadBytes(2)

        $recordLength = GetLengthFrom2Byte($AfpLength)

        $inputBinaryReader.ReadBytes($recordLength - 2) > $null
    }

    echo "AFP file validated"
}
catch [Exception]
{
    echo ("Error: {0}" -f $error[0])
    exit 8
}
finally
{
    # Guard against the variables being null if the script failed before they were assigned
    if ($inputBinaryReader) { $inputBinaryReader.Dispose() }
    if ($inputStream) { $inputStream.Dispose() }
}

exit 0

I don't want to go into the details of the binary parsing. The problem is that the same function in C# took ~50 seconds, while in PowerShell it took 11 minutes.

Since I am using the same classes, I don't know why the gap is so huge. Is there any way to improve PowerShell performance?

Solution

You will get the biggest performance boost if you move this line to script scope:

    $lenByte = New-Object Byte[](2)

so that it happens just once. Then you should change the references to be '$script:lenByte'.

You'll get another boost to performance if you skip creating the local function GetLengthFrom2Byte and instead just inline that bit of script.

After those two changes, I think performance should be more in line with C#.
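Applied to the loop in the question, the two changes look roughly like this (a sketch reusing the question's variable names; the surrounding setup and error handling are unchanged):

```powershell
# Change 1: allocate the swap buffer once, at script scope, instead of on every record
$lenByte = New-Object Byte[](2)

while ($inputBinaryReader.PeekChar() -ne -1)
{
    $AfpHeader = $inputBinaryReader.ReadByte()

    if ($AfpHeader -ne 0x5A)
    {
        Throw ("Invalid AFP structure at offset: <{0}>" -f $inputBinaryReader.BaseStream.Position)
    }

    # Change 2: inline the big-endian length computation -
    # no function call and no per-iteration allocation
    $AfpLength = $inputBinaryReader.ReadBytes(2)
    $lenByte[0] = $AfpLength[1]
    $lenByte[1] = $AfpLength[0]
    $recordLength = [BitConverter]::ToUInt16($lenByte, 0)

    $null = $inputBinaryReader.ReadBytes($recordLength - 2)
}
```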

Starting with PowerShell V3, scripts and loops are compiled after executing 16 times, so it is possible to achieve near-C# performance, but you must avoid certain PowerShell features, which in many cases is impossible.

Calling a PowerShell cmdlet like New-Object is fairly expensive because PowerShell does the following every time you execute it:

  • Searches for the command. For New-Object, the search looks in multiple scopes for multiple different command types before finding the cmdlet.
  • Binds the parameters (which uses reflection)
  • Converts the string parameter to a System.Type instance
  • Determines which constructor to call
  • Calls the constructor via reflection

Note that all of the above can be made significantly faster, it just hasn't happened yet.
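You can get a feel for this overhead with a quick micro-benchmark (an illustrative example, not from the original answer; the `[byte[]]::new()` syntax requires PowerShell 5+):

```powershell
# Cmdlet path: full command lookup, parameter binding, and reflection on every iteration
Measure-Command { for ($i = 0; $i -lt 100000; $i++) { $b = New-Object Byte[](2) } }

# Direct .NET constructor call (PowerShell 5+): skips the cmdlet machinery entirely
Measure-Command { for ($i = 0; $i -lt 100000; $i++) { $b = [byte[]]::new(2) } }
```

On any version, the first loop should be noticeably slower, for exactly the reasons listed above.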

PowerShell must do similar things for your local function GetLengthFrom2Byte, though it's slightly better in that fewer scopes need to be searched for the command. The parameter still requires a conversion, but converting an array of 2 elements is probably much faster than the conversion from string to type.
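The per-call cost of a script function is easy to demonstrate with a trivial function (the names here are made up for illustration):

```powershell
function Add-One { Param([int]$x) $x + 1 }

# Function call per iteration: command lookup plus parameter binding each time
Measure-Command { for ($i = 0; $i -lt 100000; $i++) { $null = Add-One $i } }

# The same work inlined into the loop body
Measure-Command { for ($i = 0; $i -lt 100000; $i++) { $null = $i + 1 } }
```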

Note that even with pure .NET-style code (no PowerShell functions in your inner loop), performance will still be worse than C# because of the dynamic nature of PowerShell. For example, every time you access a property or call a .NET method, PowerShell must check the type of the target object and the types of the method arguments. In V3, these operations are significantly faster than in V2, but there are still dynamic checks that must happen to preserve PowerShell language semantics.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow