Question

I am running the following PowerShell script to concatenate a series of output files into a single CSV file. The files are named whidataXX.htm (where XX is a two-digit sequential number), and the number of files created varies from run to run.

$metadataPath = "\\ServerPath\foo" 

function concatenateMetadata {
    $cFile = $metadataPath + "whiconcat.csv"
    Clear-Content $cFile
    $metadataFiles = gci $metadataPath
    $iterations = $metadataFiles.Count
    for ($i=0;$i -le $iterations-1;$i++) {
        $iFile = "whidata"+$i+".htm"
        $FileExists = (Test-Path $metadataPath$iFile -PathType Leaf)
        if (!($FileExists))
        {
            break
        }
        elseif ($FileExists)
        {
            Write-Host "Adding " $metadataPath$iFile
            Get-Content $metadataPath$iFile | Out-File $cFile -append
            Write-Host "to" $cfile
        }
    }
} 

The whidataXX.htm files are encoded as UTF-8, but my output file is encoded as UTF-16. When I view the file in Notepad, it appears correct, but when I view it in a hex editor, the hex value 00 appears between each character, and when I pull the file into a Java program for processing, the file prints to the console with extra spaces between c h a r a c t e r s.

First, is this normal for PowerShell, or is there something in the source files that would cause this?

Second, how would I fix this encoding problem in the code noted above?


Solution

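The Out-* cmdlets (like Out-File) pass their input through PowerShell's formatting system, and in Windows PowerShell the default output encoding of Out-File is "Unicode", which is PowerShell's name for UTF-16 LE. (PowerShell 6 and later default to UTF-8 without a BOM.) So this is normal behavior and is not caused by your source files.

You can add an -Encoding parameter to Out-File: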

Get-Content $metadataPath$iFile | Out-File $cFile -Encoding UTF8 -append

or switch to Add-Content, which writes the strings as they are instead of re-formatting them:

Get-Content $metadataPath$iFile | Add-Content $cFile 
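
As a rough sketch of how the first option might fit into the loop from the question: the temp folder and the two sample files below are hypothetical stand-ins for the real share, and Join-Path is used here only so that the folder and file names are always separated correctly. Reading and writing both name the encoding explicitly, and the last line checks that the result contains no 00 bytes.

```powershell
# Hypothetical demo folder; substitute your own metadata path.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) 'whidemo'
New-Item -ItemType Directory -Path $dir -Force | Out-Null

# Two small sample files named like the ones in the question (assumed content).
Set-Content -Path (Join-Path $dir 'whidata0.htm') -Value 'a,b' -Encoding UTF8
Set-Content -Path (Join-Path $dir 'whidata1.htm') -Value 'c,d' -Encoding UTF8

# Start from a clean output file.
$cFile = Join-Path $dir 'whiconcat.csv'
if (Test-Path $cFile) { Remove-Item $cFile }

# Append files in sequence until the next numbered file is missing.
for ($i = 0; ; $i++) {
    $iFile = Join-Path $dir "whidata$i.htm"
    if (-not (Test-Path $iFile -PathType Leaf)) { break }
    # Read and write with the same explicit encoding.
    Get-Content $iFile -Encoding UTF8 | Out-File $cFile -Encoding UTF8 -Append
}

# A UTF-16 file would contain 00 bytes; a UTF-8 file with this content does not.
$bytes = [System.IO.File]::ReadAllBytes($cFile)
$hasNulBytes = $bytes -contains 0
```

Note that in Windows PowerShell, -Encoding UTF8 writes a byte order mark at the start of a newly created file, which some consumers (including Java's standard readers) do not strip automatically.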

OTHER TIPS

The fact that you get 2 bytes per character indicates that the file is UTF-16. For text like yours, this looks the same as the older fixed-width encoding UCS-2; the difference is that UTF-16 uses 4 bytes for characters outside the Basic Multilingual Plane. This article explains that file redirection in PowerShell produces the same encoding, and it also provides a fix: http://www.kongsli.net/nblog/2012/04/20/powershell-gotchas-redirect-to-file-encodes-in-unicode/
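
To see where the 00 bytes come from, here is a small illustration using the .NET encoding classes (the sample string is arbitrary). It encodes the same text both ways and prints the bytes in hex:

```powershell
$text = 'abc'

# [System.Text.Encoding]::Unicode is UTF-16 LE, the encoding PowerShell calls "Unicode".
$utf16 = [System.Text.Encoding]::Unicode.GetBytes($text)
$utf8  = [System.Text.Encoding]::UTF8.GetBytes($text)

# Format each byte as two hex digits.
$utf16Hex = ($utf16 | ForEach-Object { $_.ToString('X2') }) -join ' '
$utf8Hex  = ($utf8  | ForEach-Object { $_.ToString('X2') }) -join ' '

$utf16Hex   # 61 00 62 00 63 00
$utf8Hex    # 61 62 63
```

A program that reads the UTF-16 bytes as if they were a single-byte encoding sees a NUL character after every letter, which is what shows up as the extra "spaces" in the Java console output.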

Licensed under: CC-BY-SA with attribution