Pergunta

I need to test the integrity of file before importing to SQL. Each row of the file should have the exact same amount of columns.

These are "|" delimited files. I also need to ignore the first line as it is garbage.

If every row does not have the same number of columns, then I need to write an error message.

I have tried using something like the following with no luck:

$colCnt = "c:\datafeeds\filetoimport.txt"
$file = (Get-Content $colCnt -Delimiter "|") 
$file = $file[1..($file.count - 1)]
Foreach($row in $file){
    $row.Count
}

Counting rows is easy. Columns is not. Any suggestions?

Foi útil?

Solução

Yep, read the file skipping the first line. For each line split it on the pipe, and count the results. If it isn't the same as the previous throw an error and stops.

$colCnt = "c:\datafeeds\filetoimport.txt"
[int]$LastSplitCount = $Null
Get-Content $colCnt | ?{$_} | Select -Skip 1 | %{if($LastSplitCount -and !($_.split("|").Count -eq $LastSplitCount)){"Process stopped at line number $($_.psobject.Properties.value[5]) for column count mis-match.";break}elseif(!$LastSplitCount){$LastSplitCount = $_.split("|").Count}}

That should do it, and if it finds a bad column count it will stop and output something like:

Process stopped at line number 5 for column count mis-match.

Edit: Added a Where catch to skip blank lines ( ?{$_} )
Edit2: Ok, if you know what the column count should be then this is even easier.

Get-Content $colCnt | ?{$_} | Select -Skip 1 | %{if(!($_.split("|").Count -eq 210)){"Process stopped at line number $($_.psobject.Properties.value[5]), incorrect column count of: $($_.split("|").Count).";break}}

If you want it to return all lines that don't have 210 columns just remove the ;break and let it run.

Outras dicas

A more generic approach, including a RegEx filter:

$path = "path\to\folder"
$regex = "regex"
$expValue = 450

$files= Get-ChildItem $path | Where-Object {$_.Name -match $regex}
Foreach( $f in $files) {
    $filename = $f.Name
    echo $filename
    $a = Get-Content $f.FullName;
    $i = 1;
    $e = 0;
    echo "Starting...";
    foreach($line in $a)
    {
        if ($line.length -ne $expValue){
            echo $filename
            $a | Measure-Object -Line
            echo "Long:"
            echo $line.Length;
            echo "Line Nº: "
            echo $i;
            $e = $e + 1;       
        }
        $i = $i+1;
    }
    echo "Finished";
    if ($e -ne 0){
        echo $e "errors found";
    }else{
        echo "No errors"
        echo ""
    }
}
echo "All files examined"

Another possibility:

$colCnt = "c:\datafeeds\filetoimport.txt"

$DataLine = (Get-Content $colCnt -TotalCount 2)[1]
$DelimCount = ([char[]]$DataLine -eq '|').count
$MatchString = '.*' + ('|.*' * $DelimCount )

$test = Select-String -Path $colCnt -Pattern $MatchString -NotMatch |
  where { $_.linenumber -ne 1 }

That will find the number of delimiter characters in the second line, and build a regex pattern that can be used with Select-String.

The -NotMatch switch will make it return any lines that don't match that pattern as MatchInfo objects that will have the filename, line number and content of the problem lines.

Edit: Since the first line is "garbage" you probably don't care if it didn't match so I added a filter to the result to drop that out.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top