This PowerShell script parses a collection of files (held in $files) containing fixed-width records of 5 fields, each 10 characters wide, and outputs those records with the fields pipe-delimited.
#Create a regular expression to match the field widths and capture the data.
$regex = [regex]'(.{10})(.{10})(.{10})(.{10})(.{10})'
#create a filter to insert a pipe character between the captured groups.
filter PipeDelimit {$_ -replace $regex, '$1|$2|$3|$4|$5'}
#Pipe the records through the filter in batches of 1000
Get-Content $files -ReadCount 1000 | PipeDelimit
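Here's a minimal, self-contained sketch of what the filter does, using two made-up 50-character records (the field contents are invented for illustration):

```powershell
# Two sample fixed-width records, each 5 fields of 10 characters
$sample = @(
    'AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEE',
    '1234567890abcdefghijklmnopqrstuvwxyz....0987654321'
)

$regex = [regex]'(.{10})(.{10})(.{10})(.{10})(.{10})'
filter PipeDelimit {$_ -replace $regex, '$1|$2|$3|$4|$5'}

$sample | PipeDelimit
# First record becomes:
# AAAAAAAAAA|BBBBBBBBBB|CCCCCCCCCC|DDDDDDDDDD|EEEEEEEEEE
```

Because -replace also works element-wise on an array, the same filter handles a whole 1000-record batch from -ReadCount in a single operation.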
You'll need to modify the regex and filter to match your data. I suspect it will take considerably less than 20 minutes to chew through half a million of those records.
The -ReadCount parameter controls memory usage by keeping only 1000 records in the pipeline at a time. Each batch is passed down the pipeline as an array, and the -replace operator in the filter delimits the entire array in one operation, without needing to foreach through each record. Using a filter is admittedly unusual, and it could be replaced with ForEach-Object, but the filter is marginally faster, and that adds up if you're doing lots of reps.
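For comparison, the equivalent ForEach-Object version looks like this (a sketch only; it assumes $regex and $files are defined as above, and produces the same output, just with slightly more per-batch overhead):

```powershell
# Equivalent ForEach-Object form of the same pipeline.
# $_ here is the array of up to 1000 lines delivered by -ReadCount,
# and -replace operates on the whole array at once.
Get-Content $files -ReadCount 1000 |
    ForEach-Object { $_ -replace $regex, '$1|$2|$3|$4|$5' }
```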