Question

I'm a bit of a noob to PowerShell, so please don't chastise me :-) I've got some rather large log files (600 MB) that I need to process. My script essentially strips out the lines that contain "Message Received", then tokenises those lines and outputs a few of the tokens to an output file.

The logic of the script is fine (although I'm sure it could be more efficient), but the problem is that as I write lines to the output file and the file subsequently grows larger, the amount of memory that PowerShell uses also increases, to the point of memory exhaustion.

Can anyone suggest how I can stop this occurring? I thought about breaking the log up into temporary files of, say, only 10 MB each and processing each temp file instead?

Here's my code; any help you guys could give would be fantastic :-)

Get-Date | Add-Content -Path d:\scripting\logparser\testoutput.txt


$a = Get-Content D:\scripting\logparser\importsample.txt 


foreach ($l in $a) {
    #$l | Select-String -Pattern "Message Received." | Add-Content -Path d:\scripting\logparser\testoutput.txt
    if ($l | Select-String -Pattern "Message Received." -Quiet) {
        #Add-Content -Path d:\scripting\logparser\testoutput.txt -Value $l
        $var1,$var2,$var3,$var4,$var5,$var6,$var7,$var8,$var9,$var10,$var11,$var12,$var13,$var14,$var15,$var16,$var17,$var18,$var19,$var20 = [regex]::Split($l, '\s+')
        Add-Content -Path d:\scripting\logparser\testoutput.txt -Value "$var1 $var2 $var3 $var4 $var16 $var18"
    }
}
Get-Date | Add-Content -Path d:\scripting\logparser\testoutput.txt

Solution

If you do everything in the pipeline, only one object at a time (one line from the file, in your case) needs to be in memory.

Get-Content $inputFile | Where-Object { $_ -match "Message Received" } |
  ForEach-Object -Process {
    $fields = [regex]::Split($_, '\s+')  # an array of whitespace-separated tokens
    # the method call must be parenthesized so it is evaluated as an expression
    Add-Content -Path $outputFile -Value ([String]::Join(" ", $fields[0,1,2,3,15,17]))
  }

The $fields[0,1,2,3,15,17] expression creates a new array containing the elements of $fields at the given indices.
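
A quick illustration of that multi-index syntax on a throwaway array:

$letters = 'a','b','c','d','e','f'
$letters[0,2,4]   # returns a new three-element array: a, c, e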

This could also be done in a single pipeline by passing Select-Object an expression rather than a property name (a sketch follows below), but it would be less clear.
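
A minimal sketch of that variant, assuming the same $inputFile and $outputFile as above (the Summary property name is just illustrative):

Get-Content $inputFile |
  Where-Object { $_ -match "Message Received" } |
  Select-Object @{ Name = 'Summary'; Expression = { ([regex]::Split($_, '\s+'))[0,1,2,3,15,17] -join ' ' } } |
  ForEach-Object { $_.Summary } |
  Add-Content -Path $outputFile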

OTHER TIPS

A working PowerShell example:

$csvFile = "c:\test.txt"

$file_reader = [System.IO.File]::OpenText($csvFile)
$row = "";
while(($row = $file_reader.ReadLine()) -ne $null) 
{ 
    # do something with '$row' 
    Write-Host row: $row
}
$file_reader.Close()

You're effectively storing the entire log file in memory instead of sequentially accessing it bit by bit.

Assuming that your log file has some internal delimiter for each entry (most likely a newline), you'd read one entry at a time, keeping no more in memory than absolutely necessary.

You won't be able to rely on the built-in PowerShell stuff because it's, in effect, stupid.

You'll have to forgive my code sample; my PowerShell is a bit rusty.

$reader = New-Object System.IO.StreamReader "testoutput.txt"
$s = ""
while (($s = $reader.ReadLine()) -ne $null)
{
    # do something with '$s',
    # which would contain individual log entries.
}
$reader.Close()
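
One caveat worth adding: if ReadLine throws partway through, the Close() call is never reached. A small variation with try/finally (reading the same hypothetical testoutput.txt) guarantees the file handle is released:

$reader = New-Object System.IO.StreamReader "testoutput.txt"
try {
    while (($s = $reader.ReadLine()) -ne $null) {
        # do something with '$s'
    }
}
finally {
    $reader.Dispose()   # runs even if the loop throws
}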
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow