Question

I'm a relatively new PowerShell user, and have what I thought was a simple question. I have spent a bit of time looking for similar scenarios and surprisingly haven't found any. I would post my failed attempts, but I can't even get close!

I have a log file with repetitive data, and I want to extract the latest event for each "unique" entry. The problem lies in the fact that each entry is unique due to the individual date stamp. The "unique" criteria is in Column 1. Example:

AE0440,1,2,3,30/08/2012,12:00:01,XXX
AE0441,1,2,4,30/08/2012,12:02:01,XXX
AE0442,1,2,4,30/08/2012,12:03:01,XXX
AE0440,1,2,4,30/08/2012,12:04:01,YYY
AE0441,1,2,4,30/08/2012,12:06:01,XXX
AE0442,1,2,4,30/08/2012,12:08:01,XXX
AE0441,1,2,5,30/08/2012,12:10:01,ZZZ

Therefore the output I want would be (order not relevant):

AE0440,1,2,4,30/08/2012,12:04:01,YYY
AE0442,1,2,4,30/08/2012,12:08:01,XXX
AE0441,1,2,5,30/08/2012,12:10:01,ZZZ

How can I get this data/discard old data?


Solution

Try this; it may look a bit cryptic to a first-time user. It reads the content of the file and groups the lines by the unique value in the first column (giving three groups here). Each group is then sorted in descending order by parsing the date/time value (again obtained by splitting), and the first line of each group is returned.

# Group lines by the first comma-separated field, then keep the newest line per group
Get-Content .\log.txt | Group-Object { $_.Split(',')[0] } | ForEach-Object {
    # Sort each group by its parsed date/time (third- and second-from-last fields), newest first
    $_.Group | Sort-Object -Descending { [DateTime]::ParseExact(($_.Split(',')[-3,-2] -join ' '), 'dd/MM/yyyy HH:mm:ss', $null) } | Select-Object -First 1
}
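
If the split-and-parse step is unclear, you can run it on a single line in isolation. This is just a sketch using one of the sample rows; the `$line` and `$stamp` variable names are only for illustration:

# One sample line from the log
$line = 'AE0440,1,2,3,30/08/2012,12:00:01,XXX'

# Take the date (third-from-last) and time (second-from-last) fields and join them
$stamp = $line.Split(',')[-3,-2] -join ' '   # '30/08/2012 12:00:01'

# Parse with an exact format so the day/month order is unambiguous
[DateTime]::ParseExact($stamp, 'dd/MM/yyyy HH:mm:ss', $null)

This produces a [DateTime] object, which is what Sort-Object compares when ordering each group.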

AE0440,1,2,4,30/08/2012,12:04:01,YYY
AE0441,1,2,5,30/08/2012,12:10:01,ZZZ
AE0442,1,2,4,30/08/2012,12:08:01,XXX

OTHER TIPS

Assuming your data looks exactly like your example:

# you can give more meaningful names to the columns if you want. just make sure the number of columns matches
$data = import-csv .\data.txt -Header Col1,Col2,Col3,Col4,Col5,Col6,Col7

# sort all data by the timestamp, then group by the label in column 1
$grouped = $data | sort {[DateTime]::ParseExact("$($_.Col6) $($_.Col5)", 'HH:mm:ss dd/MM/yyyy', $Null)} -Desc | group Col1

# read off the first element of each group (element with latest timestamp)
$grouped |%{ $_.Group[0] }

This also assumes your timestamps are on a 24-hour clock, i.e. all of your sample data is close to 12 noon, not 12 midnight. One second after midnight would need to be written as '00:00:01'.
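
The 24-hour assumption is encoded in the format string: 'HH' is the 24-hour specifier, whereas 'hh' is 12-hour and would also require an AM/PM designator ('tt'). A quick sketch of the difference:

# 'HH' (24-hour): '12:00:01' parses as noon, '00:00:01' as one second past midnight
[DateTime]::ParseExact('30/08/2012 12:00:01', 'dd/MM/yyyy HH:mm:ss', $null)
[DateTime]::ParseExact('30/08/2012 00:00:01', 'dd/MM/yyyy HH:mm:ss', $null)

If your log ever switches to a 12-hour format with AM/PM markers, the format string would need to change to match, e.g. 'dd/MM/yyyy hh:mm:ss tt'.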

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow