Regex for removing duplicates before comment

https://stackoverflow.com/questions/22918502

29-06-2023

Question

I've got a list of people and the dates of events they've attended for an organization, and I'm trying to pare it down to just a list of the people who have attended events. I'm looking for a regex or script that would find duplicate names before a comment, remove the duplicate lines, and count the number of times each name appears in the list. For example:

John #March 13, 2013
John #April 4, 2013
Mark #February 20, 2013
John #July 8, 2013

becomes

John #3
Mark #1

If this is too complicated I'd settle for just removing the duplicates without a count of the number of events they've attended.

Solution

This can be done with a Perl one-liner:

perl -lne '$h{$1}++ if /^([^#]+?)\s*#/; END { print "$_ #$h{$_}" for sort keys %h }' input.txt

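This reads the file line by line and uses each name (the text before the #) as a key in a hash, incrementing that key's value every time the name is seen. Once all input has been read, it prints each key of the hash followed by its count.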
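The same hash-counting idea can be sketched in plain shell with awk instead of Perl. This is only an illustration, not part of the original answer: the function name count_names is made up, and it assumes the name is everything before the first # with trailing whitespace trimmed. Unlike a bare hash, it also remembers the order in which names first appear.

```shell
# Hypothetical helper (not from the original answer): count how often each
# name, i.e. the text before the first "#", appears in the input.
count_names() {
    awk -F'#' '
        NF > 1 {                              # only lines that contain a "#"
            name = $1
            sub(/[ \t]+$/, "", name)          # trim whitespace before the "#"
            if (!(name in count)) order[++n] = name   # remember first-seen order
            count[name]++
        }
        END {
            for (i = 1; i <= n; i++) print order[i] " #" count[order[i]]
        }
    ' "$@"
}

# Sample data from the question, fed on standard input
count_names <<'EOF'
John #March 13, 2013
John #April 4, 2013
Mark #February 20, 2013
John #July 8, 2013
EOF
# prints:
# John #3
# Mark #1
```

With a file name as argument (count_names input.txt) it reads that file instead of standard input.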

OTHER TIPS

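Another approach is to combine sed, sort, and uniq. sed strips everything from the # onward, sort brings identical names together, and uniq -c collapses each run of identical lines into one line prefixed with its count: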

sed 's/ *#.*//' input.txt | sort | uniq -c
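Note that uniq -c puts the count in front of the name (for example "3 John", padded with leading spaces), which is not quite the "John #3" layout asked for in the question. A possible way to rearrange it is to add one awk step to the pipeline. This is a sketch only: the function name names_with_counts is hypothetical, and it assumes the uniq -c output has the usual form of optional spaces, the count, one space, then the line.

```shell
# Hypothetical wrapper (name chosen for illustration) around the
# sed | sort | uniq pipeline that moves the count behind the name.
names_with_counts() {
    sed 's/ *#.*//' "$@" |      # drop the "#" and everything after it
        sort |                  # bring identical names next to each other
        uniq -c |               # collapse duplicates, prefixing a count
        awk '{ n = $1; sub(/^ *[0-9]+ /, ""); print $0 " #" n }'
}

# Sample data from the question, fed on standard input
names_with_counts <<'EOF'
John #March 13, 2013
John #April 4, 2013
Mark #February 20, 2013
John #July 8, 2013
EOF
# prints:
# John #3
# Mark #1
```

If the counts are not needed at all, which the question says would also be acceptable, sed 's/ *#.*//' input.txt | sort -u should be enough to list each name once.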

Licensed under: CC-BY-SA with attribution
