Question

I have the following lines:

123 abcd 456 xyz
123 abcd 678 xyz
234 egfs 434 ert
345 fggfgf 456 455 rty
234 egfs 422 ert 33

Here, lines that share the same first field are considered duplicates. In the example above, the two lines starting with 123 are duplicates even though they differ in the third field. Similarly, the two lines starting with 234 are duplicates. I need to remove these duplicate lines.

Since the lines aren't exact duplicates, sort u doesn't work. Does anyone know how I can delete these duplicate lines?


Solution

This is an easy task for awk, so I would use awk. In Vim, you can filter the whole buffer through it:

% !awk '\!a[$1]++'

Then you get:

123 abcd 456 xyz
234 egfs 434 ert
345 fggfgf 456 455 rty

If you run it from a shell instead, you don't have to escape the !:

awk '!a[$1]++' file
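Why !a[$1]++ keeps only the first line per key: a[$1]++ returns the count stored for the first field before incrementing it, so the value is 0 (false) the first time a key appears and non-zero afterwards; the ! flips that, and awk's default action prints the line only on its first occurrence. The Python sketch below mirrors that logic with a set instead of a counter. The function name keep_first_per_key is illustrative only, and str.split() is an approximation of awk's default whitespace field splitting.

```python
def keep_first_per_key(lines):
    """Return the first line for each distinct first field, in input order.

    Mirrors awk's !a[$1]++: the post-increment returns the old count,
    which is 0 (false) only the first time a key is seen.
    """
    seen = set()  # stands in for awk's associative array a
    kept = []
    for line in lines:
        fields = line.split()               # whitespace splitting, like awk's default FS
        key = fields[0] if fields else ""   # awk's $1 is "" on a blank line
        if key not in seen:                 # first occurrence of this key
            seen.add(key)
            kept.append(line)
    return kept


sample = [
    "123 abcd 456 xyz",
    "123 abcd 678 xyz",
    "234 egfs 434 ert",
    "345 fggfgf 456 455 rty",
    "234 egfs 422 ert 33",
]

for line in keep_first_per_key(sample):
    print(line)
```

On the sample input this prints the same three lines as the awk command. Passing it the lines of a file (for example from open(path)) works too, since split() also discards the trailing newline when picking the first field.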

Other tips

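g/\%(^\1\>.*$\n\)\@<=\(\k\+\)\>.*$/d

This deletes every line whose first word (a run of keyword characters, \k) is the same as the first word of the line directly above it. Because it only compares neighboring lines, it removes the second 123 line in the example but keeps both 234 lines, which are not adjacent; sort the lines first (for example with :sort) if duplicates can be separated.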
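As a rough cross-check of what a "same first word as the previous line" :g command does, here is a Python sketch of that adjacent-only comparison. It assumes Python's \w is a close enough stand-in for Vim's \k under the default 'iskeyword' setting, and drop_adjacent_duplicates is a hypothetical name. Like :g, which marks all matching lines before deleting any, it compares each line with the original line above it.

```python
import re

# Assumption: \w+ approximates Vim's \k\+ with the default 'iskeyword'.
FIRST_WORD = re.compile(r"\w+")


def drop_adjacent_duplicates(lines):
    """Drop a line whose first word equals the first word of the line
    directly above it in the input; non-adjacent duplicates are kept."""
    kept = []
    prev_key = None
    for line in lines:
        m = FIRST_WORD.match(line)          # anchored at column 0, like a match right after \n
        key = m.group(0) if m else None     # None: line does not start with a word
        if key is None or key != prev_key:
            kept.append(line)
        # Compare against the original previous line, even if it was dropped,
        # since :g marks lines before deleting them.
        prev_key = key
    return kept


sample = [
    "123 abcd 456 xyz",
    "123 abcd 678 xyz",
    "234 egfs 434 ert",
    "345 fggfgf 456 455 rty",
    "234 egfs 422 ert 33",
]

for line in drop_adjacent_duplicates(sample):
    print(line)
```

On the sample input this keeps four lines, including both 234 lines, which is the practical difference from the awk solution.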

This is easy with my PatternsOnText plugin. It lets you specify a pattern that is ignored for the duplicate check; in your case, that is everything after the first (space-delimited) field:

%DeleteDuplicateLinesIgnoring / .*/
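The idea behind an "ignore this pattern" duplicate check can be sketched in Python: remove every match of the pattern from a line, use what is left as the comparison key, and keep the first line seen for each key. This only illustrates the concept and is not how PatternsOnText is implemented; dedupe_ignoring is a hypothetical name, keeping the first occurrence is an assumption, and Python's re syntax only coincides with Vim's for a simple pattern like " .*".

```python
import re


def dedupe_ignoring(lines, ignore=r" .*"):
    """Keep the first line for each key, where the key is the line with
    every match of `ignore` removed (hypothetical helper, not the plugin)."""
    pattern = re.compile(ignore)
    seen = set()
    kept = []
    for line in lines:
        key = pattern.sub("", line)   # " .*" strips everything from the first space on
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


sample = [
    "123 abcd 456 xyz",
    "123 abcd 678 xyz",
    "234 egfs 434 ert",
    "345 fggfgf 456 455 rty",
    "234 egfs 422 ert 33",
]

for line in dedupe_ignoring(sample):
    print(line)
```

With the default pattern the key is just the first field, so the result matches the awk output; a different ignore pattern changes which part of each line is compared.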
Licensed under: CC-BY-SA with attribution