I have the following lines:

123 abcd 456 xyz
123 abcd 678 xyz
234 egfs 434 ert
345 fggfgf 456 455 rty
234 egfs 422 ert 33

Here, lines whose first field is the same are considered duplicates. In the example above, 123 is the first field of two lines, so those lines are duplicates (even though they differ in one field in the middle). Similarly, the lines starting with 234 are duplicates. I need to remove these duplicate lines.

Since they aren't 100% duplicates, sort -u doesn't work. Does anyone know how I can delete these duplicate lines?

Solution

This is a very easy task for awk, so that is what I would use. In Vim, you can filter the whole buffer through it:

% !awk '\!a[$1]++'

This gives you:

123 abcd 456 xyz
234 egfs 434 ert
345 fggfgf 456 455 rty

If you run it in a shell instead, you don't have to escape the !:

awk '!a[$1]++' file
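
For readers unfamiliar with the idiom, here is a spelled-out sketch of what the one-liner does: the array counts how often each first field has been seen, and a line is printed only when its count is still zero. The function name dedupe_by_first_field and the array name seen are made up for illustration and are not part of the original answer.

```shell
# Hypothetical helper: keep only the first line for each value of field 1.
# Reads the files given as arguments, or standard input if there are none.
dedupe_by_first_field() {
  awk '
    {
      # An unset array element compares equal to 0, so this is true
      # only the first time a given first field appears.
      if (seen[$1] == 0) {
        print
      }
      # Count the key so that later lines with the same field are skipped.
      seen[$1]++
    }
  ' "$@"
}
```

Calling dedupe_by_first_field file should behave like the one-liner above: the first line for each key is kept and the original line order is preserved.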

Other tips

g/\%(^\1\>.*$\n\)\@<=\(\k\+\).*$/d

This is easy with my PatternsOnText plugin. It lets you specify a pattern that is ignored for the duplicate check; in your case, that would be everything after the first (space-delimited) field:

%DeleteDuplicateLinesIgnoring / .*/
Licensed under: CC-BY-SA with attribution