Question

How can I delete multiple headers from a file? I tried to use the below code after finding it from How can I delete duplicate lines in a file in Unix?.

awk '!x[$0]++' file.txt

It is deleting all the duplicate records in the file. But in my case, I just need the header duplicates to be removed, not the duplicate records in the file. For example, I have a file with the below data:

column1, column2, column3, column4, column5
value11, value12, value13, value14, value14
value21, value22, value23, value24, value25
value31, value32, value33, value34, value35
value41, value42, value43, value44, value45
value51, value52, value53, value54, value55
value21, value22, value23, value24, value25
column1, column2, column3, column4, column5
value11, value12, value13, value14, value14
value21, value22, value23, value24, value25
column1, column2, column3, column4, column5
column1, column2, column3, column4, column5

I am expecting the output as below:

column1, column2, column3, column4, column5
value11, value12, value13, value14, value14
value21, value22, value23, value24, value25
value31, value32, value33, value34, value35
value41, value42, value43, value44, value45
value51, value52, value53, value54, value55
value21, value22, value23, value24, value25
value11, value12, value13, value14, value14
value21, value22, value23, value24, value25
Was it helpful?

Solution

If you know that the first line contains the header, just delete all other instances of that.

awk 'FNR==1 { header = $0; print }
     $0 != header' file

If that won't work, please tell us how we can identify a header line. If it's just a static string, grep -vF 'that string' or if it matches a particular regex, grep -v 'that regex'.

OTHER TIPS

This might work for you (GNU sed):

sed -r '1h;1!G;/^(.*)\n\1/d;P;D' file
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top