Question
I have a file as shown below:
G1 A1 X 3 6 NA 7 NA NA NA 7 NA NA
G1 B1 X NA NA NA NA NA NA NA NA 7 6
G1 C1 X NA 1 3 4 NA NA NA NA NA NA
G2 D1 Y NA NA NA 1 NA NA NA NA NA NA
G2 E1 Y 2 NA NA NA NA NA NA NA NA NA
G2 F1 Y NA NA NA NA 0 NA NA NA NA NA
G1 G1 Y NA NA NA NA NA 8 7 NA NA NA
G2 H1 X NA NA NA NA NA NA NA NA NA NA
Now I want to print only the fields that are not NA. I don't want to change the format of the first three columns, but after them I want the remaining fields separated by commas. The output I want is:
G1 A1 X 3, 6, 7, 7
G1 B1 X 7, 6
G1 C1 X 1, 3, 4
G2 D1 Y 1
G2 E1 Y 2
G2 F1 Y 0
G1 G1 Y 8, 7
G2 H1 X
I tried:
sed 's/NA//g' file| sed 's/\t/ /g'| sed 's/ \+/, /g'
But it also changes the format of the first three columns. Can you suggest something?
Thanks
Solution
Here is an awk solution:
awk '{gsub(/ NA/,"");for (i=4;i<NF;i++) $i=$i","}1' file
G1 A1 X 3, 6, 7, 7
G1 B1 X 7, 6
G1 C1 X 1, 3, 4
G2 D1 Y 1
G2 E1 Y 2
G2 F1 Y 0
G1 G1 Y 8, 7
G2 H1 X
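For readers new to awk, here is the same one-liner spread out with comments. It is a sketch that keeps the original logic unchanged; the function name na_to_csv is only a hypothetical wrapper, and it assumes columns are separated by single spaces, as in the sample file (a tab before NA would not match / NA/).

```shell
# Commented version of the awk one-liner above (same logic).
# Assumes single spaces between columns, as in the sample file.
na_to_csv() {
  awk '{
    gsub(/ NA/, "")            # delete each space+NA pair; awk re-splits $0 and updates NF
    for (i = 4; i < NF; i++)   # columns 4 .. NF-1 (the last column gets no comma)
      $i = $i ","              # assigning a field rebuilds $0 joined by OFS (a space)
  }
  1' "$@"                      # pattern 1 is always true, so every record is printed
}
```

Call it as na_to_csv file, or pipe the data into it.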
OTHER TIPS
This is complicated enough that I would reach for perl.
perl -pe '
chomp; s/\s+/ /g; s/ $//; s/^ //;
my ($left, $right) = /(\S+ \S+ \S+) (.+)$/;
$right =~ s/\bNA\b//g;
$right =~ s/^ +//;
$right =~ s/ +$//;
$right =~ s/ +/, /g;
$_ = "$left $right";
s/ *$/\n/;
' file
This does the right thing (DTRT) on your test case; someone cleverer and/or less sleepy than me might be able to golf it down a bit.
Here is another way using awk:
awk '{gsub(/NA/,"");$1=$1;for(i=4;i<=NF;i++)$i=(i==NF?$i:$i",")}1' file
G1 A1 X 3, 6, 7, 7
G1 B1 X 7, 6
G1 C1 X 1, 3, 4
G2 D1 Y 1
G2 E1 Y 2
G2 F1 Y 0
G1 G1 Y 8, 7
G2 H1 X
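Both awk versions delete the text NA rather than whole NA fields, so a value such as NAN would be mangled (gsub(/NA/,"") turns it into N). If that matters, a sketch that compares whole fields instead might look like this. drop_na_fields is a hypothetical name, and it assumes every line has at least three columns. Because awk's default field splitting treats runs of spaces and tabs as one separator, tab-separated input also works, though the output always uses single spaces between the first three columns.

```shell
# Sketch: compare whole fields instead of deleting the substring "NA".
# Assumes every line has at least three columns; the default FS splits on
# runs of spaces and tabs, so tab-separated input works too.
drop_na_fields() {
  awk '{
    out = $1 " " $2 " " $3     # first three columns, rejoined with single spaces
    sep = " "                  # separator before the first kept value
    for (i = 4; i <= NF; i++)
      if ($i != "NA") {        # skip only fields that are exactly NA
        out = out sep $i
        sep = ", "             # every later value is preceded by a comma
      }
    print out
  }' "$@"
}
```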
or using GNU sed:
sed -e 's/\bNA\b//g' -e 's/ \+/ /g' -e 's/ *$//' -e 's/ /, /4g' file
G1 A1 X 3, 6, 7, 7
G1 B1 X 7, 6
G1 C1 X 1, 3, 4
G2 D1 Y 1
G2 E1 Y 2
G2 F1 Y 0
G1 G1 Y 8, 7
G2 H1 X
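The GNU sed answer relies on two GNU extensions that are easy to miss: \b (a word boundary) and a number combined with the g flag. A small demonstration, assuming GNU sed is installed as sed:

```shell
# Needs GNU sed: "\b" and the combined "4g" flag are GNU extensions.

# "4g": leave the first three spaces alone, replace the 4th and every later one.
commas=$(printf 'a b c d e f\n' | sed 's/ /, /4g')
echo "$commas"    # a b c d, e, f

# "\b" is a word boundary: the standalone NA goes, but NAN is left intact
# (the plain gsub(/NA/,"") in the awk answer would turn NAN into N).
words=$(printf 'G1 X NA NAN 7\n' | sed 's/\bNA\b//g')
echo "$words"     # G1 X  NAN 7
```

So once the NA tokens and extra spaces are gone, 4g keeps the separators of the first three columns and turns every later space into ", ".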
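Yet another sed approach, which loops until every NA after the third column is gone (note that it only removes the NA fields; it does not add the commas):
sed ':a
s/\(\([^ ]\{1,\} \{1,\}\)\{2\}[^ ]\{1,\}\)\(.*\) NA/\1\3/
t a
' YourFile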
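The loop leaves the remaining values separated by spaces. Assuming GNU sed, the commas can be added by appending the same s/ /, /4g step used in the GNU sed answer. This sketch wraps both in a hypothetical function na_loop_csv and assumes single-space separated input:

```shell
# Assumes GNU sed (for the "4g" flag) and single spaces between columns.
na_loop_csv() {
  # :a / t a   repeat the substitution while it keeps succeeding
  # s/...NA/   keep the first three columns (\1) and the rest (\3),
  #            dropping the last " NA" on the line each time round
  # 4g         then turn the 4th and later spaces into ", "
  sed -e ':a' \
      -e 's/\(\([^ ]\{1,\} \{1,\}\)\{2\}[^ ]\{1,\}\)\(.*\) NA/\1\3/' \
      -e 't a' \
      -e 's/ /, /4g' "$@"
}
```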
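It also works with the generic class [[:blank:]] instead of the space character (code below), which matches tabs as well. The class has to be written inside a bracket expression: a bare [:blank:] is itself a bracket expression matching only the characters :, b, l, a, n and k, so it does not do what you expect.
sed ':a
s/\(\([^[:blank:]]\{1,\}[[:blank:]]\{1,\}\)\{2\}[^[:blank:]]\{1,\}\)\(.*\)[[:blank:]]NA/\1\3/
t a
' YourFile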
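With the [[:blank:]] class the same loop also handles tab-separated files, and whatever separators sit between the first three columns are left untouched. A sketch assuming GNU sed (for 4g), with na_loop_csv_blank as a hypothetical name, that also adds the commas:

```shell
# Assumes GNU sed. [[:blank:]] matches a space or a tab.
na_loop_csv_blank() {
  # loop: drop the last blank+NA after the first three columns until none remain;
  # final step: replace the 4th and later runs of blanks with ", "
  sed -e ':a' \
      -e 's/\(\([^[:blank:]]\{1,\}[[:blank:]]\{1,\}\)\{2\}[^[:blank:]]\{1,\}\)\(.*\)[[:blank:]]NA/\1\3/' \
      -e 't a' \
      -e 's/[[:blank:]]\{1,\}/, /4g' "$@"
}
```

For a tab-separated line such as G1<TAB>B1<TAB>X<TAB>NA<TAB>NA<TAB>7<TAB>6 the result keeps the three tabs and gives 7, 6 after them.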