Question

I would like to remove the fourth occurrence of the character ":", and everything after it, in any field that contains it. See the example:

Input:

1 10975     A C    1/1:137,105:245:99:1007,102,0   0/1:219,27:248:20:222,0,20 
1 19938     T TA   ./.                             1/1:0,167:167:99:4432,422,0,12,12
12 20043112 C G    1/2:3,5,0:15:92                 2/2:3,15:20:8

Expected output:

1 10975     A C    1/1:137,105:245:99   0/1:219,27:248:20 
1 19938     T TA   ./.                  1/1:0,167:167:99
12 20043112 C G    1/2:3,5,0:15:92      2/2:3,15:20:8

So basically, in any field that contains ":", the fourth ":" and everything after it should be removed. Note that nothing changes on the third line, because ":" appears only three times in each of its fields. I tried the solution below, but it is not a good one: it works for the first line but not for the second, which has more commas ",".

Incomplete Solution:

sed 's/:[0-9]*,[0-9]*,[0-9]*//g'
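To see why the attempt falls short, it helps to run it on just the fields that contain colons. The pattern matches any colon followed by three comma-separated numbers, wherever that occurs, so a longer list leaves its tail behind and a field whose first colon happens to be followed by three numbers loses data even though it has only three colons. A small sketch (the function name naive_attempt is hypothetical, only there to wrap the question's command):

```shell
# Hypothetical helper wrapping the sed command from the question.
naive_attempt() {
  sed 's/:[0-9]*,[0-9]*,[0-9]*//g'
}

# The data fields that contain colons, one per line.
naive_attempt <<'EOF'
1/1:137,105:245:99:1007,102,0
1/1:0,167:167:99:4432,422,0,12,12
1/2:3,5,0:15:92
EOF
# Prints:
# 1/1:137,105:245:99        (correct)
# 1/1:0,167:167:99,12,12    (",12,12" is left behind)
# 1/2:15:92                 (":3,5,0" is wrongly removed)
```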

Thanks in advance

Solution

On fields 5 through the last field, this removes the fourth match of the regexp :[^:]+ (a colon followed by one or more non-colon characters):

< file.txt awk '{ for (i=5; i<=NF; i++) $i = gensub(/:[^:]+/, "", 4, $i) }1' | column -t

On fields 5 through the last field, this removes the fourth ":" and everything after it:

< file awk '{ for (i=5; i<=NF; i++) $i = gensub(/((:[^:]+){3}).*/, "\\1", 1, $i) }1' | column -t

Explanation:

Upon re-reading your question, the second solution is probably what you're looking for. The first solution looks for a colon followed by one or more characters that are not colons, and removes that match. The third argument to gensub() says which match of the regexp to replace, so 4 tells gensub() to remove only the fourth match. The second solution looks for three consecutive matches of that same regexp and captures them as a group; the trailing .* then consumes the rest of the field, and the replacement "\\1" puts back only the captured text.

It's worth mentioning that gensub() provides a feature not available with sub() or gsub(): the replacement text can refer to parenthesized parts of the regexp (\\1, \\2, ...), much like capturing groups in other languages. gensub() is very powerful, but it is only available in GNU awk. The description and example provided here are very useful. HTH.
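Because gensub() exists only in GNU awk, here is a sketch for awk implementations that lack it, assuming nothing beyond POSIX split(): each field from the fifth on is split on ":" and, when it has four or more colons, rebuilt from its first four pieces. The function name trim_after_fourth_colon is hypothetical.

```shell
# Hypothetical helper: POSIX awk only (split(), no gensub()).
trim_after_fourth_colon() {
  awk '{
    for (i = 5; i <= NF; i++) {
      n = split($i, part, ":")   # n pieces means n - 1 colons
      if (n > 4)                 # four or more colons
        $i = part[1] ":" part[2] ":" part[3] ":" part[4]
    }
    $1 = $1                      # rebuild every record with single spaces
    print
  }' "$@"
}

trim_after_fourth_colon <<'EOF'
1 10975     A C    1/1:137,105:245:99:1007,102,0   0/1:219,27:248:20:222,0,20
1 19938     T TA   ./.                             1/1:0,167:167:99:4432,422,0,12,12
12 20043112 C G    1/2:3,5,0:15:92                 2/2:3,15:20:8
EOF
# Prints:
# 1 10975 A C 1/1:137,105:245:99 0/1:219,27:248:20
# 1 19938 T TA ./. 1/1:0,167:167:99
# 12 20043112 C G 1/2:3,5,0:15:92 2/2:3,15:20:8
```

As with the gensub() versions, piping the result through column -t realigns the columns.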

Results:

1   10975     A  C   1/1:137,105:245:99  0/1:219,27:248:20
1   19938     T  TA  ./.                 1/1:0,167:167:99
12  20043112  C  G   1/2:3,5,0:15:92     2/2:3,15:20:8

Other answers

Sed:

sed -r 's/((:[^: \t]*){3}):[^ \t]*/\1/g' file | column -t
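The -r flag and the \t inside bracket expressions are GNU sed features; in a strict POSIX bracket expression a backslash is just a literal character. A sketch of the same substitution written with -E and POSIX character classes, assuming a sed that accepts -E (GNU and BSD/macOS sed both do). The function name trim_fields is hypothetical.

```shell
# Hypothetical helper: sed -E with POSIX character classes instead of \t.
trim_fields() {
  # ((:[^:[:space:]]*){3})  three ":"-led chunks with no ":" or blank (kept as \1)
  # :[^[:space:]]*          the fourth ":" and the rest of that field (dropped)
  sed -E 's/((:[^:[:space:]]*){3}):[^[:space:]]*/\1/g' "$@"
}

# Tab-separated input, to show that a tab also ends a field.
printf '1\t10975\tA\tC\t1/1:137,105:245:99:1007,102,0\t0/1:219,27:248:20:222,0,20\n' | trim_fields
# Prints (tab-separated): 1 10975 A C 1/1:137,105:245:99 0/1:219,27:248:20
```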

Perl:

perl -pe 's/((:[^:\s]*){3}):\S*/$1/g' file | column -t

Using sed

sed -r 's/((:[^: ]*){3}):[^ ]*/\1/g' file

Output:

1 10975     A C    1/1:137,105:245:99   0/1:219,27:248:20 
1 19938     T TA   ./.                             1/1:0,167:167:99
12 20043112 C G    1/2:3,5,0:15:92                 2/2:3,15:20:8

Using perl

perl -pe 's/((:[^:\s]*){3}):\S*/$1/g' file
perl -lane 's/(.*?:.*?:.*?:.*?):.*/$1/ for @F; print "@F"' file
License: CC-BY-SA with attribution
Not affiliated with StackOverflow