On fields 5 through to the last field, this will remove the fourth occurrence of the regexp :[^:]+
< file.txt awk '{ for (i=5; i<=NF; i++) $i = gensub(/:[^:]+/, "", 4, $i) }1' | column -t
On fields 5 through to the last field, this will remove the fourth : and everything after it, keeping only the first four colon-separated parts
< file.txt awk '{ for (i=5; i<=NF; i++) $i = gensub(/((:[^:]+){3}).*/, "\\1", 1, $i) }1' | column -t
Explanation:
Upon re-reading your question, the second solution is probably what you're looking for. The first solution looks for a colon followed by one or more characters that are not a colon, and removes them. The third argument to gensub() says which match of the regexp to replace, so a 4 tells gensub() to remove only the fourth match of the pattern. The second solution looks for three consecutive sets of the regexp described in the first solution, followed by the rest of the field, and replaces the whole match with just those three sets. At this point it's worth mentioning that gensub() provides an additional feature that is not available using sub() or gsub(): the ability to refer to parenthesized components of the regexp in the replacement text (the "\\1" above), much like how other languages use parentheses to perform capturing. gensub() is a very powerful function that is only available in GNU awk. The description and example provided here are very useful. HTH.
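To see the difference between the two gensub() calls on a single value, here is a minimal sketch. The helper names drop_fourth_match and keep_first_three_matches and the sample string a:b:c:d:e:f are made up for illustration, and gawk is assumed to be installed under that name. Older gawk releases (before 4.0, as far as I know) may also need --re-interval for the {3} to work.

```shell
# Hypothetical demo helpers; both need GNU awk (gawk) because of gensub().

# Remove only the fourth match of ":" followed by non-colon characters.
drop_fourth_match() {
  gawk '{ print gensub(/:[^:]+/, "", 4) }'
}

# Match three consecutive ":part" groups plus the rest of the line,
# then put back only the captured three groups via \1.
keep_first_three_matches() {
  gawk '{ print gensub(/((:[^:]+){3}).*/, "\\1", 1) }'
}

printf 'a:b:c:d:e:f\n' | drop_fourth_match         # prints a:b:c:d:f
printf 'a:b:c:d:e:f\n' | keep_first_three_matches  # prints a:b:c:d
```

The first call drops a single part (:e) and leaves the tail in place, while the second truncates the value after its fourth part.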
Results:
1   10975     A  C   1/1:137,105:245:99  0/1:219,27:248:20
1   19938     T  TA  ./.                 1/1:0,167:167:99
12  20043112  C  G   1/2:3,5,0:15:92     2/2:3,15:20:8
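Since gensub() is GNU awk only, here is a rough sketch of how the second solution could be approximated with plain POSIX awk using split(). The function name keep_four_parts is made up for illustration, and the sketch assumes every colon-separated part is non-empty (with empty parts it will not behave exactly like the regexp version).

```shell
# Hypothetical helper: keep only the first four colon-separated parts
# of fields 5..NF, using POSIX awk features only (no gensub()).
keep_four_parts() {
  awk '{
    for (i = 5; i <= NF; i++) {
      n = split($i, part, ":")          # break the field on colons
      if (n > 4) n = 4                  # keep at most four parts
      out = part[1]
      for (j = 2; j <= n; j++) out = out ":" part[j]
      $i = out                          # assignment rebuilds the record with OFS
    }
  } 1'
}
```

It can be used the same way as the commands above, for example: keep_four_parts < file.txt | column -t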