Question
I am trying to remove a three-letter code at the end of a sequence (replace it with nothing) using sed,
but it does not work when I combine multiple regex patterns. Here are some example sequences:
GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG
GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA
GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA
When I use each regex individually with sed, it works:
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG" | sed 's/TAG$//'
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA" | sed 's/TAA$//'
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA" | sed 's/TGA$//'
However, when I try to combine multiple patterns, it doesn't work:
echo "GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG" |
sed 's/(TAG$|TAA$|TGA$)//'
Could somebody point out what I am doing wrong?
Solution
You need to use the extended regex switch in sed:
sed -r 's/(TAG|TAA|TGA)$//'
Or on OSX:
sed -E 's/(TAG|TAA|TGA)$//'
Or this sed without extended regex (doesn't work on OSX though):
sed 's/\(TAG\|TAA\|TGA\)$//'
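As a quick check, the three sample sequences from the question can be piped through the -E form in one go. This is only a sketch: strip_stop is a hypothetical helper name made up for illustration, and it assumes a sed that accepts -E (BSD/OSX sed and recent GNU sed do).

```shell
# strip_stop: hypothetical helper, not part of sed.
# Removes a trailing TAG, TAA or TGA from every input line.
strip_stop() {
    sed -E 's/(TAG|TAA|TGA)$//'
}

# The three sequences from the question, one per line.
printf '%s\n' \
    GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG \
    GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAA \
    GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTGA |
    strip_stop
```

All three lines should come out identical, ending in ...CTATTCAT, because only the last three letters differ between the inputs.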
Other tips
You need to escape the RE metacharacters | and parens.
sed 's/\(TAG$\|TAA$\|TGA$\)//'
Or you can use the portable option -E to avoid the escaping. -E enables extended regular expressions, so your original command will run without any issues.
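For non-GNU sed (or GNU sed with the --posix option), where \| is not available:
If TGG does not occur, or may be removed as well:
sed 's/T[AG][AG]$//' YourFile
If not (the t skips the second substitution when the first one matched, so at most one codon is removed):
sed -e 's/T[AG]A$//' -e t -e 's/TAG$//' YourFile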
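To see why the TGG caveat matters, here is a small sketch using four made-up inputs (they are illustrative, not taken from the question). The bracket expression T[AG][AG] matches TGG in addition to the three intended codons:

```shell
# Four toy inputs: the three stop codons plus TGG.
class_out=$(printf '%s\n' ACGTAG ACGTAA ACGTGA ACGTGG | sed 's/T[AG][AG]$//')

# Every line is reduced to ACG, including the one that ended in TGG.
printf '%s\n' "$class_out"
```

If a trailing TGG must be preserved, the single bracket expression is too broad and the two-step form is the safer choice.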
By default, sed uses Basic Regular Expressions, which require escaping parentheses and pipes:
sed 's/\(TAG\|TAA\|TGA\)$//'
Recent versions of GNU sed also support the -r option to use Extended Regular Expressions:
sed -r 's/(TAG|TAA|TGA)$//'
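The difference between the two syntaxes can be seen by running the pattern from the question both ways. This is a sketch assuming a sed that accepts -E; the variable names seq, bre_out and ere_out are made up for illustration.

```shell
seq=GCAAAAAGTTGTATAGTCACACAACCTAGACTTATATCGTCTGCTATTCATTAG

# BRE (default): unescaped ( | ) are literal characters, so nothing matches
# and the line passes through unchanged.
bre_out=$(printf '%s\n' "$seq" | sed 's/(TAG$|TAA$|TGA$)//')

# ERE (-E): the same pattern is now a group with alternation,
# so the trailing TAG is removed.
ere_out=$(printf '%s\n' "$seq" | sed -E 's/(TAG$|TAA$|TGA$)//')

printf 'BRE: %s\nERE: %s\n' "$bre_out" "$ere_out"
```

No error is reported in the BRE case, which is why the original command silently appears to do nothing.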
I don't think this will be that helpful for you, but if you want to remove just the last 3 characters regardless:
sed 's/...$//'
awk can also be used if you would like to try another solution:
awk '{sub(/(TAG|TAA|TGA)$/,"")}1' file
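For reference, a small sketch of how that awk one-liner behaves, fed from a pipe instead of a file. awk regexes are extended by default, so no escaping is needed; sub() edits the current line in place and the trailing 1 is an always-true pattern that prints it. The inputs and the variable name awk_out are made up for illustration.

```shell
# Two toy inputs: one ending in a stop codon, one that does not.
awk_out=$(printf '%s\n' ACGTAA ACGTGG | awk '{sub(/(TAG|TAA|TGA)$/,"")}1')

# The first line loses its trailing TAA; the second is printed unchanged.
printf '%s\n' "$awk_out"
```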