Your locale is UTF-8, but the byte sequence you are echoing is not valid UTF-8 because of the bytes \0200 (0x80) and \0377 (0xFF). If you set LC_ALL=en_US.ISO8859-1 (ISO Latin-1), it works fine, because the output of echo is then a valid ISO Latin-1 string.
sed on OS X Snow Leopard: BRE '.' doesn't match characters above ASCII 127
Question
I'm running sed on Mac OS X Snow Leopard.
This sed is (or should be) BSD sed; its man page is dated 2005-05-10. The man page states:
The sed utility is expected to be a superset of
the IEEE Std 1003.2 (``POSIX.2'') specification.
When I try a substitution and the input stream contains characters above ASCII 127, the dot does not match those characters.
For example:
echo -e "a001\0001a - a127\0177a - a128\0200a - a255\0377a - a061\0075a" \
| sed -e 's/a[0-9]\{3\}.a/match/g;' ;
echo "result: $?";
This produces the output:
match - match - a128?a - a255?a - match
result: 0
On OS X Mavericks (whose sed has the same man page), the command fails with an error:
sed: RE error: illegal byte sequence
result: 1
On a Linux Mint 13 system, the same command returns what I expect:
match - match - match - match - match
result: 0
According to http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_03, '.' should match
"any character in the supported character set except NUL".
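That definition has a consequence. In a single-byte locale such as C/POSIX, every byte value except NUL is a character of the supported character set, so '.' should also match 0x80 and 0xFF. The sketch below shows this commonly used workaround. It is swapped in as an alternative to setting a Latin-1 locale, and `match_in_c_locale` is a hypothetical helper name. It uses printf instead of `echo -e` because printf's octal escapes are specified by POSIX:

```shell
#!/bin/sh
# Hypothetical helper: run the question's substitution with sed in the C locale.
# LC_ALL=C is scoped to the sed command only and overrides LANG and LC_CTYPE.
match_in_c_locale() {
  printf 'a001\001a - a127\177a - a128\200a - a255\377a - a061\075a\n' \
    | LC_ALL=C sed -e 's/a[0-9]\{3\}.a/match/g'
}

match_in_c_locale
```

With GNU sed this is expected to print `match` five times. Because the locale change applies only to the sed invocation, the rest of the shell keeps its UTF-8 settings.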
If I run the equivalent command with GNU sed (gsed 4.2.1 on OS X Snow Leopard):
echo -e "a001\0001a - a127\0177a - a128\0200a - a255\0377a - a061\0075a"\
| gsed -e 's/a[0-9]\{3\}.a/match/g;';
echo "result: $?";
I get the same result, which I did not expect:
match - match - a128?a - a255?a - match
result: 0
Does anybody else see the same behaviour? Can anyone explain why (is it a bug in BSD sed?) and/or how to work around or fix it? I can only guess that it is related to the "supported character set", which would then differ between these systems. This seems likely because BSD sed and GNU sed behave the same on the Snow Leopard system. I did, however, already check and adjust my environment. On the Snow Leopard system:
$> env | grep '^L'
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE=UTF-8
And on the Mint system:
$user@Mint > env | grep '^L'
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE=UTF-8
Solution