Any simple way in bash to handle escaped UTF-8 integers?

https://stackoverflow.com/questions/21979902

15-10-2022

Question

We have an internal utility that produces output like this:

$ ./my_cmd 
"abc\228\184\173\230\150\135ABC"

The escaped integers (decimal, not octal) are the bytes of a UTF-8 stream, each preceded by a backslash. We can unescape them using Python 2:

>>> ''.join(chr(int(c)) for c in r"\228\184\173\230\150\135".split('\\') if c).decode('utf8')
u'\u4e2d\u6587'
>>> print u'\u4e2d\u6587'
中文

My question: is there a convenient shell utility that could unescape this instead of Python?

It would behave like this:

$ ./my_cmd 
"abc\228\184\173\230\150\135ABC"
$ ./my_cmd  | some_utility
abc中文ABC

I looked at bash's builtin printf and /usr/bin/printf, but neither seems able to handle it. Can anyone come up with a cool, easy-to-remember perl or sed/awk hack?

No correct solution

OTHER TIPS

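You can reformat the data for use with recode. Here tr replaces every non-digit character with a newline, leaving one decimal number per line, and recode turns those numbers back into bytes. Note that any literal text around the escapes (such as abc and ABC in the question) is discarded by the tr step: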

$ echo '"\228\184\173\230\150\135"' | tr -c '0-9' '\n' | recode -f d1..data; echo
中文
$
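
For a bash-only route, one possible sketch is to peel off each backslash-decimal escape with a regex and re-emit it through printf as an octal escape, which the builtin does understand. The function name decode_decimal_escapes is made up for illustration. The sketch assumes every escape is a backslash followed by one to three decimal digits in the range 0-255, and that a backslash never precedes anything other than digits; it has not been tested against the real my_cmd.

```bash
#!/usr/bin/env bash
# Sketch only: decode_decimal_escapes is a hypothetical helper name.
# Assumes each escape is a backslash plus 1-3 decimal digits (0-255).
decode_decimal_escapes() {
  local line
  local re='^([^\\]*)\\([0-9]{1,3})(.*)$'
  while IFS= read -r line || [[ -n $line ]]; do
    while [[ $line =~ $re ]]; do
      # Literal text before the escape is passed through unchanged.
      printf '%s' "${BASH_REMATCH[1]}"
      # Convert the decimal value to octal, then let printf emit that byte.
      printf "\\$(printf '%03o' "$((10#${BASH_REMATCH[2]}))")"
      line=${BASH_REMATCH[3]}
    done
    # Whatever is left contains no further decodable escapes.
    printf '%s\n' "$line"
  done
}

# Demo with the sample output from the question.
printf '%s\n' '"abc\228\184\173\230\150\135ABC"' | decode_decimal_escapes
```

Unlike the tr step above, this keeps the surrounding literal text. The double quotes are kept too; adding tr -d '"' to the pipeline would drop them and give abc中文ABC as in the question. Because the format itself is ambiguous when an escape is directly followed by a literal digit, the sketch greedily takes up to three digits.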

Licensed under: CC-BY-SA with attribution
