Question

I have a large text file containing sequences such as

\u02BBUtthay\u0101n h\u01E3ng Ch\u0101t Khao Yai

However, they are displayed literally, exactly as above. How do I convert them so that readers see the actual UTF-8 characters? I would prefer to process the files at the command line if possible.


Solution

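Use the shell's printf command. It expands \uXXXX escapes in its format string (the bash builtin does so from version 4.2 on, as does GNU coreutils printf) and writes the characters in the encoding of the current locale, so a UTF-8 locale is needed to get UTF-8 output.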

http://manpages.ubuntu.com/manpages/intrepid/man3/printf.3.html

You can also wrap it in $() to capture the result in a variable if needed.

For example,

echo "$(printf '\u02BBUtthay\u0101n h\u01E3ng Ch\u0101t Khao Yai')"

This outputs: ʻUtthayān hǣng Chāt Khao Yai
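Since the question is about a whole file rather than a single string, the same printf idea can be applied line by line. The following is only a minimal sketch, assuming bash 4.2 or later and an installed locale named C.UTF-8; the function name decode_unicode_escapes and the file names in the usage comment are made up for illustration. It relies on printf's %b conversion, which expands backslash escapes found in the argument rather than in the format string.

```shell
#!/usr/bin/env bash
# Sketch: expand \uXXXX escapes read from stdin into UTF-8 (bash 4.2 or later).
# decode_unicode_escapes is a hypothetical helper name, not a standard command.
decode_unicode_escapes() {
  # Assumption: a locale named C.UTF-8 is installed. printf encodes \u output
  # in the current locale, so another UTF-8 locale (e.g. en_US.UTF-8) also works.
  local LC_ALL=C.UTF-8
  local line
  # IFS= and -r keep leading whitespace and backslashes intact while reading;
  # the || test also handles a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # %b expands backslash escapes in the argument, including \uXXXX.
    # Caveat: any other escapes in the data (such as \n or \t) are expanded too.
    printf '%b\n' "$line"
  done
}

# Usage (input.txt and output.txt are placeholder file names):
#   decode_unicode_escapes < input.txt > output.txt
```

Passing the text as an argument to %b, instead of using it as the format string, means that any % characters in the file are left alone. A shell loop like this is slow on very large files, so treat it as a starting point rather than a tuned solution.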

Hope that helps.

Licensed under: CC-BY-SA with attribution