Question

I have some EUC-JP encoded files that need to be converted to UTF-8, so I use the iconv command in bash:

iconv foo.c -f euc-jp -t utf-8 -o foo.c

But it fails with this message:

/usr/bin/iconv: illegal input sequence at position 30211

and the file is truncated to roughly 32 to 33 KB.

What confuses me is that if I use either of the following:

iconv foo.c -f euc-jp -t utf-8               # output to STDOUT
iconv foo.c -f euc-jp -t utf-8 -o foo.c.utf8 # output to a new file

It works perfectly well.

So I guess this may have something to do with buffering. Can someone please explain it to me?

Solution 4

It's overwriting the file that it's also trying to read from. If the converted form is longer than the original, the output will catch up to the input, and then it will try to convert what it has already converted.

It's surprising that it works at all. Most programs truncate their output file before writing, so there wouldn't be anything for it to read from.
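The truncate-before-write behaviour is easy to see with plain shell redirection, where the shell itself opens and truncates the output file before the command starts. This is only a small illustration, not what iconv's -o option does internally; demo.txt is a hypothetical scratch file.

```shell
#!/bin/sh
# demo.txt is a hypothetical scratch file used only for this illustration.
printf 'hello\n' > demo.txt

# The shell opens demo.txt for writing (truncating it) before tr starts,
# so tr reads from a file that is already empty.
tr a-z A-Z < demo.txt > demo.txt

# Report the size of demo.txt in bytes after the command has run.
wc -c < demo.txt
```

After this runs, demo.txt is empty: the original content was discarded before anything could read it.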

Other suggestions

Reading from and writing to the same file with no synchronization? No, that is not a good idea. The file will end up corrupted.

To avoid corrupting the data, convert from a temporary copy and remove the copy only if every step succeeds:

cp foo.c temp.input && iconv temp.input -f euc-jp -t utf-8 -o foo.c && rm temp.input

It is not a good idea to use the same file for input and output. You can't be sure how the program (in this case iconv) reads and writes these files.

Writing to the file you are reading from causes problems? What a surprise!

Check whether iconv has a command-line option to convert in place; otherwise write to a temporary output file and copy it back over the original once you're done.
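The temporary-file approach could be wrapped up roughly like this. It is a minimal sketch, assuming mktemp is available (it is not in POSIX, though most systems ship it); convert_in_place is a hypothetical helper name, not something iconv provides.

```shell
#!/bin/sh
# convert_in_place is a hypothetical helper name, not part of iconv.
# Usage: convert_in_place FILE FROM_ENCODING TO_ENCODING
convert_in_place() {
    file=$1
    from=$2
    to=$3

    # Create the temporary file next to the original so that mv
    # stays on the same filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1

    # iconv reads the original and writes only to the temporary file.
    if iconv -f "$from" -t "$to" "$file" > "$tmp"; then
        # Replace the original only after the conversion succeeded.
        mv "$tmp" "$file"
    else
        # On failure, discard the partial output and keep the original.
        rm -f "$tmp"
        return 1
    fi
}
```

For the file in the question this would be called as `convert_in_place foo.c euc-jp utf-8`. One caveat: the replaced file takes on the permissions mktemp gave the temporary file, so you may need to restore the original mode afterwards.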

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow