Character inferno

https://stackoverflow.com/questions/15073797

11-03-2022

Question

I need some help. I have to read data from a file and store it in an Oracle DB. I run into trouble when characters like 'à' or 'À' appear in the data. For example, 'à' is read and becomes 'Ã ' in my application, so when I try to save the data, the DB sometimes complains that the value is too large for the target column. I also tried

Normalizer.normalize(row, Form.NFD).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");

I paid attention to the encoding too. I noticed that if I run my application on the data file, a Cp1252 file, on a Windows machine, I get no errors. Sadly, I do get errors when I run it on a Linux machine. I'm using Java 6. TIA.

Answer

So, the default character encoding on your Windows machine is probably windows-1252 (a superset of the printable part of Latin-1). That means that if you don't specify the charset when reading the file, Java falls back to the system default, which happens to match the file, so the text is decoded correctly.

On your Linux machine, the default charset is probably UTF-8. That means that if you don't explicitly specify a charset while reading the file, Java will decode it as UTF-8, which, in this case, is wrong.
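To make the mismatch concrete, here is a minimal sketch that decodes the same byte with two different charsets. The class name CharsetMismatchDemo and the helper decode are made up for illustration, and the sketch assumes your JVM ships the windows-1252 charset (most do). It sticks to APIs available in Java 6.

```java
import java.nio.charset.Charset;

public class CharsetMismatchDemo {
    // Hypothetical helper: decode raw bytes with an explicitly named charset.
    public static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // a-grave (U+00E0) as stored in a windows-1252 file: the single byte 0xE0.
        byte[] cp1252Bytes = { (byte) 0xE0 };

        // What the JVM uses when no charset is passed explicitly.
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Decoding with the charset the file was written in gives the right letter.
        System.out.println("as windows-1252: " + decode(cp1252Bytes, "windows-1252"));

        // 0xE0 on its own is not a complete UTF-8 sequence, so the result
        // is not the original letter.
        System.out.println("as UTF-8: " + decode(cp1252Bytes, "UTF-8"));
    }
}
```

Running this on both machines should show a different default charset on each, which is why the same program behaves differently.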

You didn't post how you're reading in your file, but for example:

InputStreamReader isr = new InputStreamReader(new FileInputStream(file), "UTF-8");

This creates an input stream reader that decodes the file as UTF-8. For the Cp1252 file in your question, pass "Cp1252" (or "windows-1252") instead.
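Since you didn't post your reading code, the following is only a rough sketch of how the whole read could look with the charset fixed to the one the file was written in, so the result no longer depends on the machine. The names ExplicitCharsetReader and readLines are hypothetical, the path comes from the command line, and it avoids Java 7 features because you mentioned Java 6.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class ExplicitCharsetReader {
    // Hypothetical helper: read every line of a file, decoding its bytes
    // with the named charset instead of the platform default.
    public static List<String> readLines(String path, String charsetName) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charsetName));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            // Closing the outer reader also closes the underlying stream.
            reader.close();
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the data file is windows-1252 (Cp1252), as in the question.
        for (String line : readLines(args[0], "windows-1252")) {
            System.out.println(line);
        }
    }
}
```

With the charset passed explicitly, the file should decode the same way on Windows and on Linux.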

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow