MySQL latin1 to UTF-8 using Java & Hibernate & JPA

https://stackoverflow.com/questions/21689665

09-10-2022
Question

I have a database with charset=latin1 and collation latin1_swedish_ci. The user has entered UTF-8 text (Greek characters) over this latin1 connection, and PHP reads everything back just fine.

But when I try to read the database using Java + JPA + Hibernate (all latest versions), the characters are completely broken.

Note that I have already tried my JDBC URL with:

...?useUnicode=true&amp;characterEncoding=latin1&amp;connectionCollation=latin1_swedish_ci
...?useUnicode=true&amp;characterEncoding=UTF-8&amp;connectionCollation=utf8_general_ci
...?characterSetResults=ISO8859_1
...and combinations of those

but still I can't read the characters.

The best I have managed is to use:

// Re-encode the mis-decoded string as CP1252, then decode those bytes as UTF-8
byte[] ptext = myString.getBytes("windows-1252");
String fixed = new String(ptext, StandardCharsets.UTF_8);

with:

?useUnicode=true&amp;characterEncoding=UTF-8&amp;connectionCollation=utf8_general_ci

But many characters still show up as "?" in Eclipse's output, and in the log4j output everything is broken.

Any suggestions?

Solution 2

Solution:

SELECT CONVERT(CONVERT(CONVERT(column_name USING latin1) USING binary) USING utf8) FROM...

But this ties you to native SQL; you cannot use JPA queries. There is no other way: only MySQL knows how to reverse the conversion it applied when the data was entered into the database.
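As a rough sketch of what "native SQL" means in practice, the conversion expression can be built once and run through plain JDBC. The class `Latin1Repair`, its method names, and the table and column arguments are hypothetical, and the sketch assumes the connection itself is configured for UTF-8 results so that `getString` can return the repaired text. With JPA, the same SQL string would presumably be handed to `EntityManager.createNativeQuery` instead.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

class Latin1Repair {
    private static final String IDENTIFIER = "[A-Za-z_][A-Za-z0-9_]*";

    /** Builds the conversion expression from the answer for one column (hypothetical helper). */
    public static String repairExpr(String column) {
        // Identifiers cannot be bound as query parameters, so accept only plain names.
        if (!column.matches(IDENTIFIER)) {
            throw new IllegalArgumentException("unexpected column name: " + column);
        }
        return "CONVERT(CONVERT(CONVERT(" + column + " USING latin1) USING binary) USING utf8)";
    }

    /** Reads one repaired column from every row; table and column are placeholders. */
    public static List<String> readRepaired(Connection conn, String table, String column)
            throws SQLException {
        if (!table.matches(IDENTIFIER)) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        String sql = "SELECT " + repairExpr(column) + " FROM " + table;
        List<String> values = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) {
                values.add(rs.getString(1)); // already converted by the server
            }
        }
        return values;
    }
}
```

Keeping the expression in one helper means the entity mapping stays untouched and only the affected columns go through the native path.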

OTHER TIPS

MySQL's version of latin1 is an extended version of CP1252: it uses the 5 bytes that CP1252 leaves undefined. Unfortunately the current Connector/J has a "bug" in that it uses the original CP1252 rather than MySQL's own version. Therefore it's impossible to recover strings whose encoding uses one of these 5 bytes. Patching the Connector/J source to fix the bug could solve the problem, but ideally you should migrate the tables to UTF-8.
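The effect of those undefined bytes can be reproduced without a database. The sketch below (class and method names are made up for illustration) takes a string, decodes its UTF-8 bytes as Java's windows-1252 the way a CP1252-based client would, and then applies the repair from the question. Alpha encodes to the bytes CE B1, which CP1252 defines, so it should survive; rho encodes to CF 81, and 0x81 is one of the undefined bytes, so it should not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Cp1252RoundTrip {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Simulates a CP1252-decoding client followed by the repair from the question. */
    public static String roundTrip(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        // What a client that decodes the stored bytes as CP1252 would hand back.
        String misdecoded = new String(utf8, CP1252);
        // The attempted repair: re-encode as CP1252, then decode as UTF-8.
        return new String(misdecoded.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String alpha = "\u03B1"; // UTF-8 bytes CE B1
        String rho = "\u03C1";   // UTF-8 bytes CF 81
        System.out.println("alpha survives: " + alpha.equals(roundTrip(alpha)));
        System.out.println("rho survives:   " + rho.equals(roundTrip(rho)));
    }
}
```

Once the undefined byte has been replaced during decoding, no client-side re-encoding can bring it back, which is why the fix has to happen before the string is decoded.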

A workaround is to use the JDBC getBytes method instead of getString to read data from the result set. This bypasses the broken encoding handling in the client library:

String recovered = new String(resultSet.getBytes(1), "UTF-8");

I'm not sure if this can help you because with JPA and Hibernate you are quite removed from the JDBC API.
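For code that does have access to the JDBC `ResultSet`, the workaround might be wrapped as below. `RawUtf8Column` and its methods are hypothetical names, and the sketch assumes the server sends the stored bytes unconverted (for example, a connection whose result character set is latin1); if the server transcodes them first, decoding as UTF-8 will not give the original text.

```java
import java.nio.charset.StandardCharsets;
import java.sql.ResultSet;
import java.sql.SQLException;

class RawUtf8Column {
    /** Decodes raw column bytes as UTF-8; a SQL NULL (null array) stays null. */
    public static String decode(byte[] raw) {
        return raw == null ? null : new String(raw, StandardCharsets.UTF_8);
    }

    /** Reads the given column of the current row, skipping the driver's charset decoding. */
    public static String readUtf8(ResultSet rs, int columnIndex) throws SQLException {
        return decode(rs.getBytes(columnIndex));
    }
}
```

Separating `decode` from the `ResultSet` access keeps the byte handling testable without a live database.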

Licensed under: CC-BY-SA with attribution
