Java cannot retrieve Unicode (Lithuanian) letters from Access via JDBC-ODBC

https://stackoverflow.com/questions/7222475

15-01-2021
|

Domanda

i have DB where some names are written with Lithuanian letters, but when I try to get them using java it ignores Lithuanian letters

    DbConnection();
    zadanie=connect.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE,ResultSet.CONCUR_UPDATABLE);
    sql="SELECT * FROM Clients;";   
    dane=zadanie.executeQuery(sql);

    String kas="Imonė";
    while(dane.next())
    {
         String var=dane.getString("Pavadinimas");       
         if (var!= null) {var =var.trim();} 
         String rus =dane.getString("Rusys");   
         System.out.println(kas+" "+rus);
    }

    void DbConnection() throws SQLException
    {
        String baza="jdbc:odbc:DatabaseDC"; 
        try
        {
            Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        }catch(Exception e){System.out.println("Connection error");}
        connect=DriverManager.getConnection(baza);
    }

in DB type of field is TEXT, size 20, don't use any additional letter decoding or something like this.

it gives me " Imonė Imone " despite that in DB is written "Imonė" which equals rus.

Soluzione

As you're using the JDBC-ODBC bridge, you can specify a charset in the connection details.

Try this:

Properties prop = new java.util.Properties();
prop.put("charSet", "UTF-8");

String baza="jdbc:odbc:DatabaseDC"; 
connect=DriverManager.getConnection(baza, prop);

Altri suggerimenti

Now that the JDBC-ODBC Bridge has been removed from Java 8 this particular question will increasingly become just an item of historical interest, but for the record:

The JDBC-ODBC Bridge has never worked correctly with the Access ODBC Drivers ("Jet" and "ACE") for Unicode characters above code point U+00FF. That is because Access stores such characters as Unicode but it does not use UTF-8 encoding. Instead, it uses a "compressed" variation of UTF-16LE where characters with code points U+00FF and below are stored as a single byte, while characters above U+00FF are stored as a null byte followed by their UTF-16LE byte pair(s).

If the string 'Imonė' is stored within the Access database so that it appears properly in Access itself

then it is stored as

I  m  o  n  ė
-- -- -- -- --------
49 6D 6F 6E 00 17 01

('ė' is U+0117).

The JDBC-ODBC Bridge does not understand what it receives from the Access ODBC driver for that final character, so it just returns

Imon?

On the other hand, if we try to store the string in the Access database with UTF-8 encoding, as would happen if the JDBC-ODBC Bridge attempted to insert the string itself

Statement s = con.createStatement();
s.executeUpdate("UPDATE vocabulary SET word='Imonė' WHERE ID=5");

the string would be UTF-8 encoded as

I  m  o  n  ė
-- -- -- -- -----
49 6D 6F 6E C4 97

and then the Access ODBC Driver will store it in the database as

I  m  o  n  Ä  —
-- -- -- -- -- ---------
49 6D 6F 6E C4 00 14 20

C4 is 'Ä' in Windows-1252 which is U+00C4 so it is stored as just C4
97 is "em dash" in Windows-1252 which is U+2014 so it is stored as 00 14 20

Now the JDBC-ODBC Bridge can retrieve it okay (since the Access ODBC Driver "un-mangles" the character back to C4 97 on the way out), but if we open the database in Access we see

ImonÄ—

The JDBC-ODBC Bridge has never and will never be able to provide full native Unicode support for Access databases. Adding various properties to the JDBC connection will not solve the problem.

For full Unicode character support of Access databases without ODBC, consider using UCanAccess instead. (More details available in another question here.)

Try to use this "Windows-1257" instead of UTF-8, this is for Baltic region.

java.util.Properties prop = new java.util.Properties();
prop.put("charSet", "Windows-1257");

Autorizzato sotto: CC-BY-SA insieme a attribuzione

Non affiliato a StackOverflow