Erlang Emysql encoding difference between prepared and regular Query

https://stackoverflow.com//questions/9686979

13-12-2019

Question

I previously asked a question about Emysql encoding, which got a correct answer here. That answer raised another question...

I'm trying to store iPhone emojis into a database...

When I do:

Query = io_lib:format("UPDATE Users SET c=\"~s\" WHERE id=~B", [C, Id]),
emysql:execute(mydb, Query).

Everything works fine...

But with:

emysql:prepare(update_c, <<"UPDATE Users SET c=? WHERE id=?">>),
emysql:execute(mydb, update_c, [C, Id]).

I get mojibake back.

I'm connecting with:

emysql:add_pool(mydb, 3, "login", "password", "db.mydomain.com", 3306, "MyTable", latin1)

Unfortunately, I cannot use utf8, because the previous software that used the database stored emojis that way. If I do use utf8, it works with the new system, but not with rows inserted by the old one.

EDIT:

I would really like to use prepared statements, since they would effectively prevent SQL injection.

Solution

Edit: should be fixed in 253b7f94f9b04526e6868d7b693e6e9ee41de374. Thanks for feedback. https://github.com/Eonblast/Emysql/commit/253b7f94f9b04526e6868d7b693e6e9ee41de374

I believe it's an error in Emysql and I think I fixed it. Still working out the unit tests so it all makes sense. I'll let you know when it's posted to github.

I opened an issue for this: https://github.com/Eonblast/Emysql/issues/24

Essentially, you are tricking the driver and the database because you open the connection with latin-1 but the database is utf-8. Then you trip over the automatic conversion.
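To illustrate the kind of garbling such a conversion step can produce, here is a minimal sketch using only the standard `unicode` module. It is not Emysql's actual code path; the module name `mojibake_demo` and both function names are hypothetical. It assumes the conversion treats each byte of an already UTF-8 encoded value as one latin1 character and encodes it to UTF-8 again. Note that Erlang's `latin1` is ISO-8859-1, while MySQL's latin1 behaves like cp1252, so bytes in the 0x80 to 0x9F range may map differently on the server side.

```erlang
-module(mojibake_demo).
-export([double_encode/1, undo/1]).

%% Hypothetical helper: treat every byte of an already UTF-8 encoded
%% binary as one latin1 character and encode it to UTF-8 again.
%% This mimics a conversion step that wrongly believes its input is latin1.
double_encode(Utf8Bytes) when is_binary(Utf8Bytes) ->
    unicode:characters_to_binary(Utf8Bytes, latin1, utf8).

%% Hypothetical helper: reverse the step above by decoding UTF-8 and
%% writing each code point back as a single latin1 byte.
%% Returns an error tuple if a code point does not fit in one byte.
undo(Garbled) when is_binary(Garbled) ->
    unicode:characters_to_binary(Garbled, utf8, latin1).
```

For example, `mojibake_demo:double_encode(<<195,169>>)` (the UTF-8 bytes of "é") returns `<<195,131,194,169>>`, which displays as "Ã©". A four-byte emoji grows to eight bytes in the same way.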

Still, I think you are right that the driver should respect that you set the connection to latin-1 and not do the magic of automatic conversion to utf-8. If you read issue #14 of Eonblast/Emysql on GitHub, you'll find I always suspected automatic conversion was a bad idea.

However, the unit tests for the conversions are now blowing up by a factor of four, and they pose some rather uninteresting but mind-boggling fringe issues I can't get my head around. From that alone, I think tricking the database the way you do is likewise a bad idea. If you can, you should clean this up rather than rely on the mechanics in between to hold. There are multiple levels in MySQL where conversions occur: as you know, you can set a character set on the connection, the database, and also the table. It's a great way to produce bugs. Can you describe why you could not clean it up? Is it because you have no control and must act blind to the encoding? I'd like to know if there is a real case where you can't live without this hack.

Regardless, your complaint about the setting of the connection to latin-1 probably showed the way to eliminate all or most of the guessing in the character conversions in Emysql. That's very much appreciated and I hope I'll have a solution for you later today.

Henning

OTHER TIPS

Just convert your table to UTF-8:

ALTER TABLE Users CONVERT TO CHARACTER SET utf8;

Then you can use UTF-8 for new data, and the old data will have been converted to UTF-8 as well.

Licensed under: CC-BY-SA with attribution
