質問

I'm trying to store text (with emoji) From an iPhone Client App on a MySQL database with Erlang. (Into a varchar column)

I used to do it with a socket connection server done with C++ and mysqlpp, it was working great. (It is the exact same database, So I can assume that the issue is not coming from the database)

However, I decided to pass everything on Erlang for scalability reasons, and since, I am unable to store and retrieve correctly emojis.

I'm using emysql to communicate with my database.

When I'm storing, I'm sending this list to the database :

[240,159,152,130]

When I'm retrieving, here what I get :

<<195,176,194,159,194,152,194,130>>

There is some similarities obviously, we can see 159, 152 and 130 on both lines, but no 240. I do not know where 195, 176 and 194 come from.

I though about changing the emysql encoding when creating the connection pool.

 emysql:add_pool(my_db, 3, "login", "password", "db.mydomain.com", 3306, "MyTable", utf8)

But I can seems to find the proper atom for utf32 encoding. (The interesting thing is that I have not set any encoding on C++ and mysqlpp, it worked out of the box).

I have made some test...

storing from C++, retrieving from C++ (Works fine)
storing from Erlang, retrieving from Erlang (Does not work)
storing from Erlang, retrieving from C++ (Does not work)
storing from C++, retrieving from Erlang (Does not work)

One more information, I'm using prepared statement on Erlang, while I'm not on C++

Any help would be appreciated.

AS requested, here the query for storing data :

UPDATE Table SET c=? WHERE id=?

Quite simple really...

役に立ちましたか?

解決

It is all about utf-8 encoding. In Erlang a list of characters, in your case [240,159,152,130], aren't normally encoded but are the unicode code points. When you retrieved the data you got a binary containing with utf-8 encoding bytes of your characters. Exactly where this encoding occurred I don't know. From the erlang shell:

10> Bin = <<195,176,194,159,194,152,194,130>>.   
<<195,176,194,159,194,152,194,130>>
11> <<M/utf8,N/utf8,O/utf8,P/utf8,R/binary>> = Bin.
<<195,176,194,159,194,152,194,130>>
12> [M,N,O,P].
[240,159,152,130]

Handling unicode in erlang is pretty simple, characters in lists are usually the unicode code points and are very rarely encoded, while storing them in binaries means you have to encode them in some way, as binaries are just arrays of bytes. The default encoding is utf-8. In the module unicode there are functions for converting between unicode lists and binaries.

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top