String not valid UTF-8 (BSON::InvalidStringEncoding) when saving a UTF8 compatible string to MongoDB through Mongoid ORM

StackOverflow https://stackoverflow.com/questions/5408701

Question

I am importing data from a MySQL table into MongoDB using Mongoid for my ORM. I am getting an error when trying to save an email address as a string. The error is:

/Library/Ruby/Gems/1.8/gems/bson-1.2.4/lib/../lib/bson/bson_c.rb:24:in `serialize': String not valid UTF-8 (BSON::InvalidStringEncoding)
    from /Library/Ruby/Gems/1.8/gems/bson-1.2.4/lib/../lib/bson/bson_c.rb:24:in `serialize'

From my GUI, this is a screenshot of the table info. You can see it's encoded in UTF-8.

[screenshot: table info]

Also from my GUI, this is a screenshot of the field in the MySQL table that I am importing:

[screenshot: what the data looks like in the MySQL GUI]

This is what happens when I grab the data from the MySQL CLI:

[screenshot: what the data looks like in the MySQL CLI]

And finally, when I inspect the data in my Ruby object, I get something that looks like this:

[screenshot: inspected Ruby object]

I'm a bit confused here, because my table is in UTF-8 and that funky character is apparently a valid double-byte UTF-8 character. Does anyone know why I'm getting this error?


Solution

Try using this helper:

http://snippets.dzone.com/posts/show/4527

It adds a utf8? method to String, so you can take the string you read from MySQL and check whether it is valid UTF-8:

my_string.utf8?
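
If the helper is not available, a similar check can be sketched with Ruby's built-in encoding support. This is only an illustration, not the code from the linked snippet: valid_utf8? is a hypothetical name, and it assumes Ruby 2.0 or later (the gem path in the question suggests Ruby 1.8, where strings carry no encoding and String#force_encoding does not exist).

```ruby
# Hypothetical helper (not the linked snippet): reports whether the raw
# bytes of a string form valid UTF-8. Assumes Ruby 2.0+ for String#b.
def valid_utf8?(str)
  # Work on a copy so the caller's string keeps its original encoding tag.
  str.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

latin1_bytes = "caf\xE9".b      # "cafe" with e-acute as one Latin-1 byte
utf8_bytes   = "caf\xC3\xA9".b  # the same text as two UTF-8 bytes

puts valid_utf8?(latin1_bytes)  # false
puts valid_utf8?(utf8_bytes)    # true
```

A lone byte such as 0xE9 is a typical sign that the data arrived as Latin-1 or CP1252 rather than UTF-8.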

If the string is not valid UTF-8, you can try converting it with one of the helper's other methods, such as:

my_string.asciify_utf8
my_string.latin1_to_utf8
my_string.cp1252_to_utf8
my_string.utf16le_to_utf8

The string may actually be stored in MySQL in one of these encodings.
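
For comparison, here is a rough sketch of the same idea without the helper, using Ruby's built-in String#force_encoding and String#encode. It assumes Ruby 2.0 or later and that the source encoding is one of a short list of guesses; to_utf8 and its candidates parameter are hypothetical names, not part of the linked snippet, Mongoid, or the bson gem.

```ruby
# Hypothetical helper: returns a UTF-8 copy of str. If the bytes are not
# already valid UTF-8, it tries each candidate source encoding in order.
# The candidate list is a guess; adjust it to match your data.
def to_utf8(str, candidates = [Encoding::Windows_1252, Encoding::ISO_8859_1])
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?  # already UTF-8, nothing to convert

  candidates.each do |source|
    begin
      # Reinterpret the raw bytes as `source`, then transcode to UTF-8.
      return str.dup.force_encoding(source).encode(Encoding::UTF_8)
    rescue EncodingError
      next  # some byte is undefined in this encoding; try the next one
    end
  end
  raise ArgumentError, "could not convert string to UTF-8"
end

raw = "caf\xE9".b           # Latin-1 / CP1252 bytes, invalid as UTF-8
fixed = to_utf8(raw)
puts fixed.encoding         # UTF-8
puts fixed.valid_encoding?  # true
```

Guessing only works when the candidates are plausible for your data: ISO-8859-1 accepts every byte, so it never fails and belongs last in the list. If the bytes turn out not to be UTF-8, it may also be worth checking the character set of the MySQL connection itself, since a UTF-8 table can still be read over a connection that uses a different encoding.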

Licensed under: CC-BY-SA with attribution