Question

I have a list of dictionaries, data_dump, where each dictionary looks like:

d = {"ids": s_id, "subject": subject}

Following the tutorial, I'm trying to do a bulk insert:

connection = Connection(host, port)
db = connection['clusters']
posts = db.posts
posts.insert(data_dump)

Which fails with the following error:

 File "/usr/local/lib/python2.7/dist-packages/pymongo/collection.py", line 312, in insert
continue_on_error, self.__uuid_subtype), safe)
bson.errors.InvalidStringData: strings in documents must be valid UTF-8

Please advise. Thanks.


Solution

Solved: I forced the encoding by (1) stripping the string of non-ASCII symbols and then (2) converting the ASCII bytes to UTF-8 with raw.decode('ascii') followed by decoded_string.encode('utf8'). Thanks, guys. :)
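A minimal sketch of that approach, assuming each value may be a Python 2 byte string containing stray non-ASCII bytes (the clean_value helper, the regex-based stripping, and the loop over data_dump are illustrative assumptions, not from the original post):

import re

def clean_value(raw):
    # 1) strip any bytes outside the ASCII range
    stripped = re.sub(r'[^\x00-\x7f]', '', raw)
    # 2) ascii -> unicode, then re-encode as UTF-8 bytes
    return stripped.decode('ascii').encode('utf8')

data_dump = [{"ids": d["ids"], "subject": clean_value(d["subject"])} for d in data_dump]
posts.insert(data_dump)

Note that anything outside ASCII is discarded, which is why the tip below keeps the raw bytes instead.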

OTHER TIPS

I couldn't afford to lose the non-UTF-8 characters, so I chose to store the string as BSON Binary instead.

As per your example,

>>> import bson
>>> subject
u'Math'
>>> d = {"ids": s_id, "subject": bson.Binary(str(subject))}  # convert subject from unicode to BSON Binary

You can't run full-text searches on Binary fields (text search is one of the newer MongoDB features), but it works well for everything else.
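Applied to the bulk insert from the question, that might look like the following sketch (the field names follow the question; wrapping each subject with str() assumes it is either an ASCII unicode string or already a byte string):

from bson.binary import Binary

# Wrap the possibly non-UTF-8 bytes in a BSON Binary field so pymongo
# stores them verbatim instead of validating them as UTF-8 strings.
safe_dump = [{"ids": d["ids"], "subject": Binary(str(d["subject"]))} for d in data_dump]
posts.insert(safe_dump)

When you read the documents back, the subject field comes back as a Binary value, so convert it yourself if you need text.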

Licensed under: CC-BY-SA with attribution