Question

I am trying to store non-English text (for example Bengali or Hindi) in a MongoDB field.

This is my approach:-

import pymongo
from pymongo import MongoClient
client = MongoClient()
db = client.testdb

db['testing'].save({'data':'শুভ নববর্ষ'})

I got an Exception. Exception Value: Non-ASCII character '\xe0' in file /test/views.py on line 5, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details (views.py, line 5)

After that I tried this:

from bson import BSON
bson_string = BSON.encode({'data':'শুভ নববর্ষ'})
db['testing'].save({'data':'শুভ নববর্ষ'})

This time I got the same error.

Edit: basically, I am not able to print 'শুভ নববর্ষ' in IDLE:

>>> print 'শুভ নববর্ষ'
Unsupported characters in input

1ST EDIT :-

I added # -*- coding: utf-8 -*- to my views.py and was then able to store the data. However, the stored document does not look like a normal MongoDB document when I display it in the mongo shell:

> db['testing'].find()
{ষ" } : ObjectId("52d65a50012bad0b23c13a65"), "data" : "শà§à¦­ নববরà


I then added another record:
> db['testing'].save({'data':'kousik chowdhury'})

Now the output for the collection looks garbled:
> db['testing'].find()                                                           ষ" }
{ "_id" : ObjectId("52d65e6a012bad0a39a2685b"), "data" : "kousik chowdhury" }¦°à§

> db['testing'].find().length()
2

Data retrieval:

** I am using PuTTY as an editor.

>>> a = db['testing'].find()[0]
>>> a
{u'_id': ObjectId('52d65a50012bad0b23c13a65'), u'data': u'\u09b6\u09c1\u09ad \u09a8\u09ac\u09ac\u09b0\u09cd\u09b7'}
>>> mydata = a['data']
>>> mydata
u'\u09b6\u09c1\u09ad \u09a8\u09ac\u09ac\u09b0\u09cd\u09b7'
>>> mydata.encode('utf-8')
'\xe0\xa6\xb6\xe0\xa7\x81\xe0\xa6\xad \xe0\xa6\xa8\xe0\xa6\xac\xe0\xa6\xac\xe0\xa6\xb0\xe0\xa7\x8d\xe0\xa6\xb7'

Is there a standard way to store this text in MongoDB in the proper format and get the data back?

Solution

Do you have this line:

# -*- coding: <encoding name> -*-

at the beginning of your file? For example:

# -*- coding: utf-8 -*-

PART 2:

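  • When saving data, use unicode literals (the u'' prefix), for example u'শুভ নববর্ষ'.

  • Assuming you meant to call a['data'].encode('utf-8'), it works correctly. The interactive prompt only shows the escaped repr of the byte string; to see the text itself, just print it:

    print a['data'].encode('utf-8')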

HINT: There is never a good reason to shadow a built-in type name with a variable of your own (I mean str = '').
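
To illustrate the two points above without needing a running MongoDB server, here is a minimal sketch of the round trip between a unicode string and its UTF-8 bytes. It assumes the file is saved as UTF-8; the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Unicode literal: the u'' prefix matters on Python 2 and is accepted on Python 3.3+.
text = u'শুভ নববর্ষ'

# BSON stores strings as UTF-8, so this is the byte sequence behind the text.
encoded = text.encode('utf-8')

# Decoding the bytes gives back the original text unchanged.
decoded = encoded.decode('utf-8')

print(repr(encoded))    # escaped bytes, like the output of mydata.encode('utf-8')
print(decoded == text)  # the round trip loses nothing
```

The escaped form shown by repr is just another view of the same data, so seeing \xe0\xa6... or \u09b6... at the prompt does not mean the stored value is damaged.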

OTHER TIPS

This works for me in iTerm on Mac:

# -*- coding: utf-8 -*-
from pymongo import MongoClient

db = MongoClient().test
db.test_collection.drop()
db.test_collection.save({'data': 'শুভ নববর্ষ'})
document = db.test_collection.find_one()
print document['data']

The printed output matches the input: শুভ নববর্ষ.

MongoDB stores all strings as UTF-8, so it supports all Unicode text. The trouble you're having is with displaying the output when you retrieve it, in IDLE or anywhere else. Try running your script in the Windows command prompt and see if the output renders correctly there.
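
If you are not sure whether your terminal can display the text, one option is to check the output encoding before printing. The sketch below is only an illustration: printable is a hypothetical helper name, not part of pymongo or the standard library, and it falls back to an escaped form when the encoding cannot represent the text.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function
import sys


def printable(text, encoding=None):
    """Return text unchanged if `encoding` can represent it, else an escaped form.

    Hypothetical helper for illustration; `encoding` defaults to the
    encoding reported by sys.stdout, or ASCII if none is reported.
    """
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    try:
        text.encode(encoding)
        return text
    except UnicodeEncodeError:
        # Fall back to \uXXXX escapes, which any terminal can show.
        return text.encode('unicode_escape').decode('ascii')


data = u'\u09b6\u09c1\u09ad'  # first word of the stored string
print(printable(data))
```

With a UTF-8 terminal this prints the Bengali text itself; with an ASCII-only output it prints the escaped form instead of raising an error.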

Licensed under: CC-BY-SA with attribution