Question

I'm trying to load Facebook post data from a particular page (here, bestbuy), extracted via the Graph API with facebook-sdk (https://github.com/pythonforfacebook/facebook-sdk), into MySQL tables. I'm extracting posts as well as the comments on them. I'll talk about comments here, but the same issue applies to posts. The character set of the db schema is utf8. Before inserting the comment content (comment_message) into the database, I call comment_message.encode('utf-8') in my Python script. But it doesn't work properly, and many characters are replaced with other characters. For example, a comment on the following post - https://www.facebook.com/12699262021/posts/10152351243512022

results in the following after comment_message.encode('utf-8') -

Hola Ñon-

Muchas gracias por tu pregunta. En caso de que no hayas tenido el momento, te re comiendo visitar nuestra página online http://BestBuy.com.

Aquí encontraras los precios sin impuestos. Lo impuestos varían dependiendo la cuidad y la tienda en donde finalices la compra.

Ten en cuenta que todos los productos que compres con Best Buy están destinados al uso de los Estados Unidos, cada producto tiene una garantía de fabricante e n forma gratuita. Para saber más detalles de la garantía del fabricante, te ac onsejamos que te comuniques con Nikon.

Hasta mi mejor conocimiento, todas nuestras tiendas localizadas en Nueva York es tarán abiertas el 18 de abril.

Atentamente, Karina

You can see that a lot of characters are messed up. Below is the schema of the table I'm inserting into using pymysql -

CREATE TABLE `xxxxxxxxxxxxxx` (
  `comment_id` varchar(100) NOT NULL,
  `post_id` varchar(100) DEFAULT '-',
  `from_name` varchar(100) DEFAULT '-',
  `from_category` varchar(50) DEFAULT '-',
  `from_id` varchar(50) DEFAULT '-',
  `message` varchar(10000) DEFAULT '-',
  `created_time` varchar(45) DEFAULT '-',
  `likes` int(10) unsigned DEFAULT '0',
  `page` varchar(50) DEFAULT '-',
  `type` varchar(100) DEFAULT '-',
  `inserted_time` varchar(60) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8

If I try to insert the content directly without any encoding, I get -

    sql = sql.encode(self.encoding)
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 148-149: ordinal not in range(256)

Solution

I found the issue. I needed to do the following two things to fix it -

First, set the default encoding to UTF-8 in the Python script (this works in Python 2 only) -

import sys
# Python 2 only: reload(sys) restores sys.setdefaultencoding, which site.py removes at startup
reload(sys)
sys.setdefaultencoding('utf-8')

Second, when connecting to the database, set the use_unicode and charset parameters -

conn = pymysql.connect(host='xx', user='xx', passwd='xx', db='xx', use_unicode=True, charset='utf8')
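
To see why the charset parameter matters, here is a minimal sketch of the two symptoms from the question. It needs no database. It assumes, based on the traceback, that the driver encodes the SQL text with the connection's encoding (latin-1 in the failing case). The helper name encode_sql, the table name and the sample strings are hypothetical and used only for illustration -

```python
# -*- coding: utf-8 -*-
# Illustrative sketch only: encode_sql and the sample strings are hypothetical.

def encode_sql(sql, encoding):
    """Mimic the driver step seen in the traceback: sql.encode(self.encoding)."""
    return sql.encode(encoding)

# A statement containing a character outside Latin-1
# (U+2019, right single quotation mark) plus an accented letter.
sql = u"INSERT INTO comments (message) VALUES ('Best Buy\u2019s p\u00e1gina')"

# With a latin-1 connection encoding, the statement cannot be encoded.
try:
    encode_sql(sql, 'latin-1')
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# With a UTF-8 connection encoding, the same statement encodes cleanly.
utf8_bytes = encode_sql(sql, 'utf-8')

# Encoding to UTF-8 by hand and then having those bytes read as Latin-1
# produces the "replaced characters" effect (mojibake).
mojibake = u"p\u00e1gina".encode('utf-8').decode('latin-1')

print(latin1_ok)                          # False
print(mojibake == u"p\u00c3\u00a1gina")   # True
```

With use_unicode=True and charset='utf8' on the connection, the unicode string can be passed to the query as-is, without calling .encode('utf-8') first. Note that MySQL's utf8 character set stores at most three bytes per character, so text containing emoji would likely need utf8mb4 for both the table and the connection.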
Licensed under: CC-BY-SA with attribution