Encoding UTF-8 when writing to CSV

Question 1

From the docs: https://docs.python.org/2/howto/unicode.html

a = u"string"  # a unicode object (Python 2)

encodedstring = a.encode('utf-8')  # a UTF-8 encoded byte str
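To show what that call produces, here is a minimal sketch written for Python 3. This is an assumption: the answer targets Python 2, where the literal needs the u prefix. Encoding turns text into UTF-8 bytes, and decoding those bytes with the same codec gives the original text back.

```python
# Python 3: str literals are already Unicode text, so no u"" prefix is needed.
text = "caf\u00e9"               # the word "café"
encoded = text.encode("utf-8")   # text -> UTF-8 bytes

print(encoded)                          # b'caf\xc3\xa9'
print(encoded.decode("utf-8") == text)  # True
```

The "é" becomes two bytes (0xC3 0xA9), which is why the encoded value is longer than the text.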

If that does not work, see the question "Python DictWriter writing UTF-8 encoded CSV files".
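In Python 3 the csv module works with text, so the usual approach is to let the file object do the encoding instead of calling encode() yourself. A minimal sketch, assuming Python 3 and a hypothetical output file named rows.csv: open the file with encoding="utf-8" and newline="" (the csv documentation recommends newline="" for files handed to a writer) and pass DictWriter ordinary str values.

```python
import csv

rows = [
    {"user": "alice", "text": "caf\u00e9 \u2615"},  # non-ASCII text: "café ☕"
    {"user": "bob", "text": "plain ASCII"},
]

# The file object encodes everything written to it as UTF-8;
# newline="" lets the csv module control line endings itself.
with open("rows.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["user", "text"], quoting=csv.QUOTE_ALL)
    writer.writeheader()
    writer.writerows(rows)

# Reading the file back with the same encoding recovers the original text.
with open("rows.csv", encoding="utf-8", newline="") as f:
    for row in csv.DictReader(f):
        print(row["user"], row["text"])
```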

Question 2

I had the same problem. I have a large amount of data from the Twitter firehose, so every possible complication can arise (and has)!

I solved it as follows, using try/except:

If the dict value is a string (isinstance(value, basestring)), I try to encode it straight away. If it is not a string, I convert it to one with str() and then encode it.

If this fails, it's because some joker is tweeting odd symbols to mess up my script. In that case I first decode and then re-encode: value.decode('utf-8').encode('utf-8') for strings, and for non-strings I make the value into a string first, then decode and re-encode: str(value).decode('utf-8').encode('utf-8').
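The fallback works because, in Python 2, calling .encode('utf-8') on a byte str first decodes it implicitly with the ASCII codec, and that hidden step fails on any byte above 127. Python 3 no longer performs that implicit decode (bytes objects have no encode method), so this minimal sketch, written for Python 3, does both decodes explicitly to show the difference. The sample bytes stand in for a tweet field and are made up for illustration.

```python
raw = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9', e.g. bytes read from a feed

try:
    raw.decode("ascii")  # the step Python 2 performs implicitly inside str.encode()
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc  # keep a reference; `exc` is cleared when the block ends
    print("ASCII decode failed at byte", exc.start)  # ASCII decode failed at byte 3

text = raw.decode("utf-8")          # decoding explicitly as UTF-8 succeeds
print(text)                         # café
print(text.encode("utf-8") == raw)  # True: re-encoding gives back the same bytes
```

This is also why value.decode('utf-8').encode('utf-8') is effectively a validity check: for bytes that are already valid UTF-8 it returns the same bytes.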

Have a go with this:

import csv

def export_to_csv(list_of_tweet_dicts, export_name="flat_twitter_output.csv"):

    utf8_flat_tweets = []
    keys = []  # union of all keys, in first-seen order, used as the CSV header

    for tweet in list_of_tweet_dicts:
        tmp_tweet = dict(tweet)  # shallow copy, so the caller's dicts are not modified
        for key, value in tweet.iteritems():
            if key not in keys:
                keys.append(key)

            # convert every field to a UTF-8 encoded byte str
            try:
                if isinstance(value, basestring):
                    tmp_tweet[key] = value.encode('utf-8')
                else:
                    tmp_tweet[key] = str(value).encode('utf-8')
            except UnicodeError:
                # non-ASCII bytes in a byte str: decode them as UTF-8 explicitly
                if isinstance(value, basestring):
                    tmp_tweet[key] = value.decode('utf-8').encode('utf-8')
                else:
                    tmp_tweet[key] = str(value).decode('utf-8').encode('utf-8')

        utf8_flat_tweets.append(tmp_tweet)

    # write the encoded copies, not the originals
    list_of_tweet_dicts = utf8_flat_tweets

    # in Python 2 the csv module expects the file to be opened in binary mode
    with open(export_name, 'wb') as f:
        dict_writer = csv.DictWriter(f, fieldnames=keys, quoting=csv.QUOTE_ALL)
        dict_writer.writeheader()
        dict_writer.writerows(list_of_tweet_dicts)

    print "exported tweets to '"+export_name+"'"

    return list_of_tweet_dicts
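For comparison, a rough Python 3 port of the function above might look like the sketch below. It is an assumption, not part of the original answer, which is Python 2. Because the file is opened in text mode with encoding="utf-8", the try/except encoding step is no longer needed; only values that arrive as raw bytes have to be decoded, and those are assumed to be UTF-8.

```python
import csv


def export_to_csv(list_of_tweet_dicts, export_name="flat_twitter_output.csv"):
    """Python 3 sketch: write a list of dicts to a UTF-8 encoded CSV file."""
    keys = []  # union of all keys, in first-seen order, used as the CSV header
    rows = []

    for tweet in list_of_tweet_dicts:
        row = {}
        for key, value in tweet.items():
            if key not in keys:
                keys.append(key)
            if isinstance(value, bytes):
                row[key] = value.decode("utf-8")  # raw bytes are assumed to be UTF-8
            else:
                row[key] = value  # csv writes None as "" and str()s other values
        rows.append(row)

    # The file object does the UTF-8 encoding; no per-field encode() is needed.
    with open(export_name, "w", encoding="utf-8", newline="") as f:
        dict_writer = csv.DictWriter(f, fieldnames=keys, quoting=csv.QUOTE_ALL)
        dict_writer.writeheader()
        dict_writer.writerows(rows)  # keys missing from a row are written as ""

    print("exported tweets to '" + export_name + "'")
    return rows
```

Unlike the Python 2 version, this writes None values as empty strings (the csv module's documented behaviour) rather than the text "None".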

Hope that helps.