TwythonStreamer accents encoding? - unable to decode response, not valid JSON, with code 200

https://stackoverflow.com/questions/19247907

30-06-2022

Question

I recently started playing with Twython and the Twitter API. The auth was a bit cumbersome to deal with, but it's now working perfectly with a little Bottle webserver included in my script.

I'm trying to do something very simple: track a hashtag with the streaming filter API. It seemed to work well at first but now I can see many errors in my log:

Error!
200 Unable to decode response, not valid JSON

It happens only for some of the tweets. I thought it could be linked to coordinates, but that's not it. I just tested, and it seems to be caused by encoding issues with accented characters (éèêàâ...).

How can I fix this?

My streamer code is very basic:

class QMLStreamer(TwythonStreamer):
    def on_success(self, data):
        if 'text' in data:
            if 'coordinates' in data and data['coordinates'] and 'coordinates' in data['coordinates']:
                tweetlog("[%s](%s) - %s" % (data['created_at'], data['coordinates']['coordinates'], data['text'].encode('utf-8')))
            else:
                tweetlog("[%s] - %s" % (data['created_at'], data['text'].encode('utf-8')))

    def on_error(self, status_code, data):
        print colored('Error !', 'red', attrs=['bold'])
        print status_code, data

Solution

The error happens in your code. You shouldn't be using .encode() here.

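It is a bit counter-intuitive, but on_error() is also called when on_success() raises an exception, which is probably what happens here. In Python 2, formatting the UTF-8 byte string returned by .encode('utf-8') together with unicode values such as data['created_at'] forces an implicit ASCII decode of those bytes, which fails with a UnicodeDecodeError as soon as the text contains an accented character. That's why you're seeing the error with status code 200 ("HTTP OK").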
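To illustrate how a failure inside on_success() can surface as a 200 "not valid JSON" error, here is a minimal sketch. It assumes the streaming loop wraps both the JSON parsing and the callback in a single try/except ValueError; dispatch() and bad_on_success() are hypothetical stand-ins, not Twython's actual code. The one thing it relies on for certain is that UnicodeDecodeError is a subclass of ValueError.

```python
import json


def dispatch(line, on_success, on_error, status_code=200):
    """Hypothetical stand-in for a streaming loop (an assumption, not Twython's code)."""
    try:
        # Parsing and the user callback share one try block, so a
        # ValueError raised by either one lands in the same handler.
        on_success(json.loads(line))
    except ValueError:
        on_error(status_code, 'Unable to decode response, not valid JSON')


def bad_on_success(data):
    # Simulates the implicit ASCII decode that Python 2 performs when a
    # UTF-8 byte string is mixed with unicode values in "%" formatting.
    data['text'].encode('utf-8').decode('ascii')


errors = []
# The JSON below is perfectly valid; only the callback fails.
dispatch('{"text": "caf\\u00e9"}', bad_on_success,
         lambda code, message: errors.append((code, message)))
```

After this runs, errors holds a single (200, ...) entry even though the response itself parsed fine, which matches the symptom in the question.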

Twython returns the data as unicode objects, so you can format them directly:

print(u"[%s](%s) - %s" % (data['created_at'], data['coordinates']['coordinates'], data['text']))

You should probably add your own try...except block in on_success() for further debugging.
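A minimal sketch of such a try...except block follows. DebugCallbacks, format_tweet() and the log parameter are hypothetical names used for illustration; in real code on_success() would be a method of your TwythonStreamer subclass, and the class here deliberately has no Twython dependency so it can run on its own.

```python
import traceback


def format_tweet(data):
    # Build a unicode log line without calling .encode() on the text.
    coords = (data.get('coordinates') or {}).get('coordinates')
    if coords:
        return u"[%s](%s) - %s" % (data['created_at'], coords, data['text'])
    return u"[%s] - %s" % (data['created_at'], data['text'])


class DebugCallbacks(object):
    """Hypothetical holder for the callback; not part of Twython."""

    def __init__(self, log=print):
        self.log = log
        self.failures = []

    def on_success(self, data):
        if 'text' not in data:
            return
        try:
            self.log(format_tweet(data))
        except Exception:
            # Keep the full traceback so the real cause is visible
            # instead of being reported as a generic stream error.
            self.failures.append(traceback.format_exc())
```

With this in place, an exception raised while formatting or logging a tweet is recorded with its traceback in failures rather than propagating out of the callback.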

Also, I'm not sure what your tweetlog() function does, but be aware that on Windows print() may fail on some code points, because it tries to convert the text to the terminal's code page.
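One way to sidestep the terminal's code page is to write the log to a file opened with an explicit encoding. This is only a sketch: log_to_file() and the tweets.log path are hypothetical, since the question does not show what tweetlog() actually does.

```python
import io
import os
import tempfile


def log_to_file(message, path):
    # io.open() accepts an encoding on both Python 2 and Python 3, so the
    # unicode message is encoded as UTF-8 regardless of the console settings.
    with io.open(path, 'a', encoding='utf-8') as fh:
        fh.write(message + u'\n')


# Example usage with a temporary directory (hypothetical location).
log_path = os.path.join(tempfile.mkdtemp(), 'tweets.log')
log_to_file(u'[now] - caf\u00e9 \u00e0 Paris', log_path)
```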

OTHER TIPS

Not a perfect answer, but you can try printing a normalized version of the text using unicodedata:

import unicodedata

...

tweetlog("[%s](%s) - %s" % (data['created_at'], data['coordinates']['coordinates'], unicodedata.normalize('NFD',data['text']).encode('ascii', 'ignore')))
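To make the effect of this workaround concrete, here is a small sketch; strip_accents() is a hypothetical helper name. NFD normalization splits each accented letter into a base letter plus a combining mark, and encoding to ASCII with 'ignore' then drops the marks. Note that this is lossy: characters with no ASCII base letter are removed entirely.

```python
import unicodedata


def strip_accents(text):
    # Decompose accented characters, then drop everything non-ASCII.
    decomposed = unicodedata.normalize('NFD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


# The accented letters from the question: e-acute, e-grave, e-circumflex,
# a-grave, a-circumflex.
sample = strip_accents(u'\u00e9\u00e8\u00ea\u00e0\u00e2')  # 'eeeaa'

# U+00F8 (o with stroke) has no decomposition, so it disappears.
lossy = strip_accents(u'S\u00f8ren')  # 'Sren'
```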

Licensed under: CC-BY-SA with attribution
