Python CGI script returning inconsistent results

https://stackoverflow.com/questions/16555785

29-05-2022

Question

I have a Python CGI script running on my Apache server. From a webpage, the user enters a word into a form, and that word is passed to the script. The script uses the word to query the Twitter Search API and return all the tweets for it. I run this query in a loop so that I get three pages of results (approximately 300 tweets). But when I call the script (which prints all the tweets into an HTML page), the page sometimes displays 5 tweets, sometimes 18: completely random numbers. Is this a timeout issue, or am I missing something basic in my code? The Python CGI script is posted below. Thanks in advance.

#!/usr/bin/python

# Import modules for CGI handling 
import cgi, cgitb 
import urllib
import json

# Create instance of FieldStorage 
form = cgi.FieldStorage() 

# Get data from fields
topic = form.getvalue('topic')


results=[]


for x in range(1,3):
    response = urllib.urlopen("http://search.twitter.com/search.json?q="+topic+"&rpp=100&include_entities=true&result_type=mixed&lang=en&page="+str(x))
    pyresponse= json.load(response)
    results= results + pyresponse["results"]



print "Content-type:text/html\r\n\r\n"
print "<!DOCTYPE html>"
print "<html>"
print "<html lang=\"en\">"
print "<head>"
print "<meta charset=\"utf-8\" />"
print "<meta name=\"description\" content=\"\"/>"
print "<meta name=\"viewport\" content=\"width=device-width, initial-scale=1.0\"/>"
print "<title>Data analysis for %s </title>" %(topic)
print "</head>"
print "<body>"
print "<label>"
for i in range(len(results)):
    print str(i)+": "+results[i]["text"]+ "<br></br>"
print "</label>"
print "</body>"
print "</html>"

Solution

First of all, note that range(1,3) yields only 1 and 2, so the loop fetches two pages, not the three you are expecting. range(1,4) would fetch three.
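As a quick illustration of the half-open range, here is a minimal sketch. The variable names are made up for this example, and list() is used so it behaves the same on Python 2 and Python 3:

```python
# range(start, stop) includes start but excludes stop.
pages_requested = list(range(1, 3))  # what the question's loop iterates over
pages_wanted = list(range(1, 4))     # what three pages would need

print(pages_requested)
print(pages_wanted)
```

The stop value has to be one past the last page number you want.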

More importantly, running your code in an interpreter raised an exception in the print loop:

>>> for i in range(len(results)):
...   print str(i) + ": "+ results[i]["text"]

<a few results print successfully>

UnicodeEncodeError: 'latin-1' codec can't encode character u'\U0001f611' 
in position 121: ordinal not in range(256)

Encoding each tweet's text as UTF-8 before printing lets them all print:

>>> for i in range(len(results)):
...   print str(i) + ": "+ results[i]["text"].encode('utf-8')
<success>
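The difference between the two codecs can be seen in isolation. This is a small sketch using a made-up tweet text that contains the same character as in the traceback (U+1F611); it is written to run on Python 3 as well as Python 2:

```python
# Hypothetical tweet text containing an emoji outside Latin-1.
text = u"meh \U0001f611"

try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    # Latin-1 only covers code points 0-255, so the emoji cannot be encoded.
    latin1_ok = False

# UTF-8 can represent every Unicode code point.
utf8_bytes = text.encode("utf-8")

print(latin1_ok)
print(repr(utf8_bytes))
```

Any tweet with a character outside the output codec's range triggers the same error, which would explain why the number of tweets printed varied from one search to the next.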

OTHER TIPS

OK, got it. It was actually a really simple fix. json.load returns the tweet text as Unicode strings, so each one needs to be encoded as UTF-8 before it is printed. Otherwise, printing stops at the first tweet that contains a character the output encoding cannot represent.

print str(i)+": "+results[i]["text"].encode('utf-8')+ "<br></br>"

It had nothing to do with a timeout or the server itself.
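To see where the Unicode strings come from, here is a minimal sketch that parses a small hand-written payload instead of calling the API. The payload is hypothetical and only mimics the "results" list the script reads; the code is written for Python 3, whereas the original script is Python 2:

```python
import json

# Hypothetical JSON shaped like the search response; the emoji is
# written as an escaped surrogate pair, as JSON allows.
payload = '{"results": [{"text": "meh \\ud83d\\ude11"}]}'
results = json.loads(payload)["results"]

encoded_lines = []
for i, tweet in enumerate(results):
    # Build the line as text first, then encode the whole line once.
    line = u"%d: %s<br>" % (i, tweet["text"])
    encoded_lines.append(line.encode("utf-8"))

for encoded in encoded_lines:
    print(repr(encoded))
```

Encoding at the point of output keeps the rest of the script working with text, and it matches the charset=utf-8 that the page already declares.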

Licensed under: CC-BY-SA with attribution
