Question

The objective of my cron job is to save tweets with their timestamps into Google App Engine's datastore. I haven't been able to figure out how to save the data in timestamp form (it is currently saved as a string). Ideally I'd like to save this as a DateTimeProperty to make sorting entries easier down the road. There are two particular problems that I'm struggling with:

  1. proper use of time.mktime(), and
  2. putting the correctly formatted value into GQL

The field is formatted in the JSON like this:

s = "Wed, 20 Mar 2013 05:39:25 +0000"

I tried to use the datetime module to parse this string:

timestr = datetime.datetime.strptime(s, "%a, %b %Y %d %H:%M:%S +0000")
when = datetime.fromtimestamp(time.mktime(timestr))

To sum everything up, this is a snippet of my cron.py file:

result = simplejson.load(urllib.urlopen(twitterurl))
for item in result['results']:

    g = ""
    try:
        g = simplejson.dumps(item['geo']['coordinates'])
    except:
        pass

    timestr = datetime.datetime.strptime(str(item['created_at']), "%a, %b %Y %d %H:%M:%S +0000")
    when = datetime.fromtimestamp(time.mktime(timestr))

    tStore = TweetsFromJSON(user_id=str(item['from_user_id']),
                            user=item['from_user'],
                            tweet=unicodedata.normalize('NFKD', item['text']).encode('ascii', 'ignore'),
                            timestamp=when,
                            iso=item['iso_language_code'],
                            geo=g
                            )

The model for the datastore would be:

class TweetsFromJSON(db.Model):
    user = db.TextProperty()
    user_id = db.TextProperty()
    tweet = db.TextProperty()
    timestamp = db.DateTimeProperty()
    iso = db.StringProperty()
    geo = db.StringProperty()

Solution

You should use the following format to scan the time string with datetime.strptime:

"%a, %d %b %Y %H:%M:%S %z"

This works in Python 3 (datetime.strptime has supported %z since Python 3.2):

Python 3.3.0 (default, Mar 22 2013, 20:14:41) 
[GCC 4.2.1 Compatible FreeBSD Clang 3.1 ((branches/release_31 156863))] on freebsd9
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> s = 'Wed, 20 Mar 2013 05:39:25 +0000'
>>> datetime.strptime(s, "%a, %d %b %Y %H:%M:%S %z")
datetime.datetime(2013, 3, 20, 5, 39, 25, tzinfo=datetime.timezone.utc)

Notice that this returns a timezone-aware datetime object, so the time.mktime() round trip is unnecessary.
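The datetime returned above is timezone-aware, so if you also want a POSIX timestamp (what the time.mktime() call in the question was after) or a naive UTC value, it can produce both directly. This is a Python 3.2+ sketch; it assumes the store wants naive datetimes in UTC, and, as with any %a/%b parsing, it assumes an English (C) locale.

```python
from datetime import datetime, timezone

s = 'Wed, 20 Mar 2013 05:39:25 +0000'

# Python 3.2+: %z parses the "+0000" offset into an aware datetime.
aware = datetime.strptime(s, "%a, %d %b %Y %H:%M:%S %z")

# Assumption: the store wants naive UTC values, so convert to UTC and drop tzinfo.
naive_utc = aware.astimezone(timezone.utc).replace(tzinfo=None)

# POSIX timestamp straight from the aware datetime; no time.mktime() needed.
epoch = aware.timestamp()

print(naive_utc)  # 2013-03-20 05:39:25
print(epoch)      # 1363757965.0
```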

Unfortunately, this doesn't work in Python 2:

Python 2.7.3 (default, Jan 17 2013, 21:23:30) 
[GCC 4.2.1 Compatible FreeBSD Clang 3.0 (branches/release_30 142614)] on freebsd9
Type "help", "copyright", "credits" or "license" for more information.
>>> from datetime import datetime
>>> s = 'Wed, 20 Mar 2013 05:39:25 +0000'
>>> datetime.strptime(s, "%a, %d %b %Y %H:%M:%S %z")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/_strptime.py", line 317, in _strptime
    (bad_directive, format))
ValueError: 'z' is a bad directive in format '%a, %d %b %Y %H:%M:%S %z'

This seems to be a bug in Python 2.7. The documentation mentions %z, but the code in /usr/local/lib/python2.7/_strptime.py doesn't contain the proper regular expression to match it.

As a workaround on Python 2, you can try this:

>>> datetime.strptime(s[:-6], "%a, %d %b %Y %H:%M:%S")
datetime.datetime(2013, 3, 20, 5, 39, 25)

This just cuts off the last 6 characters (the space and the offset), so it only works correctly if the timezone offset has a sign and four digits. Another alternative is to use split and join:

>>> datetime.strptime(' '.join(s.split()[:-1]), "%a, %d %b %Y %H:%M:%S")
datetime.datetime(2013, 3, 20, 5, 39, 25)
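On Python 2, the naive datetime from either workaround holds UTC wall-clock values only because the stripped offset is +0000. That datetime is already the kind of value a DateTimeProperty holds, so a numeric timestamp is only needed if you want one. If you do, note that time.mktime() takes a time tuple (struct_time), not a datetime, and interprets it as local time; calendar.timegm() is the UTC counterpart. A minimal sketch that runs on Python 2.7 and 3, assuming the offset is always +0000:

```python
import calendar
import time
from datetime import datetime

s = 'Wed, 20 Mar 2013 05:39:25 +0000'

# Drop the trailing "+0000" token; correct only because this offset is zero.
dt = datetime.strptime(' '.join(s.split()[:-1]), "%a, %d %b %Y %H:%M:%S")

# calendar.timegm() interprets the time tuple as UTC.
utc_epoch = calendar.timegm(dt.timetuple())

# time.mktime() needs a time tuple, not a datetime, and interprets it as
# local time, so it matches utc_epoch only if the local UTC offset is zero.
local_epoch = time.mktime(dt.timetuple())

print(utc_epoch)  # 1363757965
```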

From what I understand, to keep the offset you would have to parse the timezone info yourself, create a custom tzinfo subclass (use the FixedOffset class example in the datetime module documentation) and use datetime.replace() to put that in the datetime object.
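The tzinfo approach could look roughly like this. It is a sketch, not a tested App Engine recipe: FixedOffset is modelled on the documentation's example class, and parse_twitter_time is a hypothetical helper name that assumes the offset is the final token in +HHMM/-HHMM form. It runs on both Python 2.7 and 3.

```python
import re
from datetime import datetime, timedelta, tzinfo


class FixedOffset(tzinfo):
    """Fixed offset east of UTC, modelled on the docs' FixedOffset example."""

    def __init__(self, minutes, name):
        self._offset = timedelta(minutes=minutes)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt):
        return timedelta(0)


def parse_twitter_time(s):
    """Hypothetical helper: parse 'Wed, 20 Mar 2013 05:39:25 +0000'.

    Assumes the offset is the last space-separated token, as +HHMM or -HHMM.
    """
    stamp, offset = s.rsplit(' ', 1)
    m = re.match(r'^([+-])(\d\d)(\d\d)$', offset)
    if m is None:
        raise ValueError('unexpected UTC offset: %r' % offset)
    sign = -1 if m.group(1) == '-' else 1
    minutes = sign * (int(m.group(2)) * 60 + int(m.group(3)))
    naive = datetime.strptime(stamp, "%a, %d %b %Y %H:%M:%S")
    # Attach the parsed offset without shifting the wall-clock fields.
    return naive.replace(tzinfo=FixedOffset(minutes, offset))


dt = parse_twitter_time('Tue, 19 Mar 2013 22:39:25 -0700')
# Naive UTC equivalent: subtract the offset and drop tzinfo.
utc_naive = dt.replace(tzinfo=None) - dt.utcoffset()
print(utc_naive)  # 2013-03-20 05:39:25
```

Because the result is aware, two tweets stamped in different offsets compare by the instant they represent, which is what you want when sorting.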

Licensed under: CC-BY-SA with attribution