Question

I'm playing with a function in Python 3 that queries small blocks of XML from the eBird API, parsing them with minidom. The function locates and compares dates from two requested blocks of XML, returning the most recent. The code below does its job, but I wanted to ask if there was a simpler way of doing this (the for loops seem unnecessary since each bit of XML will only ever have one date, and comparing pieces of the returned string bit by bit seems clunky). Is there a faster way to produce the same result?

from xml.dom import minidom
import requests

def report(owl):
    #GETS THE MOST RECENT OBSERVATION FROM BOTH USA AND CANADA
    usa_xml = requests.get('http://ebird.org/ws1.1/data/obs/region_spp/recent?rtype=country&r=US&sci=surnia%20ulula&back=30&maxResults=1&includeProvisional=true')
    canada_xml = requests.get('http://ebird.org/ws1.1/data/obs/region_spp/recent?rtype=country&r=CA&sci=surnia%20ulula&back=30&maxResults=1&includeProvisional=true')
    usa_parsed = minidom.parseString(usa_xml.text)
    canada_parsed = minidom.parseString(canada_xml.text)

    #COMPARES THE RESULTS AND RETURNS THE MOST RECENT
    usa_raw_date = usa_parsed.getElementsByTagName('obs-dt')
    canada_raw_date = canada_parsed.getElementsByTagName('obs-dt')
    for date in usa_raw_date:
        usa_date = str(date.childNodes[0].nodeValue)   
    for date in canada_raw_date:
        canada_date = str(date.childNodes[0].nodeValue)
    if int(usa_date[0:4]) > int(canada_date[0:4]):
        most_recent = usa_date
    elif int(usa_date[5:7]) > int(canada_date[5:7]):
        most_recent = usa_date
    elif int(usa_date[8:10]) > int(canada_date[8:10]):
        most_recent = usa_date
    elif int(usa_date[11:13]) > int(canada_date[11:13]):
        most_recent = usa_date
    elif int(usa_date[14:16]) > int(canada_date[14:16]):
        most_recent = usa_date
    else:
        most_recent = canada_date
    return most_recent

Solution

Use datetime.datetime.strptime() to parse the dates into datetime.datetime objects, then use max() to return the greater (most recent) value:

import datetime  # at the top of the module

# inside report(), replacing the two for loops and the if/elif chain:
usa_date = datetime.datetime.strptime(
    usa_raw_date[-1].childNodes[0].nodeValue, '%Y-%m-%d %H:%M')
canada_date = datetime.datetime.strptime(
    canada_raw_date[-1].childNodes[0].nodeValue, '%Y-%m-%d %H:%M')
return max(usa_date, canada_date)

Running the two strptime() calls against the URLs you provided gives:

>>> usa_date = datetime.datetime.strptime(
...     usa_raw_date[-1].childNodes[0].nodeValue, '%Y-%m-%d %H:%M')
>>> canada_date = datetime.datetime.strptime(
...     canada_raw_date[-1].childNodes[0].nodeValue, '%Y-%m-%d %H:%M')
>>> usa_date, canada_date
(datetime.datetime(2014, 5, 5, 11, 0), datetime.datetime(2014, 5, 11, 18, 0))
>>> max(usa_date, canada_date)
datetime.datetime(2014, 5, 11, 18, 0)
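
Put together, the whole comparison could look roughly like the sketch below. It works on XML strings (what usa_xml.text and canada_xml.text would hold) instead of making the HTTP requests itself, so it can be tried offline. The helper names latest_obs_date and most_recent are made up for illustration, and the sample XML layout is an assumption; only the obs-dt element name and the date format come from the question.

```python
from datetime import datetime
from xml.dom import minidom

DATE_FORMAT = '%Y-%m-%d %H:%M'  # matches values such as '2014-05-11 18:00'


def latest_obs_date(xml_text):
    """Return the newest <obs-dt> value in xml_text as a datetime, or None."""
    doc = minidom.parseString(xml_text)
    dates = [
        datetime.strptime(node.firstChild.nodeValue.strip(), DATE_FORMAT)
        for node in doc.getElementsByTagName('obs-dt')
        if node.firstChild is not None  # skip empty <obs-dt/> elements
    ]
    return max(dates, default=None)


def most_recent(*xml_texts):
    """Return the newest observation date across several XML documents."""
    found = [latest_obs_date(text) for text in xml_texts]
    return max((d for d in found if d is not None), default=None)


# Hypothetical stand-ins for usa_xml.text and canada_xml.text; the real
# responses may wrap <obs-dt> in different parent elements.
usa_sample = ('<response><result><sighting>'
              '<obs-dt>2014-05-05 11:00</obs-dt>'
              '</sighting></result></response>')
canada_sample = ('<response><result><sighting>'
                 '<obs-dt>2014-05-11 18:00</obs-dt>'
                 '</sighting></result></response>')

newest = most_recent(usa_sample, canada_sample)
print(newest)
```

If an obs-dt value does not match the format string, strptime() raises ValueError, so a real script may want to catch that.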

max() returns a datetime.datetime object here; if you need a string instead, format the result before returning it:

return max(usa_date, canada_date).strftime('%Y-%m-%d %H:%M')

i.e. format the datetime object back into a string.
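
As a side note, comparing whole datetime objects also avoids a pitfall in the field-by-field approach: a chain that only asks "is this field greater?" without first checking that the earlier fields are equal can pick the older date. A small sketch with made-up timestamps (fieldwise_pick is a hypothetical helper that mimics the if/elif chain from the question):

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M'

# Made-up timestamps: the first is older by year but has a larger month.
older = '2013-12-01 09:00'
newer = '2014-01-15 08:30'


def fieldwise_pick(a, b):
    """Mimic the question's chain: return a if any field of a is greater."""
    for start, end in [(0, 4), (5, 7), (8, 10), (11, 13), (14, 16)]:
        if int(a[start:end]) > int(b[start:end]):
            return a
    return b


wrong = fieldwise_pick(older, newer)  # month 12 > 1, so the older date wins
right = max(datetime.strptime(older, FMT), datetime.strptime(newer, FMT))

print(wrong)
print(right.strftime(FMT))
```

The first result is the 2013 timestamp even though the 2014 one is more recent, while max() on the parsed values picks the 2014 one.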

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow