Question

I am trying to parse csv financial data from the web into a dict that I can navigate through by key.
I am failing using csv.DictReader.
I have:

import csv
import urllib2
req = urllib2.Request('http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:BRCM&region=usa&culture=en-US&cur=USD&reportType=is&period=12&dataType=A&order=desc&columnYear=5&rounding=3&view=raw&r=886300&denominatorView=raw&number=3')
response = urllib2.urlopen(req)
response.next() 
csvio = (csv.DictReader(response))
print csvio.fieldnames 
print len(csvio.fieldnames)

Edited to reflect changes from answer below.

This almost gets me there, but I need to strip the leading 'Fiscal year...share data' before feeding it to DictReader. How best to do this? I have looked at converting to string and stripping lead chars with str.lstrip() as the docs say here with no luck.

Was it helpful?

Solution

To use a DictReader you need to either specify the field names, or the field names need to be the first row of the csv data (ie. a header row).

In the csv file that your code retrieves, the field names are in the second row of data, not the first. What I've done is to throw out the first line of data before passing the csv file to the DictReader constructor.

In response to your updated question: Stripping the leading text from the header row probably isn't desirable as this is acting as the field name for the first column of data. Probably better to discard the first 2 rows of data and then supply the desired field names directly to the DictReader. I have updated the example below to reflect this.

import csv
import urllib2

req = urllib2.Request('http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:BRCM&region=usa&culture=en-US&cur=USD&reportType=is&period=12&dataType=A&order=desc&columnYear=5&rounding=3&view=raw&r=886300&denominatorView=raw&number=3')
response = urllib2.urlopen(req)

response.readline() # This reads (and discards) the first row of data which is not needed.
response.readline() # skip the 

myFieldnames = ["firstColName", "TTM", "2012", "2011", "2010", "2009", "2008"]
csvio = csv.DictReader(response, fieldnames=myFieldnames)

print csvio.fieldnames 
for row in csvio:
    print row
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top