Question

I am trying to download the CSV files from this page, via a python script.

But when I try to access a CSV file directly via its link in my browser, an agreement form is displayed, and I have to accept it before I am allowed to download the file.

The exact URLs of the CSV files can't be retrieved; instead, a value such as PERIOD_ID=2013-0 is sent to the backend database, which fetches the file:

https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/DataExports/ExportProductionData.aspx?PERIOD_ID=2013-0

I've tried urllib2.urlopen() and read() on the response, but that returns the HTML content of the agreement form, not the file content.
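For reference, the kind of call described above looks roughly like this (a sketch only; the question doesn't include the exact code):

import urllib2

# Sketch of the failing attempt (assumed): the response body turns out to be
# the agreement form's HTML rather than the CSV data.
url = ('https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/'
       'DataExports/ExportProductionData.aspx?PERIOD_ID=2013-0')
response = urllib2.urlopen(url)
html = response.read()
print html[:200]  # Shows the start of the Agreement.aspx page, not CSV rows.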

How do I write a Python script that handles this redirect, fetches the CSV file, and saves it to disk?


Solution 2

Here's my suggestion for automatically applying the server's cookies, basically mimicking standard client session behavior.

(Shamelessly inspired by @pope's answer 554580.)

import urllib2
import urllib
from lxml import etree

_TARGET_URL = 'https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/DataExports/ExportProductionData.aspx?PERIOD_ID=2013-0'
_AGREEMENT_URL = 'https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/Welcome/Agreement.aspx'
_CSV_OUTPUT = 'urllib2_ProdExport2013-0.csv'


class _MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):

    def http_error_302(self, req, fp, code, msg, headers):
        print 'Follow redirect...'  # Any cookie manipulation in-between redirects should be implemented here.
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302

cookie_processor = urllib2.HTTPCookieProcessor()

opener = urllib2.build_opener(_MyHTTPRedirectHandler, cookie_processor)
urllib2.install_opener(opener)

response_html = urllib2.urlopen(_TARGET_URL).read()

print 'Cookies collected:', cookie_processor.cookiejar

page_node, submit_form = etree.HTML(response_html), {}  # ElementTree node + dict for storing hidden input fields.
for input_name in ['ctl00$MainContent$AgreeButton', '__EVENTVALIDATION', '__VIEWSTATE']:  # Form `input` fields used on the ``Agreement.aspx`` page.
    submit_form[input_name] = page_node.xpath('//input[@name="%s"][1]' % input_name)[0].attrib['value']
    print 'Form input \'%s\' found (value: \'%s\')' % (input_name, submit_form[input_name])

# Submits the agreement form back to ``_AGREEMENT_URL``, which redirects to the CSV download at ``_TARGET_URL``.
csv_output = opener.open(_AGREEMENT_URL, data=urllib.urlencode(submit_form)).read()
print csv_output

with open(_CSV_OUTPUT, 'wb') as f:  # Dumps the CSV output to ``_CSV_OUTPUT``; the ``with`` block closes the file.
    f.write(csv_output)

Good luck!

[Edit]

On the why of things, I think @Steinar Lima is correct that a session cookie is required. However, unless you've already visited the Agreement.aspx page and submitted a response via the provider's website, the cookie you copy from the browser's web inspector will only result in another redirect to the "Welcome to the PA DEP Oil & Gas Reporting Website" welcome page, which of course defeats the whole point of having a Python script do the job for you.
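If you want to verify that the scripted flow above really did pick up the session cookie (instead of relying on a value copied from the browser), a check along these lines could be added after the agreement form is submitted; the ASP.NET_SessionId name is taken from the other answer:

# Hypothetical check, reusing ``cookie_processor`` from the script above:
# confirm the server-issued session cookie is present in the jar.
for cookie in cookie_processor.cookiejar:
    if cookie.name == 'ASP.NET_SessionId':
        print 'Session cookie set by server:', cookie.value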

Other tips

You need to set the ASP.NET_SessionId cookie. You can find this by using Chrome's Inspect element option in the context menu, or by using Firefox and the Firebug extension.

With Chrome:

  1. Right-click on the webpage (after you've agreed to the terms) and select Inspect element
  2. Click Resources -> Cookies
  3. Select the only element in the list
  4. Copy the Value of the ASP.NET_SessionId element

With Firebug:

  1. Right-click on the webpage (after you've agreed to the terms) and click Inspect Element with Firebug
  2. Click Cookies
  3. Copy the Value of the ASP.NET_SessionId element

In my case, I got ihbjzynwfcfvq4nzkncbviou; it might work for you, but if not, you'll need to perform the procedure above yourself.

Add the cookie to your request, and download the file using the requests module (based on an answer by eladc):

import requests

cookies = {'ASP.NET_SessionId': 'ihbjzynwfcfvq4nzkncbviou'}
r = requests.get(
    url=('https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/'
         'DataExports/ExportProductionData.aspx?PERIOD_ID=2013-0'),
    cookies=cookies
)

with open('2013-0.csv', 'wb') as ofile:
    for chunk in r.iter_content(chunk_size=1024):
        ofile.write(chunk)
        ofile.flush()
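For completeness, the two approaches can be combined: a requests.Session collects the server's cookies automatically, so the agreement form can be submitted in code instead of copying ASP.NET_SessionId by hand. A rough sketch, assuming the same hidden form fields as in the urllib2 answer above:

import requests
from lxml import etree

TARGET_URL = ('https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/'
              'DataExports/ExportProductionData.aspx?PERIOD_ID=2013-0')
AGREEMENT_URL = ('https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/'
                 'Welcome/Agreement.aspx')

session = requests.Session()  # Keeps ASP.NET_SessionId and any other cookies across requests.

# The first request is redirected to the agreement page; parse its hidden form fields.
agreement_html = session.get(TARGET_URL).content
page = etree.HTML(agreement_html)
form = {}
for name in ['ctl00$MainContent$AgreeButton', '__EVENTVALIDATION', '__VIEWSTATE']:
    form[name] = page.xpath('//input[@name="%s"][1]' % name)[0].attrib['value']

# Submitting the form should redirect to the CSV download, as in the urllib2 answer.
r = session.post(AGREEMENT_URL, data=form)
with open('2013-0.csv', 'wb') as ofile:
    for chunk in r.iter_content(chunk_size=1024):
        ofile.write(chunk)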
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow