Question

I would like your advice/help on this:

Create a Python script that:

Adds the data collected from the website as a new line in a CSV file.

Rules:

  • The script must run on your computer automatically for 5 days.

Do you have any advice? :(

I appreciate your help with this.

I tried this:

import urllib.request
import http.cookiejar

# Keep cookies between requests.
cookieJar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookieJar))

url = "http://www.wunderground.com/global/stations/54511.html?MR=1"
request = urllib.request.Request(url)
page = opener.open(request)

WeatherData = page.read().decode("utf-8")
print(WeatherData)

So it prints all the data, but I want to print only the:

Datetime (timestamp of data captured) - Current Condition - Temperature

  • Like I said, I need advice, such as:

    • What should I use to complete this task?

    • How can I set the data collection to run for several days? I don't know.

I don't need the full answer to copy and paste; I'm not a fool...

I want to UNDERSTAND.


Solution

Weather Underground have posted a Python sample to access their API. I think you'd be best off using their code. If you play around with the parsed_json variable in the example, you should be able to get what you want.
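As a rough Python 3 sketch of that approach (not their exact sample -- the endpoint, the API key placeholder, and the JSON key names below are assumptions; print parsed_json to see the real structure):

import json
import urllib.request

# Hypothetical request -- substitute your own API key and query location.
API_KEY = "YOUR_API_KEY"
url = "http://api.wunderground.com/api/%s/conditions/q/China/Beijing.json" % API_KEY

with urllib.request.urlopen(url) as response:
    parsed_json = json.loads(response.read().decode("utf-8"))

# The key names below are assumptions -- adjust them to match
# whatever parsed_json actually contains.
observation = parsed_json["current_observation"]
print(observation["observation_time"])  # timestamp of the observation
print(observation["weather"])           # current condition, e.g. "Clear"
print(observation["temp_c"])            # temperature in Celsius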

As for running the program at fixed intervals, check out this thread.
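If you prefer to keep everything inside the script rather than using cron or Task Scheduler, one simple sketch is a loop that appends a CSV row at each interval and stops after 5 days. Here fetch_weather is a hypothetical placeholder for whichever collection method you end up using:

import csv
import time
from datetime import datetime, timedelta

def fetch_weather():
    # Hypothetical placeholder: replace with your real collection code
    # (the API call or the screen scraping shown elsewhere on this page).
    return "Clear", "31"

INTERVAL_SECONDS = 3600  # one sample per hour; pick whatever interval you need
end_time = datetime.now() + timedelta(days=5)

while datetime.now() < end_time:
    condition, temperature = fetch_weather()
    # Append one line per sample: timestamp, current condition, temperature.
    with open("weather.csv", "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(), condition, temperature])
    time.sleep(INTERVAL_SECONDS)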

Other suggestions

This kind of task is called screen scraping. The code below just adds a few string-manipulation routines for very basic cleanup, but you can do a much better job with a tool made for screen scraping, like Beautiful Soup (a short sketch appears at the end of this answer).

import urllib.request
import http.cookiejar

cookieJar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookieJar))

url = "http://www.wunderground.com/global/stations/54511.html?MR=1"
request = urllib.request.Request(url)
page = opener.open(request)

# This is one big string
rawdata = page.read().decode("utf-8")

# This breaks it up into lines
lines_of_data = rawdata.split('\n')

# This is one line in the raw data that looks interesting.  I'm going to
# filter the raw data based on the "og:title" text.
#
# <meta name="og:title" content="Beijing, Beijing | 31&deg; | Clear" />

# The "if line.find(" bit is the filter.
special_lines = [line for line in lines_of_data if line.find('og:title') > -1]
print(special_lines)

# Now we clean up - this is very crude, but you can improve it with
# exactly what you want to do.
info = special_lines[0].replace('"', '').split('content=')[1]
sections = info.split('|')
print(sections)

Output:

['\t\t<meta name="og:title" content="Beijing, Beijing | 32&deg; | Clear" />']
['Beijing, Beijing ', ' 32&deg; ', ' Clear />']
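To tie this back to the CSV requirement in the question, the pieces in sections can be cleaned up a little more and appended as one row per run. A sketch, starting from the list printed above (note the temperature still carries the '&deg;' HTML entity and the condition a leftover '/>'):

import csv
from datetime import datetime

# The list produced by the scraping code above.
sections = ['Beijing, Beijing ', ' 32&deg; ', ' Clear />']

city, temperature, condition = (s.strip() for s in sections)
temperature = temperature.replace('&deg;', '')    # strip the HTML entity
condition = condition.replace('/>', '').strip()   # drop the leftover tag text

# Append one line per run: timestamp, current condition, temperature.
with open('weather.csv', 'a', newline='') as f:
    csv.writer(f).writerow([datetime.now().isoformat(), condition, temperature])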

Edit: By all means, if the particular website offers a web service that returns JSON, as in the answer by Xaranke, use that! Not all websites do, though, so Beautiful Soup can still be very useful.
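For reference, a minimal Beautiful Soup version of the same extraction might look like this (a sketch; it assumes the page still serves the og:title meta tag shown above and that the beautifulsoup4 package is installed):

import urllib.request
from bs4 import BeautifulSoup

url = "http://www.wunderground.com/global/stations/54511.html?MR=1"
with urllib.request.urlopen(url) as page:
    soup = BeautifulSoup(page.read(), "html.parser")

# Locate <meta name="og:title" content="Beijing, Beijing | 31&deg; | Clear" />
tag = soup.find("meta", attrs={"name": "og:title"})
city, temperature, condition = (s.strip() for s in tag["content"].split("|"))
print(city, temperature, condition)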

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow