Question

I am talking about pages like this one: http://en.wikipedia.org/wiki/Acetone

I would like to get the info from the chart that lists Density, Molar Mass, Boiling Point, etc. I need the program to store each value in a separate string, not as a number, so:

vapor_pressure = "24.46"

Not:

vapor_pressure = 24.46

This is because I need it to be typed in again somewhere else, but I've already got that part working. Also, how do I remove all characters from the string except digits and decimal points? That's pretty much all I need.


Solution 2

I solved this by downloading the HTML of the entire page and parsing it with BeautifulSoup:

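from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3; in Python 2 this was urllib2.urlopen
# Download the page and parse it with Python's built-in HTML parser
soup = BeautifulSoup(urlopen("http://en.wikipedia.org/wiki/Acetone").read(), "html.parser")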

Then I converted it to plain text:

page = soup.get_text()

When I printed page, I found that the properties were separated by two line breaks, so I split the text on them:

list1 = page.split('\n\n')
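If every property really is a label line followed by its value, the split blocks can also be turned into a dictionary of strings, one entry per property. This is a minimal sketch that assumes that layout; parse_blocks is a hypothetical helper name, and the sample text is a made-up stand-in for the output of soup.get_text() so it runs offline.

```python
def parse_blocks(page_text):
    """Map each block's first line (the label) to the rest of the block (the value)."""
    properties = {}
    # Same split as above: blocks are separated by a blank line
    for block in page_text.split("\n\n"):
        # Drop empty lines and surrounding whitespace inside the block
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) >= 2:
            # Keep the first occurrence if a label appears more than once
            properties.setdefault(lines[0], " ".join(lines[1:]))
    return properties


# Hypothetical stand-in for soup.get_text()
sample = "Appearance\nColourless liquid\n\nVapor pressure\n24.46–24.60 kPa (at 20 °C)"
props = parse_blocks(sample)
print(props["Vapor pressure"])  # 24.46–24.60 kPa (at 20 °C)
```

Because the values stay as strings, they can be typed in elsewhere unchanged. Blocks of the real page text that are not label/value pairs will also end up in the dictionary, so look up only the keys you need.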

To keep only the block that mentions the vapor pressure:

vaporpressure = [x for x in list1 if "Vapor pressure" in x]

When I printed vaporpressure, I got something like:

Vapor pressure
24.46–24.60 kPa (at 20 °C)

That's what I did.
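For the second part of the question (keeping only digits and decimal points), re.sub with a negated character class does exactly that. On the vapor pressure line, though, it glues both ends of the range and the temperature into one string, so pulling out the first number with re.search is probably closer to what is wanted. A short sketch; the helper names are made up for illustration:

```python
import re


def keep_digits_and_dots(text):
    # Delete every character that is not 0-9 or "."
    return re.sub(r"[^0-9.]", "", text)


def first_number(text):
    # Return the first decimal number in the text as a string, or None if there is none
    match = re.search(r"\d+(?:\.\d+)?", text)
    return match.group(0) if match else None


value = "24.46–24.60 kPa (at 20 °C)"
print(keep_digits_and_dots(value))  # 24.4624.6020
print(first_number(value))          # 24.46
```

If you need both ends of the range, re.findall(r"\d+(?:\.\d+)?", value) returns every number as a string, here ['24.46', '24.60', '20'].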

Other tips

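You can use the MediaWiki API instead. The line below is AppleScript: it runs curl to fetch the raw wikitext of the Acetone article and sed to print the number that follows "| VaporPressure =":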

do shell script "curl -s 'http://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&titles=Acetone'|sed -n 's/^| VaporPressure = \\([0-9.]*\\).*/\\1/p'"
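The same extraction can be done in Python on the wikitext the API returns. This sketch assumes the infobox line looks like "| VaporPressure = 24.46 ...", which is what the sed pattern above expects; the sample string is a hypothetical fragment and the real page layout may differ. infobox_field is a made-up helper name. Unlike the sed command, the regex tolerates extra spaces around "|" and "=", and it returns None when the field is missing or its value does not start with a digit or dot.

```python
import re

# Hypothetical wikitext fragment in the "| Field = value" style the sed command expects
WIKITEXT = """| Appearance = Colourless liquid
| VaporPressure = 24.46 kPa (at 20 °C)
"""


def infobox_field(wikitext, field):
    # Like the sed pattern: a line starting with "| <field> =",
    # capturing the leading run of digits and dots in the value
    pattern = r"^\|\s*" + re.escape(field) + r"\s*=\s*([0-9.]+)"
    match = re.search(pattern, wikitext, re.MULTILINE)
    return match.group(1) if match else None


print(infobox_field(WIKITEXT, "VaporPressure"))  # 24.46
```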

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow