Question

I'm working on a project for my food science class that requires research, but why do it by hand when a script can do it for you? I'm using Python 2.7 with BeautifulSoup and urllib2, and I need help printing only the content between the tags, not the tags themselves, so that I can copy and paste the output into a Google Doc. This is the code I'm using. Any help is much appreciated, thank you!

import BeautifulSoup, urllib2, time
from BeautifulSoup import *

print("BELLY-FAT-CURE")
url = urllib2.urlopen("http://www.webmd.com/diet/belly-fat-diet")

content = url.read()

soup = BeautifulSoup(content)
headers = soup.findAll("h3")
texts = soup.findAll("p")

print(headers)
print(texts)
time.sleep(5)

print("CABBAGE SOUP DIET INFO")
url = urllib2.urlopen("http://www.webmd.com/diet/cabbage-soup-diet")
content1 = url.read()

soup1 = BeautifulSoup(content1)
headers1 = soup1.findAll("h3")
texts1 = soup1.findAll("p")
print(headers1)
print(texts1)

Solution

Read the .text attribute of each element to get its content without the surrounding tags:

import urllib2
from bs4 import BeautifulSoup

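# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(urllib2.urlopen("http://www.webmd.com/diet/belly-fat-diet"), "html.parser")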

print([header.text for header in soup.find_all("h3")])
print([p.text for p in soup.find_all("p")])

Prints:

[u'The Promise', u'Does It Work?', ... ]
[u'Common Conditions', u'Featured Topics', ... ]

Note that the example uses BeautifulSoup 4 (the bs4 package), which is the version you should use too. BeautifulSoup 3 is no longer developed or maintained.
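
Printing a list shows the u'...' wrappers and brackets, which is awkward to paste into a document. As a rough sketch of one way around that, the snippet below collapses whitespace in each element's text and prints one item per line. The SAMPLE_HTML string and the tag_texts helper are made up for illustration (they are not part of BeautifulSoup or of the page above); in practice you would pass in the content read from urllib2.urlopen(...) instead.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<html><body>
  <h3>The Promise</h3>
  <p>Eat <b>less</b> sugar.</p>
  <h3>Does It Work?</h3>
  <p>
    Possibly.
  </p>
</body></html>
"""


def tag_texts(html, tag_name):
    """Return the tag-free text of every <tag_name> element, whitespace collapsed."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() drops nested tags such as <b>; split/join collapses
    # newlines and runs of spaces into single spaces.
    return [" ".join(el.get_text().split()) for el in soup.find_all(tag_name)]


headers = tag_texts(SAMPLE_HTML, "h3")
paragraphs = tag_texts(SAMPLE_HTML, "p")

# One item per line, which pastes cleanly into a document.
print("\n".join(headers))
print("\n".join(paragraphs))
```

The same helper can be called once per page, so the second URL in the question would not need its own copy of the parsing code.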

Licensed under: CC-BY-SA with attribution