Well, it really depends on what you mean by "parse", but here is a full working example of how to extract all links from the main section with BeautifulSoup:
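from bs4 import BeautifulSoup
import urllib.request


def main():
    url = 'http://yugioh.wikia.com/wiki/Card_Tips%3aBlue-Eyes_White_Dragon'
    page = urllib.request.urlopen(url)
    # Name the parser explicitly; otherwise bs4 guesses one and emits a warning.
    soup = BeautifulSoup(page.read(), 'html.parser')
    # The main content lives in the div with id "mw-content-text".
    content = soup.find('div', id='mw-content-text')
    links = content.find_all('a')
    for link in links:
        # Print the visible text of each link.
        print(link.get_text())


if __name__ == "__main__":
    main()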
This code should be self-explanatory, but just in case:
- First we open the page with urllib.request.urlopen and pass its contents to BeautifulSoup.
- Then we extract the main content div by its id. (The id mw-content-text can be found in the page's source.)
- We proceed with extracting all the links inside the main content.
- In a for loop we print the text of each link.
Additional methods you might need for parsing the links:
- link.get('href') extracts the destination URL
- link.get('title') extracts the alternative title of the link
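To illustrate how link.get('href') and link.get('title') might be used together, here is a minimal sketch. It runs on an inline HTML snippet rather than the live page, so the sample markup, the BASE_URL value, and the extract_links helper are all assumptions made up for illustration. It also assumes that hrefs on the real page may be relative, which is why urljoin is used to resolve them.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the example runs offline.
SAMPLE_HTML = """
<div id="mw-content-text">
  <a href="/wiki/Kaibaman" title="Kaibaman">Kaibaman</a>
  <a href="http://example.com/rules">Rules</a>
  <a name="anchor-only">No destination</a>
</div>
"""

# Assumed base URL, used to resolve relative hrefs.
BASE_URL = 'http://yugioh.wikia.com/'


def extract_links(html, base_url):
    """Return (text, absolute_url, title) tuples for links in the main content div."""
    soup = BeautifulSoup(html, 'html.parser')
    content = soup.find('div', id='mw-content-text')
    results = []
    for link in content.find_all('a'):
        href = link.get('href')  # None when the attribute is missing
        if href is None:
            continue  # skip anchors that do not point anywhere
        # link.get('title') is also None when the link has no title attribute.
        results.append((link.get_text(), urljoin(base_url, href), link.get('title')))
    return results


if __name__ == "__main__":
    for text, url, title in extract_links(SAMPLE_HTML, BASE_URL):
        print(text, url, title)
```

Note that .get() returns None instead of raising when an attribute is absent, so it is worth checking the result before using it.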
And since you asked for resources: http://www.crummy.com/software/BeautifulSoup/bs4/doc/ is the first place you should start.