Prerequisite: read the famous topic "RegEx match open tags except XHTML self-contained tags".
Since you are parsing an HTML page, I'd use a tool made specifically for the job: an HTML parser.
For example, BeautifulSoup:
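from urllib.request import urlopen  # Python 3 replacement for urllib2
from bs4 import BeautifulSoup

word = 'Xylophone'
# The dictionary is split into one page per initial letter
page = urlopen('http://www.mso.anu.edu.au/~ralph/OPTED/v003/wb1913_' + word[0].lower() + '.html')
soup = BeautifulSoup(page, 'html.parser')  # name the parser explicitly
# Find the <b> tag holding the headword, then print the text of the enclosing entry
print(soup.find('b', string=word).parent.text)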
prints:
Xylophone (n.) An instrument common among the Russians, Poles, and Tartars, consisting of a series of strips of wood or glass graduated in length to the musical scale, resting on belts of straw, and struck with two small hammers. Called in Germany strohfiedel, or straw fiddle.
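If installing BeautifulSoup is not an option, roughly the same lookup can be sketched with the standard library's html.parser module. This is a different technique from the BeautifulSoup call above, not a drop-in equivalent. The sample markup, the EntryFinder class and the lookup function are hypothetical names made up for illustration, and the sketch assumes each entry is a paragraph whose headword sits in a b tag, as the code above implies. Unlike the one-liner above, it returns None instead of raising when the word is missing.

```python
from html.parser import HTMLParser

# Hypothetical sample markup (assumption): each entry is a <p> whose
# headword is wrapped in <b>. The definitions are shortened placeholders.
SAMPLE_HTML = """
<p><b>Xylography</b> (<i>n.</i>) The art of engraving on wood.</p>
<p><b>Xylophone</b> (<i>n.</i>) An instrument struck with two small hammers.</p>
"""


class EntryFinder(HTMLParser):
    """Collect the text of the first <p> whose <b> headword equals `word`."""

    def __init__(self, word):
        super().__init__()
        self.word = word
        self.in_p = False      # currently inside a <p> entry
        self.in_b = False      # currently inside a <b> headword
        self.matched = False   # the current entry's headword equals `word`
        self.parts = []        # text chunks of the current entry
        self.result = None     # text of the first matching entry, if any

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            # A new entry starts: reset the per-entry state.
            self.in_p, self.matched, self.parts = True, False, []
        elif tag == 'b':
            self.in_b = True

    def handle_endtag(self, tag):
        if tag == 'b':
            self.in_b = False
        elif tag == 'p':
            # Keep only the first matching entry.
            if self.matched and self.result is None:
                self.result = ''.join(self.parts).strip()
            self.in_p = False

    def handle_data(self, data):
        if not self.in_p:
            return
        self.parts.append(data)
        if self.in_b and data == self.word:
            self.matched = True


def lookup(html, word):
    """Return the entry text for `word`, or None if it is not present."""
    finder = EntryFinder(word)
    finder.feed(html)
    finder.close()
    return finder.result


print(lookup(SAMPLE_HTML, 'Xylophone'))
print(lookup(SAMPLE_HTML, 'Zither'))
```

To run it against the real page, the downloaded and decoded HTML would be passed to lookup in place of SAMPLE_HTML, provided the page actually follows the assumed structure.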