Quotes Messing Up Python Scraper

https://stackoverflow.com/questions/20821698

22-09-2022

Question

I am trying to scrape all the data within a div as follows. However, the quotes are throwing me off.

<div id="address">
    <div class="info">14955 Shady Grove Rd.</div> 
    <div class="info">Rockville, MD 20850</div> 
    <div class="info">Suite: 300</div> 
</div>

I am trying to start it with something along the lines of

addressStart = page.find("<div id="address">")

but the quotes within the div are messing me up. Does anybody know how I can fix this?

Solution

To answer your specific question: either escape the inner double quotes with a backslash, or delimit the string with single quotes so the double quotes need no escaping:

addressStart = page.find("<div id=\"address\">")
# or
addressStart = page.find('<div id="address">')
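As a quick check that the two spellings are equivalent, here is a minimal sketch. The `page` value is a made-up stand-in for the downloaded HTML, not the asker's real page:

```python
# Stand-in for the downloaded HTML (an assumption for illustration).
page = '<body><div id="address"><div class="info">Suite: 300</div></div></body>'

escaped = "<div id=\"address\">"  # inner double quotes escaped with backslashes
single = '<div id="address">'     # single-quoted literal, no escaping needed

# Both literals produce exactly the same string.
same = escaped == single

# str.find returns the index of the first match, or -1 if there is none.
address_start = page.find(single)
missing = page.find('<div id="phone">')

print(same, address_start, missing)
```

Remember to compare the result against -1 before slicing, since -1 is also a valid (negative) index in Python.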

But don't do that. If you are trying to parse HTML, let a third-party library such as Beautiful Soup do it. You get back an object that you can traverse or search, so you can read attributes and text without having to deal with the complexities of parsing HTML or XML yourself:

from bs4 import BeautifulSoup

# Name the parser explicitly; otherwise bs4 guesses one and emits a warning.
soup = BeautifulSoup(page, 'html.parser')
# find_all returns a list; use find if you only want the first match.
for address in soup.find_all('div', id='address'):
    # class is a reserved word in Python, so the keyword argument is class_.
    for info in address.find_all('div', class_='info'):
        print(info.string)
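If installing a third-party library is not an option, the standard library's `html.parser` module can do a similar job with more code. This is a different technique from the Beautiful Soup answer above, shown only as a rough sketch: the `InfoCollector` class and its attributes are hypothetical names, and the markup is the snippet from the question.

```python
from html.parser import HTMLParser


class InfoCollector(HTMLParser):
    """Collect the text of div.info elements nested inside div#address."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one entry per open <div>: "address", "info", or None
        self.lines = []  # text of each matching div.info

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if attrs.get("id") == "address":
            self.stack.append("address")
        elif "address" in self.stack and "info" in classes:
            self.stack.append("info")
            self.lines.append("")
        else:
            self.stack.append(None)

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Only keep text that appears inside an open div.info.
        if "info" in self.stack:
            self.lines[-1] += data


page = """
<div id="address">
    <div class="info">14955 Shady Grove Rd.</div>
    <div class="info">Rockville, MD 20850</div>
    <div class="info">Suite: 300</div>
</div>
"""

collector = InfoCollector()
collector.feed(page)
addresses = [line.strip() for line in collector.lines]
print(addresses)
```

The extra bookkeeping (tracking which divs are open) is exactly what Beautiful Soup does for you, which is why the library is usually the better choice.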

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow