Question

I am using the code at the bottom to get the web link and the Masjid name. However, I would also like to get the denomination and the street address. Please help, I am stuck.

Currently I am getting the following

Weblink:

<div class="subtitleLink"><a href="http://www.salatomatic.com/d/Tempe+5313+Masjid-Al-Hijrah">

and Masjid name

<b>Masjid Al-Hijrah</b>

But I would also like to get the following:

Denomination

<b>Denomination:</b> Sunni (Traditional)

and street address

<br>45 Station Street (Sydney)&nbsp;&nbsp;

The code below scrapes the following markup:

<td width=25><a href="http://www.salatomatic.com/d/Tempe+5313+Masjid-Al-Hijrah"><img src='http://www.halalfire.com/images/en/photo_small.jpg' alt='Masjid Al-Hijrah' title='Masjid Al-Hijrah' border=0 width=48 height=36></a></a></td><td width=10><img src="http://www.salatomatic.com/images/spacer.gif" width=10 border=0></td><td nowrap><div class="subtitleLink"><a href="http://www.salatomatic.com/d/Tempe+5313+Masjid-Al-Hijrah"><b>Masjid Al-Hijrah</b></a>&nbsp;&nbsp; </div><div class="tinyLink"><b>Denomination:</b> Sunni (Traditional)<br>45 Station Street (Sydney)&nbsp;&nbsp;</div></td><td align=right valign=center><div class="tinyLink"></div></td>

CODE:

from bs4 import BeautifulSoup
import urllib2

url1 = "http://www.salatomatic.com/c/Sydney+168"
content1 = urllib2.urlopen(url1).read()
soup = BeautifulSoup(content1) 

results = soup.findAll("div", {"class" : "subtitleLink"})
for result in results :
    br = result.find('b')
    a = result.find('a')
    currenturl =  a.get('href')
    if not currenturl.startswith("http"):
        currenturl = "http://www.salatomatic.com" + currenturl
        print currenturl
    elif currenturl.startswith("http"):
        print a.get('href')
    pos = br.get_text()
    print pos

Solution

You can look up the next sibling <div> whose class attribute is tinyLink and, if it contains both a <b> and a <br> tag, extract the text node that follows each of them:

...
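print(pos)
# Still inside the for loop: the details sit in the <div class="tinyLink"> that follows the title div.
div = result.find_next_sibling('div', attrs={"class": "tinyLink"})
if div and div.b and div.br:
    # Text node right after <b>Denomination:</b>
    print(div.b.next_sibling.string)
    # Text node right after <br> holds the street address
    print(div.br.next_sibling.string)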
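
For reference, here is a minimal self-contained sketch of the same idea that runs without network access. It assumes Python 3 and the built-in "html.parser" backend (the question uses Python 2 and urllib2), and it parses a trimmed copy of the sample row from the question. The names SAMPLE_HTML and extract_listings are hypothetical helpers introduced only for this illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the row shown in the question (assumption: the live page
# keeps this structure for every listing).
SAMPLE_HTML = (
    '<table><tr><td nowrap>'
    '<div class="subtitleLink">'
    '<a href="http://www.salatomatic.com/d/Tempe+5313+Masjid-Al-Hijrah">'
    '<b>Masjid Al-Hijrah</b></a>&nbsp;&nbsp; </div>'
    '<div class="tinyLink"><b>Denomination:</b> Sunni (Traditional)'
    '<br>45 Station Street (Sydney)&nbsp;&nbsp;</div>'
    '</td></tr></table>'
)


def extract_listings(html):
    """Return one dict per listing: url, name, denomination, address."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for title in soup.find_all("div", attrs={"class": "subtitleLink"}):
        link = title.find("a")
        name = title.find("b")
        # The details live in the sibling div that follows the title div.
        details = title.find_next_sibling("div", attrs={"class": "tinyLink"})
        denomination = address = None
        if details and details.b and details.br:
            # next_sibling is the text node right after the tag;
            # strip() also removes the trailing non-breaking spaces.
            denomination = str(details.b.next_sibling).strip()
            address = str(details.br.next_sibling).strip()
        listings.append({
            "url": link.get("href") if link else None,
            "name": name.get_text(strip=True) if name else None,
            "denomination": denomination,
            "address": address,
        })
    return listings


if __name__ == "__main__":
    for row in extract_listings(SAMPLE_HTML):
        print(row)
```

Collecting the fields into a dict rather than printing them one by one keeps each listing's values together, which may be easier to write out to CSV or a database later.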
Licensed under: CC-BY-SA with attribution