Question

I am trying to collate reviews of restaurants. urllib2 works fine for the initial page of reviews, but the link that loads the next batch of comments is a JavaScript link. An example page is here, and the code for the "Next 25" link is:

<a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$RestRatings$Next','')" class="red" id="ctl00_ContentPlaceHolder1_RestRatings_Next">NEXT 25&gt;&gt; </a>

I have looked at all the previous answers (e.g.), and I have to say I'm none the wiser. Looking at the console in Firebug doesn't offer up a handy link. Could you suggest the best (easiest) way to achieve this?

Edit: With thanks to Seleniumnewbie, this code will print out all the comments from the reviews:

import re

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()

def getURLinfo(url):

    driver.get(url)
    html = driver.page_source
    next25 = "ctl00_ContentPlaceHolder1_RestRatings_Next"
    soup = BeautifulSoup(html, "html.parser")

    # Keep clicking "NEXT 25" while the link exists, accumulating each page's HTML
    while soup.find(id=re.compile(next25)):
        driver.find_element(By.ID, next25).click()
        html = html + driver.page_source
        soup = BeautifulSoup(driver.page_source, "html.parser")

    # Parse the accumulated HTML of all pages and pull out the comment blocks
    soup = BeautifulSoup(html, "html.parser")
    comment = soup.find_all(id=re.compile("divComment"))

    for entry in comment:
        print(entry.div.contents)  # for comments

    driver.close()

Solution 2

Find the element by id="ctl00_ContentPlaceHolder1_RestRatings_Next" and then click it.
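
A minimal sketch of that find-and-click loop follows. The helper name collect_pages and the max_clicks safety limit are made up for illustration; driver is assumed to be a Selenium WebDriver that has already loaded the review page. The sketch also assumes the click has finished reloading the page before page_source is read, which Selenium does not guarantee, so an explicit wait may be needed in practice.

```python
def collect_pages(driver, next_id, max_clicks=50):
    """Click the element with id `next_id` until it is gone.

    Returns a list with the HTML of every page visited.
    `collect_pages` and `max_clicks` are hypothetical names for this sketch.
    """
    pages = [driver.page_source]
    for _ in range(max_clicks):
        # "id" is the locator strategy string that Selenium's By.ID stands for.
        matches = driver.find_elements("id", next_id)
        if not matches:
            break  # no "next" link left, so this was the last page
        matches[0].click()
        # Assumption: the page has finished reloading by this point.
        pages.append(driver.page_source)
    return pages
```

With a real browser this would be used roughly as: create driver = webdriver.Firefox(), call driver.get(url), then call collect_pages(driver, "ctl00_ContentPlaceHolder1_RestRatings_Next"). find_elements is used rather than find_element because it returns an empty list instead of raising when the link is absent.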

Other tips

When a user clicks that link, the function __doPostBack is called in JavaScript on the client. The other question you linked to assumes this function makes an AJAX call and then places the result in the same page.

However, the review page you linked to doesn't do that. It does make an AJAX call, but then it reloads the same page. I couldn't capture what the AJAX call is because the page reloads immediately, but since the page simply reloads with the new comments, I'm pretty sure the call is telling the server to move you to the next page.

So, in order to get your next page of comments, you will have to call the same URL that the __doPostBack function is calling and then reload the page you are on. To find this URL, I would de-obfuscate their JavaScript and find the function being called. I believe the actual URL that is called depends on the parameter passed to that function, so you want to make sure to replicate what it does.
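
As a rough illustration of "replicating what it does": if the page uses the stock ASP.NET Web Forms __doPostBack (an assumption; the site's script was not inspected here), that function does not call a separate URL. It writes its two arguments into the hidden form fields __EVENTTARGET and __EVENTARGUMENT and submits the page's form back to the same address. Under that assumption, the request body could be rebuilt as sketched below. The names HiddenInputCollector and build_postback_payload are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_payload(html, event_target, event_argument=""):
    """Return the URL-encoded POST body a stock __doPostBack would submit.

    Assumes the standard ASP.NET Web Forms behaviour; hypothetical helper.
    """
    collector = HiddenInputCollector()
    collector.feed(html)
    fields = dict(collector.fields)  # hidden state fields such as __VIEWSTATE
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields).encode("ascii")
```

The resulting bytes could then be sent to the page's own URL with urllib.request.urlopen(url, data=payload), which issues a POST when data is given; the response should be the next page of comments, with fresh hidden fields to use for the following request. If the site's __doPostBack turns out to be customised, this sketch will not match it.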

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow