Find the element by id="ctl00_ContentPlaceHolder1_RestRatings_Next"
and then click it.
Simulating clicking on a javascript link in python
30-11-2021
Question
I am trying to collate reviews of restaurants. urllib2 works fine for the initial page of reviews, but the link that loads the next batch of comments is a JavaScript link. An example page is here, and the code for the "Next 25" link is:
<a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$RestRatings$Next','')" class="red" id="ctl00_ContentPlaceHolder1_RestRatings_Next">NEXT 25>> </a>
I have looked at all the previous answers (e.g.), and I have to say I'm none the wiser. Looking at the console in Firebug doesn't offer up a handy link. Could you suggest the best (easiest) way to achieve this?
Edit: With thanks to Seleniumnewbie, this code prints all the comments from the reviews:
# Written for Python 2, BeautifulSoup 3 and a Selenium release that still has find_element_by_id
from selenium import webdriver
from BeautifulSoup import BeautifulSoup
import re

driver = webdriver.Firefox()

def getURLinfo(url):
    driver.get(url)
    html = driver.page_source
    next25 = "ctl00_ContentPlaceHolder1_RestRatings_Next"
    soup = BeautifulSoup(html)
    # Keep clicking "Next 25" for as long as the link is present on the page
    while soup.find(id=re.compile(next25)):
        driver.find_element_by_id(next25).click()
        html = html + driver.page_source  # append the newly loaded page
        soup = BeautifulSoup(driver.page_source)
    # Parse everything collected so far and print each comment
    soup = BeautifulSoup(html)
    comment = soup.findAll(id=re.compile("divComment"))
    for entry in comment:
        print entry.div.contents  # for comments

driver.close()
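For anyone on a newer Selenium release: as far as I know, the find_element_by_id helper is gone there, so the click loop above needs a small change. Below is a rough Python 3 sketch of the same idea, not a drop-in replacement. The names collect_page_sources, max_clicks and pause are made up for illustration, and the fixed sleep is only a crude stand-in for a proper explicit wait. It relies on find_elements(by, value) returning an empty list when nothing matches, and on the string "id" being the value of Selenium's By.ID constant.

```python
import time


def collect_page_sources(driver, next_id, max_clicks=50, pause=1.0):
    """Click the element with id `next_id` until it is gone.

    Returns a list with the HTML of every page seen, in order.
    `driver` is expected to behave like a Selenium WebDriver.
    """
    pages = [driver.page_source]
    for _ in range(max_clicks):  # hard cap so a stuck page cannot loop forever
        # "id" is the locator strategy string behind selenium's By.ID
        links = driver.find_elements("id", next_id)
        if not links:
            break  # no "next" link left: this was the last page
        links[0].click()
        time.sleep(pause)  # crude wait for the page to reload (assumption)
        pages.append(driver.page_source)
    return pages


# Hypothetical usage (needs selenium and a browser driver installed):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   driver.get(url)
#   pages = collect_page_sources(driver, "ctl00_ContentPlaceHolder1_RestRatings_Next")
#   driver.quit()
```

Keeping each page's HTML as a separate list entry, instead of concatenating everything into one string, means every page can be parsed on its own afterwards.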
Solution 2
Other tips
When a user clicks that link, the browser calls the JavaScript function __doPostBack on the client. The other question you linked to assumes this function makes an AJAX call and then places the result in the same page.
However, the review page you linked to doesn't do that. It does make an AJAX call, but then it reloads the same page. I couldn't capture the AJAX call because the page reloads immediately, but since the page simply reloads with the new comments, I'm fairly sure the call tells the server to move you to the next page.
So, to get the next page of comments, you will have to call the same URL that the __doPostBack function calls and then reload the page you are on. To find this URL, I would de-obfuscate their JavaScript and find the function being called. I believe the actual URL depends on the parameters passed to that function, so make sure you replicate what it does.
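To make "replicate what it does" a bit more concrete, here is a sketch under one big assumption: that the site uses the stock ASP.NET Web Forms __doPostBack, which normally copies its two arguments into the hidden form fields __EVENTTARGET and __EVENTARGUMENT and then submits the page's form back to the same URL. I have not inspected this site's script, so treat that as a guess. The names build_postback_fields and fetch_next_page are made up for illustration. The real page may also need cookies or extra headers that this sketch ignores.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Matches javascript:__doPostBack('target','argument') and captures both arguments
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)',\s*'([^']*)'\)")


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_postback_fields(page_html, link_href):
    """Return the form fields a stock __doPostBack would submit (assumption)."""
    match = POSTBACK_RE.search(link_href)
    if match is None:
        raise ValueError("href does not contain a __doPostBack call")
    target, argument = match.groups()
    collector = HiddenInputCollector()
    collector.feed(page_html)
    fields = dict(collector.fields)  # e.g. __VIEWSTATE, if the page has one
    fields["__EVENTTARGET"] = target
    fields["__EVENTARGUMENT"] = argument
    return fields


def fetch_next_page(url, page_html, link_href):
    """POST the fields back to the same URL and return the response text."""
    data = urlencode(build_postback_fields(page_html, link_href)).encode("utf-8")
    with urlopen(Request(url, data=data)) as response:  # data makes this a POST
        return response.read().decode("utf-8", errors="replace")  # UTF-8 assumed
```

With the link from the question, build_postback_fields would put ctl00$ContentPlaceHolder1$RestRatings$Next into __EVENTTARGET and leave __EVENTARGUMENT empty. Each response carries fresh hidden fields, so the page just fetched has to be passed in when requesting the one after it.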