A couple of things:
Given you're using selenium, you don't need either mechanize or urllib2, as selenium is doing the actual page loading. As for the other imports (httplib, logging, os and time), they're either unused or redundant.
For my own convenience, I changed the code to use Firefox; you can change it back to Chrome (or any other browser).
As for the ActionChains, you don't need them here, as you're only doing a single click (nothing to chain, really).
Given the browser is receiving data (via AJAX) instead of loading a new page, we don't know when the new data has appeared; so we need to detect the change.
We know that 'clicking' the button loads more <li> tags, so we can check if the number of <li> tags has changed. That's what this line does:
WebDriverWait(selenium_browser, 10).until(lambda driver: len(driver.find_elements_by_xpath("//div[@id='headlines_transcripts']//li")) != old_count)
It will wait up to 10 seconds, periodically comparing the current number of <li> tags against the count taken before the button click. If the count hasn't changed by then, it raises a TimeoutException.
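To make the polling idea concrete, here is a rough, Selenium-free sketch of what a wait like this does conceptually: call a condition over and over until it returns something truthy or the time runs out. The names wait_until and FakePage are made up for illustration and are not part of Selenium; the real WebDriverWait raises its own TimeoutException rather than RuntimeError.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises RuntimeError once `timeout` seconds pass.
    (Hypothetical helper, only mimicking the general idea of WebDriverWait.)
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


class FakePage(object):
    """Hypothetical stand-in for a page whose <li> count grows after a delay."""

    def __init__(self, initial_count, added, delay):
        self._initial = initial_count
        self._added = added
        self._ready_at = time.time() + delay

    def li_count(self):
        # Pretend the AJAX response arrives `delay` seconds after creation.
        if time.time() >= self._ready_at:
            return self._initial + self._added
        return self._initial


page = FakePage(initial_count=10, added=5, delay=0.2)
old_count = page.li_count()  # count taken before the "click"
changed = wait_until(lambda: page.li_count() != old_count,
                     timeout=2.0, poll_interval=0.05)
new_count = page.li_count()
print(old_count, new_count)
```

The important part is that old_count is captured before the click; otherwise there is nothing stable to compare against.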
from selenium import webdriver
from selenium.common.exceptions import StaleElementReferenceException
from selenium.common.exceptions import TimeoutException as SeleniumTimeoutException
from selenium.webdriver.support.ui import WebDriverWait
url = "http://seekingalpha.com/symbol/IBM/transcripts"
selenium_browser = webdriver.Firefox()
selenium_browser.set_page_load_timeout(30)
selenium_browser.get(url)
selenium_browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
elem = selenium_browser.find_element_by_css_selector("div #transcripts_show_more div#more.older_archives")
old_count = len(selenium_browser.find_elements_by_xpath("//div[@id='headlines_transcripts']//li"))
elem.click()
try:
    WebDriverWait(selenium_browser, 10).until(lambda driver: len(driver.find_elements_by_xpath("//div[@id='headlines_transcripts']//li")) != old_count)
except StaleElementReferenceException:
    pass
except SeleniumTimeoutException:
    pass
print(selenium_browser.page_source.encode("ascii", "ignore"))
I'm on python2.7; if you're on python3.X, you probably won't need .encode("ascii", "ignore").
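In case it's unclear what that .encode call does, here is a small sketch with a made-up sample string (not taken from the actual page): it silently drops every character that can't be represented in ASCII. On python3.X the result is a bytes object, which is one reason you may prefer to print page_source directly there.

```python
# Made-up sample text containing a curly apostrophe, an en dash and an accented e.
sample = u"IBM\u2019s Q3 results \u2013 caf\u00e9"

# "ignore" drops anything outside ASCII instead of raising UnicodeEncodeError.
ascii_only = sample.encode("ascii", "ignore")
print(ascii_only)
```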