Selenium (Python) - Get webdriver's page_source after page is fully loaded

https://stackoverflow.com/questions/23306246

09-07-2023

Question

I have to get data from a dynamic page (many of them in fact). I can access the page using Selenium in Python. However, the driver.page_source is incomplete. Even if I try driver.implicitly_wait(100) nothing changes.

I also tried:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC # available since 2.26.0

WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.LINK_TEXT, "Load all"))
)

While I see some waiting / pausing, more than enough for the page to load, I see no effect on driver.page_source after the wait.

Is there a solution here?

Thanks.

Solution

The solution is to use something other than page_source to grab the page source, if you really need it. WebDriver's getPageSource (page_source in the Python bindings) only returns some state, in some formatting, of the last page the driver loaded.

From the Javadoc, which most probably applies to the other language bindings as well:

getPageSource

java.lang.String getPageSource()
Get the source of the last loaded page. If the page has been modified after loading (for example, by Javascript) there is no guarantee that the returned text is that of the modified page. Please consult the documentation of the particular driver being used to determine whether the returned text reflects the current state of the page or the text last sent by the web server. The page source returned is a representation of the underlying DOM: do not expect it to be formatted or escaped in the same way as the response sent from the web server. Think of it as an artist's impression.
Returns:
    The source of the current page

http://selenium.googlecode.com/git/docs/api/java/org/openqa/selenium/WebDriver.html#getPageSource%28%29
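
One possible alternative, which is not spelled out in the answer above, is to ask the browser for its live DOM through execute_script rather than relying on page_source. The following is only a minimal sketch: the helper name get_rendered_html, the timeout and poll values, and the rule of treating two identical snapshots in a row as "loaded" are all assumptions made for illustration. The only Selenium call it relies on is driver.execute_script.

```python
import time


def get_rendered_html(driver, timeout=10.0, poll=0.5):
    """Return the serialized live DOM once two consecutive snapshots match.

    `driver` is any object exposing Selenium's execute_script(script) method.
    The helper name, defaults, and stability rule are illustrative assumptions.
    """
    script = "return document.documentElement.outerHTML;"
    deadline = time.monotonic() + timeout
    previous = driver.execute_script(script)
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = driver.execute_script(script)
        if current == previous:
            # The DOM did not change between two polls: treat it as settled.
            return current
        previous = current
    # Timed out while the DOM was still changing: return the latest snapshot.
    return previous
```

With a real session you would call get_rendered_html(driver) after driver.get(...) and after any explicit wait such as the WebDriverWait in the question. A page that keeps updating itself (tickers, ads) may never settle, in which case the helper simply returns the last snapshot once the timeout expires.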

Licensed under: CC-BY-SA with attribution
