How to use selenium.PhantomJS() on a webpage already downloaded by Scrapy
Question
from selenium import webdriver

def parseList(self, response):
    dr = webdriver.PhantomJS()
    dr.get(response.url)
    pageSource = dr.page_source
    print(pageSource)
The webpage has already been downloaded by Scrapy (it is available in response.body), but dr.get(response.url) will download it a second time.
Is there any way to let Selenium use response.body directly?
The solution
What about saving the content of response.body to an HTML file and then doing something like
url = "file:///your/path/to/downloaded/file.html"
dr.get(url)
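As a minimal sketch of that idea: write the raw body to a temporary file and build a file:// URL from it with pathlib. The helper name save_body_to_file is illustrative, and the PhantomJS driver setup is assumed to exist elsewhere, so the WebDriver call is shown only as a comment.

```python
import tempfile
from pathlib import Path

def save_body_to_file(body):
    """Write the raw response body to a temporary .html file
    and return a file:// URL that a WebDriver can open."""
    # response.body from Scrapy is bytes, so write in binary mode
    with tempfile.NamedTemporaryFile(suffix=".html", delete=False) as f:
        f.write(body)
        path = Path(f.name)
    return path.as_uri()

# Hypothetical usage inside the spider callback:
# url = save_body_to_file(response.body)
# dr = webdriver.PhantomJS()
# dr.get(url)
```

Note that relative links and resources in the saved page will no longer resolve against the original domain, so this works best for self-contained pages.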
Other tips
From the Scrapy docs:
Regardless of the type of this argument, the final value stored will be a str (never unicode or None).
I'm assuming you're using Selenium from Python, since you're using Scrapy. You can parse the response.body string with lxml or another library. What exactly do you mean by "let Selenium use response.body"?
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow