Question

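from selenium import webdriver

def parseList(self, response):
    dr = webdriver.PhantomJS()  # PhantomJS is no longer supported in Selenium 4
    dr.get(response.url)  # fetches the page a second time
    pageSource = dr.page_source
    print(pageSource)
    dr.quit()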

The webpage has already been downloaded by Scrapy (it's in response.body), so dr.get(response.url) downloads it a second time.

Is there any way to make Selenium use response.body directly?


Solution

What about saving the HTML from response.body to a file and then doing something like:

url = "file:///your/path/to/downloaded/file.html"
dr.get(url)
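A minimal sketch of the save-then-load approach, assuming Python 3 (where Scrapy's response.body is bytes) and that driver is any Selenium WebDriver. The helper names save_body and page_source_from_body are hypothetical, not part of Scrapy or Selenium; the file:// URL is built with pathlib's Path.as_uri() instead of by hand.

```python
import tempfile
from pathlib import Path


def save_body(body: bytes, suffix: str = ".html") -> Path:
    """Write an already-downloaded page (e.g. Scrapy's response.body) to a temp file."""
    # delete=False keeps the file on disk after the handle closes,
    # so the browser can open it afterwards
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as handle:
        handle.write(body)
    return Path(handle.name).resolve()


def page_source_from_body(driver, body: bytes) -> str:
    """Load a saved copy of the page in a Selenium driver instead of re-fetching it."""
    path = save_body(body)
    try:
        # as_uri() turns an absolute path into a file:// URL
        driver.get(path.as_uri())
        return driver.page_source
    finally:
        # Clean up the temporary copy once the driver is done with it
        path.unlink()


# Hypothetical use inside the spider callback from the question:
#
#     def parseList(self, response):
#         dr = webdriver.PhantomJS()
#         print(page_source_from_body(dr, response.body))
#         dr.quit()
```

One caveat: relative links in the saved copy (stylesheets, scripts, images) now resolve against the temporary file's location rather than the original site, so the rendered page can differ from what dr.get(response.url) would show.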

Other answers

From the Scrapy docs, describing the body argument:

Regardless of the type of this argument, the final value stored will be a str (never unicode or None).

Since you're using Scrapy, I'm assuming you're using Selenium from Python. You can parse the response.body string (bytes in Python 3 versions of Scrapy) with lxml or another library. What exactly do you mean by "let Selenium use response.body"?
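If the page doesn't need JavaScript to render, Selenium isn't needed at all. A minimal sketch that parses the already-downloaded body, using Python's standard-library html.parser in place of lxml only so the example has no dependencies; LinkCollector and extract_links are hypothetical names, and the UTF-8 default is an assumption.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(body: bytes, encoding: str = "utf-8") -> List[str]:
    """Parse an already-downloaded body (e.g. response.body) without re-fetching it."""
    parser = LinkCollector()
    parser.feed(body.decode(encoding, errors="replace"))
    parser.close()
    return parser.links
```

Inside a callback this could be called as extract_links(response.body, response.encoding). For most scraping, Scrapy's own response.css() and response.xpath() selectors already do this job and are built on lxml.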

Licensed under: CC-BY-SA with attribution