Question

I've had a look at many tutorials regarding cookiejar, but my problem is that the webpage that i want to scape creates the cookie using javascript and I can't seem to retrieve the cookie. Does anybody have a solution to this problem?

Was it helpful?

Solution

If all pages have the same JavaScript then maybe you could parse the HTML to find that piece of code, and from that get the value the cookie would be set to?

That would make your scraping quite vulnerable to changes in the third party website, but that's most often the case while scraping. (Please bear in mind that the third-party website owner may not like that you're getting the content this way.)

OTHER TIPS

I responded to your other question as well: take a look at mechanize. It's probably the most fully featured scraping module I know: if the cookie is sent, then I'm sure you can get to it with this module.

Maybe you can execute the JavaScript code in a JavaScript engine with Python bindings (like python-spidermonkey or pyv8) and then retrieve the cookie. Or, as the javascript code is executed client side anyway, you may be able to convert the cookie-generating code to Python.

You could access the page using a real browser, via PAMIE, win32com or similar, then the JavaScript will be running in its native environment.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top