Question

I can inspect any JavaScript-generated DOM using Firebug or another debugger. Firebug also lets me interactively copy the generated innerHTML of any element to the clipboard, so that I can save it to disk.

Is there a system or tool that can perform these interactive tasks programmatically? Such a tool or plugin should be able to read the JavaScript-generated DOM and save it to disk without manual steps.

Solution

I don't know of a ready-made tool that does exactly this, so you will probably need to write a small script of your own.

A browser-automation library such as Selenium can certainly do this. Because it drives a real browser, you can even choose which browser renders the website.
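As a rough illustration of the Selenium route, here is a minimal sketch using its Python bindings. It assumes the selenium package and Firefox (with its driver) are installed; the function name save_rendered_dom and its driver parameter are made up for this example, and the URL and output path are placeholders. It asks the browser for the live DOM after scripts have run and writes it to a file.

```python
from pathlib import Path


def save_rendered_dom(url, outfile, driver=None):
    """Load `url` in a browser and write the rendered DOM to `outfile`.

    `driver` may be any Selenium WebDriver instance; if it is omitted,
    a Firefox instance is started (an assumption for this sketch) and
    closed again afterwards.
    """
    own_driver = driver is None
    if own_driver:
        # Imported lazily so that the function can also be used with a
        # driver object created elsewhere.
        from selenium import webdriver
        driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Serialize the DOM as it exists after the page's scripts have run,
        # rather than the HTML source that was originally downloaded.
        html = driver.execute_script(
            "return document.documentElement.outerHTML;"
        )
        Path(outfile).write_text(html, encoding="utf-8")
        return html
    finally:
        # Only close a browser that this function started itself.
        if own_driver:
            driver.quit()
```

Calling save_rendered_dom('http://example.org', '/tmp/example.html') would then save the rendered page. Pages that build their DOM asynchronously may need an explicit wait before the DOM is read; that part is left out here.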

If you are running on Linux, I can also recommend my own project, webkit-scraping (this recommendation is a bit biased, of course ;). It uses an in-memory WebKit instance to render the page and execute its JavaScript. After compiling the server with cd webkit-server && qmake && make, you can do something like this in Python:

import sys

# make the webkit-scraping library importable
sys.path.insert(0, '/path/to/webkit-scraping/lib')
import webkit_scraping

URL = 'http://example.org'
OUTFILE = '/tmp/example.html'

if __name__ == '__main__':
    # set up a web scraping session
    driver = webkit_scraping.webkit_server.Driver()
    sess = webkit_scraping.scraping.Session(driver=driver)
    sess.visit(URL)

    # write the rendered page (after JavaScript execution) to disk
    with open(OUTFILE, 'wb') as f:
        f.write(sess.body())
Licensed under: CC-BY-SA with attribution