How to programmatically read the DOM generated by JavaScript?
14-04-2021
Question
I can inspect any JavaScript-generated DOM by using Firebug or another debugger. Firebug also lets me interactively copy the generated innerHTML of any element to the clipboard so that I can save it to disk.
Is there a system or tool that can perform these interactive tasks programmatically? Such a tool or plugin should be able to read the JavaScript-generated DOM and save it to disk without manual interaction.
Solution
I don't know of any existing tool that does this out of the box, so you will probably need to write a small script of your own.
You can certainly use a library like Selenium to achieve this. Using it, you can even choose which browser you want to use to render the website.
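As a rough illustration of the Selenium route, here is a minimal sketch using the Python bindings. The helper name save_rendered_dom, the URL and the output path are placeholders chosen for this example, and it assumes that Firefox and its geckodriver are installed. It relies on the driver's page_source property, which for common browsers returns a serialization of the current DOM rather than the raw HTML that was downloaded.

```python
from pathlib import Path


def save_rendered_dom(driver, url, outfile):
    """Load `url` in the given WebDriver and write the rendered DOM to `outfile`.

    `save_rendered_dom` is a hypothetical helper name used for this sketch.
    """
    driver.get(url)
    # page_source is the browser's serialization of the current DOM,
    # i.e. including changes made by scripts that ran while the page loaded.
    html = driver.page_source
    Path(outfile).write_text(html, encoding="utf-8")
    return html


def main():
    # Imported here so the helper above can be used and tested
    # without Selenium being installed.
    from selenium import webdriver

    # Assumption: Firefox and geckodriver are available on this machine.
    driver = webdriver.Firefox()
    try:
        save_rendered_dom(driver, "http://example.org", "/tmp/example.html")
    finally:
        # Always close the browser, even if loading the page failed.
        driver.quit()
```

Calling main() opens the browser, loads the page and writes the result to /tmp/example.html. Pages that keep building their DOM asynchronously after the load event may need an explicit wait before page_source is read.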
If you are running on Linux, I can also recommend my own project webkit-scraping for this (this recommendation is a bit biased, of course ;). It uses an in-memory WebKit instance to render the page and execute the JavaScript in it. First compile the server:
cd webkit-server && qmake && make
After that, you can do something like this in Python:
import sys

# make the webkit-scraping library importable
sys.path.insert(0, '/path/to/webkit-scraping/lib')
import webkit_scraping

URL = 'http://example.org'
OUTFILE = '/tmp/example.html'

if __name__ == '__main__':
    # set up a web scraping session
    driver = webkit_scraping.webkit_server.Driver()
    sess = webkit_scraping.scraping.Session(driver=driver)
    # load the page; its JavaScript runs inside the WebKit instance
    sess.visit(URL)
    # write the rendered document to disk
    with open(OUTFILE, 'wb') as f:
        f.write(sess.body())