Lynx: how to delay the download before dumping a website's content

Question

The problem here is that the webpage is built by a JavaScript function. Such pages can be tricky to download with tools like lynx (or curl, which IMHO is better at the basic download problem). To get the content you see on that page, you'd first need to load the JavaScript files the page requires and then execute that JavaScript "as though you were a browser". The JavaScript then requests some data (which turns out to be XML) and builds HTML from it.

Note that the "website" doesn't render its data; your browser does. More precisely, your browser is expected to render it, but lynx won't, because it doesn't execute JavaScript. That is also why delaying the dump won't help: no amount of waiting will make lynx run the script.

So you have a couple of options. You could look for a scriptable, JavaScript-aware browser (IIRC links supports JavaScript, but I don't know offhand how to script it to do what you want).

Or you can cheat. Using Chrom{e,ium}'s developer tools, you can see which URL the JavaScript requests. In this case it turns out to be:

http://build.chromium.org/cgi-bin/svn-log?url=http://src.chromium.org/svn//trunk/src&range=41818:40345

so you could fetch it with curl as follows:

curl -G \
     -d url=http://src.chromium.org/svn//trunk/src \
     -d range=41818:40345 \
     http://build.chromium.org/cgi-bin/svn-log \
     > 41818-40345.xml

The XML is in a fairly straightforward format (i.e. apparently easy to reverse-engineer). You could then use a scriptable XML tool such as xmlstarlet (or any XSLT processor) to take it apart and reformat it however you like. With luck, you might even find some documentation (or a DTD) for the XML somewhere.

At least, that's how I would proceed.