Question

Some websites seem to rely solely on JavaScript to generate their pages, so as users we never get to see the final "real" HTML output. For example, if you open a FedEx tracking page < https://www.fedex.com/fedextrack/?tracknumbers=YOUR_TRACKING_NUMBER > and view its source (< view-source:https://www.fedex.com/fedextrack/?tracknumbers=YOUR_TRACKING_NUMBER > in Chrome), you see only some JavaScript code.

Question: how can we analyze such web pages? For example, how can we write programs that automatically reconstruct and understand the output HTML?

Solution

You can reconstruct the DOM using a headless web browser, for example PhantomJS.

Alternatively, you could use Selenium to script an actual web browser.
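
As a rough illustration of the Selenium approach, here is a minimal Python sketch. It assumes the `selenium` package (version 4 or later) and a local Chrome installation; the function name `render_html` and its parameters are made up for this example and are not part of Selenium. It loads the page, waits for the document to finish loading, and returns the DOM as the browser sees it after scripts have run.

```python
import time


def render_html(url, driver=None, timeout=10.0, poll=0.25):
    """Load `url` in a browser and return the DOM serialized after scripts ran.

    `driver` is any Selenium-style WebDriver. If omitted, a headless Chrome
    is started (assumption: the `selenium` package and Chrome are installed).
    `render_html`, `timeout` and `poll` are hypothetical names for this sketch.
    """
    owns_driver = driver is None
    if owns_driver:
        # Imported lazily so a caller-supplied driver works without this import.
        from selenium import webdriver

        options = webdriver.ChromeOptions()
        options.add_argument("--headless")
        driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Poll until the browser reports that the document has finished loading,
        # or give up waiting once `timeout` seconds have passed.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if driver.execute_script("return document.readyState") == "complete":
                break
            time.sleep(poll)
        # Serialize the live DOM, not the source that the server originally sent.
        return driver.execute_script("return document.documentElement.outerHTML")
    finally:
        # Only close a browser that this function started itself.
        if owns_driver:
            driver.quit()
```

Calling `render_html("https://www.fedex.com/fedextrack/?tracknumbers=YOUR_TRACKING_NUMBER")` would then return the generated markup as a string, which can be passed to an ordinary HTML parser. Note that `document.readyState` being `complete` does not guarantee that content fetched later by the page's own scripts has arrived, so for pages like this one you may also need to wait for a specific element to appear.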

Licensed under: CC-BY-SA with attribution