Question

Some websites seem to rely entirely on JavaScript to generate their pages, so as users we never get to see the final "real" HTML output. For example, if you open a FedEx tracking page < https://www.fedex.com/fedextrack/?tracknumbers=YOUR_TRACKING_NUMBER > and view its source (< view-source:https://www.fedex.com/fedextrack/?tracknumbers=YOUR_TRACKING_NUMBER > in Chrome), you see only JavaScript code.

Question: how can we analyze such web pages? For example, how can we write a program that automatically reconstructs and interprets the HTML those scripts produce?

Answer

You can reconstruct the DOM with a headless web browser, which loads the page and runs its scripts without showing a window. PhantomJS is one example.

Alternatively, you could use Selenium to script an actual web browser.
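
As a rough sketch of the Selenium route, the Python below loads a page in a browser, waits until an element matching a CSS selector appears, and then returns the browser's serialized DOM. It assumes the `selenium` package (version 4) and a local Chrome install, and it uses headless Chrome rather than PhantomJS. The function names, the `ready_selector` parameter, and the hand-rolled polling loop are illustrative choices, not part of the original answer; Selenium also ships its own `WebDriverWait` helper for this kind of waiting.

```python
import time


def make_headless_chrome():
    """Start Chrome without a visible window (assumes selenium 4 and Chrome are installed)."""
    # Imported here so the rest of the module can be loaded without selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered_html(url, ready_selector, driver_factory=make_headless_chrome,
                        timeout=20.0, poll_interval=0.5):
    """Return the page's HTML after its JavaScript has built the DOM.

    ready_selector is a CSS selector for an element that only exists once
    the scripts have finished rendering; which selector to use depends on
    the site and has to be found by inspecting the page.
    """
    driver = driver_factory()
    try:
        driver.get(url)
        deadline = time.monotonic() + timeout
        # Poll until the script-generated element shows up or we run out of time.
        while not driver.find_elements("css selector", ready_selector):
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{ready_selector!r} did not appear within {timeout}s")
            time.sleep(poll_interval)
        # page_source reflects the current DOM, not the raw server response.
        return driver.page_source
    finally:
        # Always close the browser, even if loading or waiting failed.
        driver.quit()
```

A call might look like `fetch_rendered_html(url, "#tracking-status")`, where `#tracking-status` is a made-up selector standing in for whatever element the target page renders last. The returned string can then be passed to an ordinary HTML parser.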

License: CC-BY-SA with attribution
Not affiliated with StackOverflow