Question

I want to write a little program in C# that presents some data from a website in different ways: with system tray notifications, different views, and so on...

The data I need is shown in the browser as normal text and can be copied and pasted into an editor. When I use tools like wget I am able to download the HTML source of the website, but I noticed that the data I need is generated by JavaScript (and AJAX?) and is not in the source.

Is there a way to download the real, rendered content of a website from a script, the command line, C#, Java, or similar? Some kind of JavaScript interpreter that resolves the data and gives me the website as text output?

Any other ideas how I could extract the data?

thanks

Edit 2:

Problem solved. See answer.


Solution 2

At last... I made a PhantomJS script that does exactly what I need.

It logs in to a site and then executes the JavaScript that reveals the content.

Additionally, I added a command that takes a screenshot of the website, to make debugging easier.

Thanks to RolandKrüger and remy, who helped me get to a solution.

You may have to change the script a little bit, but I think it can help ;)

var page = require('webpage').create();

// Forward console messages from inside the page to the PhantomJS console.
page.onConsoleMessage = function(msg) {
    console.log(msg);
};

page.open("http://www.somewebsite.com", function(status) {
    if (status === "success") {
        // Fill in the login form and submit it.
        page.evaluate(function() {
            document.querySelector("input[name='MAIL_ADDRESS']").value = "mymail@gmail.com";
            document.querySelector("input[name='PASSWORD']").value = "mypassword";
            document.getElementsByName("LOGIN_FORM_SUBMIT")[0].click();
            console.log("Login submitted!");
        });
        // Give the site a few seconds to log in and render, then take a
        // screenshot (for debugging) and extract the data as text.
        window.setTimeout(function() {
            page.render('screenshot.png');
            var ua = page.evaluate(function() {
                return document.getElementById('AnElementIdOnMyWebsite').innerText;
            });
            console.log(ua);
            phantom.exit();
        }, 5000);
    }
});
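The fixed 5-second timeout is fragile: if the login takes longer, the element will not exist yet, and if it is faster, you wait needlessly. A small polling helper can wait for a condition instead; this is a plain-JavaScript sketch (the names `waitFor`, `check`, `onReady`, and `onTimeout` are my own, not part of the PhantomJS API):

```javascript
// Poll `check` every `interval` ms; call `onReady` once it returns true,
// or `onTimeout` if `maxWait` ms pass without it becoming true.
function waitFor(check, onReady, onTimeout, interval, maxWait) {
    var waited = 0;
    (function poll() {
        if (check()) {
            onReady();
        } else if (waited >= maxWait) {
            onTimeout();
        } else {
            waited += interval;
            setTimeout(poll, interval);
        }
    })();
}
```

Inside the PhantomJS script, the check could be something like `page.evaluate(function () { return !!document.getElementById('AnElementIdOnMyWebsite'); })`, and `onReady` would then take the screenshot and extract the text.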

OTHER TIPS

WebKit-based browsers (like Google Chrome or Safari) have built-in developer tools. In Chrome you can open them via Menu -> Tools -> Developer Tools. The Network tab shows all the information about every request and response.

There you can filter the requests down to XHR — these are the requests made by JavaScript code.

Tip: the log is cleared every time you load a page; the black-dot button next to the clear button will preserve the log across page loads.

After analyzing the requests and responses you can simulate those requests from your web crawler and extract the valuable data. In many cases this is easier than parsing the HTML, because that data contains no presentation logic and is formatted to be accessed by JavaScript code.
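For example, if the Network tab shows the page fetching its data from some endpoint whose response is JSON, you can skip HTML parsing entirely and read the fields directly. A plain-JavaScript sketch, assuming a made-up response shape (in a real crawler the body would come from an HTTP request rather than a literal):

```javascript
// Hypothetical JSON body, standing in for what the XHR endpoint returns.
var responseBody = JSON.stringify({
    items: [
        { name: "first",  value: 42 },
        { name: "second", value: 7 }
    ]
});

// Parse the response and pull out just the data we care about --
// there is no presentation markup to strip away.
function extractValues(body) {
    var data = JSON.parse(body);
    return data.items.map(function (item) {
        return item.name + "=" + item.value;
    });
}

console.log(extractValues(responseBody)); // [ 'first=42', 'second=7' ]
```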

Firefox has a similar extension called Firebug. Some will argue that Firebug is even more powerful, but I like the simplicity of the WebKit tools.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow