Question

What

My web app is made dynamic with Google's AngularJS.

I want to generate static versions of my pages.

Why

Web scrapers like Google's execute and render the JavaScript, but they don't treat the rendered content the same way as its static equivalent.


How

I'm not sure exactly how (which is why I'm asking), but I want to access the same source that your browser's 'Inspect Element' presents, rather than the source that Ctrl+U (View page source) shows.

Once I have a script that renders the page and 'spits out' the HTML+CSS, I will place those generated files on my web server. A cron job will then be scheduled to regenerate the files at regular intervals.

These static files will subsequently be served instead of the dynamic ones when JavaScript is disabled and/or when a scraper visits the site.
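One common way to decide when to serve the snapshot is to inspect the request's User-Agent header. A minimal sketch of that check (the marker list below is illustrative, not exhaustive):

```python
# Decide whether a request should get the pre-rendered snapshot.
# The marker list is illustrative; real crawler User-Agents vary widely.
BOT_UA_MARKERS = ("googlebot", "bingbot", "baiduspider", "yandexbot")

def should_serve_snapshot(user_agent):
    """Return True when the User-Agent looks like a known crawler."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_UA_MARKERS)
```

In a real deployment this check usually lives in the web server config (e.g. a rewrite rule) rather than application code, but the logic is the same.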


Solution

Here is one solution, though I very much doubt I'll be able to find a public PaaS cloud that can run it:

# Python 2; Spynner drives a QtWebKit browser via PyQt, so it
# executes the page's JavaScript before handing back the DOM.
import spynner

if __name__ == '__main__':
    url = "http://angular.github.com/angular-phonecat/step-10/app/#/phones"
    browser = spynner.Browser()
    browser.create_webview(True)
    browser.load(url, load_timeout=60)
    print browser.html  # the rendered DOM, not the raw page source
    # ^ Can pipe this to a file, POST it to my server or return it as a string
    browser.close()

Package: Spynner (on Github)
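Once the rendered HTML is in hand, the 'spit it out to files' step could look like the following Python 3 sketch; `snapshot_path` and `write_snapshot` are hypothetical helpers that map an AngularJS `#/route` URL to a flat filename under a snapshot directory:

```python
import os
import urllib.parse

def snapshot_path(root, url):
    """Map an app URL with a #/route fragment to a static file path.

    A route like "#/phones" becomes "<root>/phones.html"; a URL with
    no fragment falls back to "index.html".
    """
    fragment = urllib.parse.urlparse(url).fragment
    route = fragment.strip("/") or "index"
    return os.path.join(root, route.replace("/", "_") + ".html")

def write_snapshot(root, url, html):
    """Write rendered HTML where the web server (via cron) expects it."""
    path = snapshot_path(root, url)
    os.makedirs(root, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path
```

A cron entry would then just re-run the render script and call `write_snapshot` for each route at the chosen interval.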

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow