Question

My SPA uses the Backbone.js router with pushState, falling back to hash-based URLs. I intend to follow Google's suggestion for making an AJAX web app crawlable: render my site into static .html snapshots with PhantomJS and serve them to Google via URLs of the form:

mysite.com/?_escaped_fragment_=key=value.

Keep in mind that the site does not serve static pages for end users (it only works in a JavaScript-enabled browser). If you navigate to mysite.com/some/url, the .htaccess file is set up to always serve mysite.com/index.php, and the Backbone router reads the URL in order to display the JavaScript-generated content for that URL.
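
For reference, the catch-all rule looks roughly like this (a simplified sketch, not my exact rules):

    RewriteEngine On
    # Serve real files and directories (assets, images, etc.) untouched
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    # Route every other URL to the single entry point of the SPA
    RewriteRule ^ index.php [L]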

Furthermore, so that Google will index my entire site, I plan on creating a sitemap listing hashbang URLs. The URLs must be hashbanged so that Google knows to crawl each page via its corresponding _escaped_fragment_ URL.

Soooo....

(1) Will this approach work?

and

(2) Since Backbone.js does not use hashbang URLs, how can I convert the hashbang URL to the pushState URL when a user arrives via Google?

reference: https://stackoverflow.com/a/6194427/1102215


Solution

I ended up stumbling through the implementation much as I outlined in my question. So...

(1) Yes, the approach seems to work rather well. The only downside is that even though the app works without hashbangs, my sitemap.xml is full of hashbang URLs. This is necessary to tip off Google that it should query the _escaped_fragment_ URL when crawling these pages. So when the site appears in Google search results there is a hashbang in the URL, but that's a small price to pay.
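
To illustrate, a sitemap entry roughly like this one (the path is just a placeholder) is what signals the scheme to Google, which then requests the corresponding _escaped_fragment_ URL:

    <url>
        <loc>http://mysite.com/#!/some/page</loc>
    </url>
    <!-- Googlebot then fetches: http://mysite.com/?_escaped_fragment_=/some/page -->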

(2) This part was a lot easier than I had imagined. It only required one line of code before initializing the Backbone.js router...

window.location.hash = window.location.hash.replace(/#!/, '#');

var AppRouter = Backbone.Router.extend({...

After the hashbang is replaced with a plain hash, the Backbone router will automatically remove the hash in browsers that support pushState. Furthermore, those two URL state changes are not saved in the browser's history, so if the user clicks the back button there are no unexpected redirects.
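
For completeness, here is roughly how that line fits into the router setup (a sketch; the catch-all route and handler names are placeholders of mine, not the actual app code):

    // Rewrite a crawler-style hashbang (#!/some/page) to a plain hash (#/some/page)
    // before the router starts, so Backbone routing can take over.
    window.location.hash = window.location.hash.replace(/#!/, '#');

    var AppRouter = Backbone.Router.extend({
        routes: {
            '*path': 'show'   // hypothetical catch-all route
        },
        show: function (path) {
            // look up and render the view for `path`
        }
    });

    var router = new AppRouter();

    // In pushState-capable browsers Backbone transparently upgrades the
    // #/some/page URL to /some/page; older browsers keep using the hash.
    Backbone.history.start({ pushState: true });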

UPDATE: A better approach

It turns out that there is a dead-simple approach that completely does away with hashbangs. Via BromBone:

If your site is using hashbang (#!) URLs, then Google will crawl your site by replacing #! with ?_escaped_fragment_=. When you see ?_escaped_fragment_=, you'll know the request is from a crawler. If you're using HTML5 pushState, then you look at the "User-Agent" header to determine if the request is from a bot.

This is a modified version of BromBone's suggested .htaccess rewrite rules:

    RewriteEngine On
    # Skip image requests
    RewriteCond $1 !\.(gif|jpe?g|png)$ [NC]
    # Skip files and directories that actually exist on disk
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    # Only rewrite for known crawler user agents
    RewriteCond %{HTTP_USER_AGENT} .*Googlebot.* [OR]
    RewriteCond %{HTTP_USER_AGENT} .*Bingbot.* [OR]
    RewriteCond %{HTTP_USER_AGENT} .*Baiduspider.* [OR]
    RewriteCond %{HTTP_USER_AGENT} .*iaskspider.*
    # Hand the requested path to snapshot.php, which serves the pre-rendered snapshot
    RewriteRule ^(.*)$ snapshot.php/$1 [L]

OTHER TIPS

Let me summarize something I spent about 10 pages on in my upcoming book on SPAs. Google wants a classic version of your site. This is also an advantage, because obsolete browsers really can't do SPAs effectively anyway. Serve the spiders and old browsers a core site.

I get the term from the Guardian newspaper: http://vimeo.com/channels/smashingconf.

In the browser, check whether the browser "cuts the mustard"; here is my script for doing this:

<script>

    // Feature-detect: a browser that lacks any of these does not "cut the mustard"
    // and gets the core (non-SPA) site instead.
    if (!('querySelector' in document)
        || !('localStorage' in window)
        || !('addEventListener' in window)
        || !('matchMedia' in window)) {

        if (window.location.href.indexOf("#!") > 0) {
            // Hashbang URL: send the old browser to the _escaped_fragment_ version
            window.location.href = window.location.href.replace("#!", "?_escaped_fragment_=");
        } else if (window.location.href.indexOf("?_escaped_fragment_=") < 0) {
            // Otherwise append the flag so the server serves the core site
            window.location.href = window.location.href + "?_escaped_fragment_=";
        }

    } else {

        // Capable browser: if it landed on a core-site URL, bounce it back to the SPA
        if (window.location.href.indexOf("?_escaped_fragment_=") >= 0) {
            window.location.href = window.location.href.replace("?_escaped_fragment_=", "#!");
        }
    }

</script>

On the server you need some mechanism to check for the presence of the _escaped_fragment_ query string. If it is present, you need to serve the core site. The core site uses only simple CSS and little or no JavaScript. I have a SPAHelper library for ASP.NET MVC you can check out to see some of the things I implemented around this: https://github.com/docluv/spahelper.
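
The check itself is trivial; here is a minimal sketch of the idea in Node.js, purely for illustration (the same test applies in PHP, ASP.NET MVC, or any other server framework, and both render helpers below are hypothetical):

    var http = require('http');
    var url = require('url');

    // Hypothetical helpers: one returns the pre-rendered core page,
    // the other returns the JavaScript single-page app shell.
    function renderCoreSite(path) {
        return '<html><body><!-- server-rendered content for ' + path + ' --></body></html>';
    }
    function renderAppShell() {
        return '<html><body><script src="/app.js"></script></body></html>';
    }

    http.createServer(function (req, res) {
        var parsed = url.parse(req.url, true);

        res.writeHead(200, { 'Content-Type': 'text/html' });

        if ('_escaped_fragment_' in parsed.query) {
            // Crawler or mustard-failing browser: serve the core site
            res.end(renderCoreSite(parsed.pathname));
        } else {
            // Everyone else gets the SPA
            res.end(renderAppShell());
        }
    }).listen(8080);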

The real issue is that most server-side web frameworks, like ASP.NET and PHP, are not designed to support a single view system for both the client and the server, so you are somewhat stuck maintaining two views. Again, I wrote about 10 pages on this topic for my book, which should be ready sometime next week.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow