Question

Before I ask my question, here's a meta-question: am I allowed to post this here? It's about best practices, but the problem is rather specific, and I don't see a better Stack Exchange site to ask it on.

I'm about to start developing a sort-of single-page website. It will consist of collection views of articles and more detailed views of those articles, which will be loaded asynchronously in an overlay. So it's not a true async one-page site and I won't be using Angular or the like, but for simplicity's sake I'll refer to it as a one-page site.

The problem is that the site will rely heavily on social-media sharing and needs to be indexed properly by search-engine crawlers.

Needless to say, when I'm reading an interesting article in an async-loaded overlay and decide to copy-paste the URL (which would just be http://www.onepagesite.com) to a social network, things won't work.

I was looking at putting the state in an escaped fragment of the URL so that overlay pages can be linked to: browsing to the Article 1 overlay would rewrite the client's URL to http://www.onepagesite.com/#!article/1.
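For illustration, a minimal client-side sketch of that rewriting (loadArticleOverlay is a hypothetical function that fetches the article and shows the overlay):

    function openArticle(id) {
      // Setting the hash fires 'hashchange', which triggers the actual load.
      window.location.hash = '#!article/' + id;
    }

    // Read the current hash and restore the matching overlay state; this
    // covers shared URLs, page reloads, and back/forward navigation.
    function applyHash() {
      var match = window.location.hash.match(/^#!article\/(\d+)$/);
      if (match) {
        loadArticleOverlay(match[1]);
      }
    }

    window.addEventListener('hashchange', applyHash);
    applyHash(); // handle the initial page load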

However, crawling now becomes a problem, because bots (Facebook, Google, etc.) don't execute JavaScript and don't wait for asynchronous content to load.

They do have unique user agents, though. So one solution would be to sniff the user agent and feed bots a different HTML file than human visitors.
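For example, a rough Express-style sketch of that sniffing (the bot pattern is deliberately incomplete, and renderStaticPage is a hypothetical pre-renderer):

    var express = require('express');
    var app = express();

    // Deliberately incomplete list of well-known crawler user agents.
    var BOT_PATTERN = /googlebot|bingbot|facebookexternalhit|twitterbot/i;

    app.get('/', function (req, res) {
      if (BOT_PATTERN.test(req.headers['user-agent'] || '')) {
        res.send(renderStaticPage());            // pre-rendered HTML for bots
      } else {
        res.sendFile(__dirname + '/index.html'); // the normal one-page app
      }
    });

    app.listen(3000);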

Another solution I could think of is to make the URL an actual state of the application, explained by the following example:

  • when a user browses to an article, that article is loaded asynchronously and the URL is rewritten to http://www.onepagesite.com/#!article/1
  • then, when this URL is requested directly by a user or a bot (because it has been shared, for example), the backend spits out all the HTML needed to reproduce that state of the app in a single synchronous load, so bots can index the page and the user can continue browsing asynchronously.

This solution requires the page-rendering part of the backend to work in exactly the same way as the client-side JavaScript that rewrites the URLs.
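As a sketch of that idea (renderCollectionView and renderArticleOverlay are hypothetical helpers, not anything I've built yet), the backend's renderer could look roughly like this:

    // Given a state parsed from the request, render the full page with the
    // overlay already inlined, so one synchronous response reproduces what
    // the client-side JS would otherwise build up asynchronously.
    function renderState(state) {
      var page = renderCollectionView();             // the normal landing page
      if (state && state.article) {
        page += renderArticleOverlay(state.article); // same markup the JS injects
      }
      return page;
    }

One wrinkle I can already see: browsers never send the part after the # to the server, so on a direct request the backend needs some other way to learn the requested state.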

Now this is what I, as an absolute novice with one-page sites, could see working as a possible solution.

So here are my questions:

  • Is this a viable solution, and if so, are there things that should be done differently, or major bumps in the road I'm missing?
  • ...or am I looking in the wrong direction entirely, and is there a far more obvious and simple way of tackling this project?

I'm still laying out the general structure of the project, so I'm rather flexible; hence the question.


Solution

What you're looking for is making AJAX crawlable. Fortunately, Google has figured this out for you.

The short version is that you use hash-bang fragments like the one you've shown (#!article/1); when crawling, the bot requests that page with the part after the #! passed as a query parameter called _escaped_fragment_, and you return the relevant content. Details are in Google's specification for making AJAX applications crawlable.

The examples they give all have key=value style fragments, so for instance:

http://example.com#!article=21

...which the bot asks for like this:

http://example.com?_escaped_fragment_=article=21

...and you return the content of the article to the bot.
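A rough sketch of how that handler could look on an Express-style backend (renderArticlePage is a hypothetical server-side renderer, not part of the scheme itself):

    app.get('/', function (req, res, next) {
      var fragment = req.query._escaped_fragment_;
      if (fragment !== undefined) {
        // fragment holds the decoded text after #!, e.g. "article=21"
        var match = /^article=(\d+)$/.exec(fragment);
        if (match) {
          return res.send(renderArticlePage(match[1])); // full static HTML
        }
      }
      next(); // no fragment: serve the normal one-page app
    });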
