Question

In my application I have localized urls that look something like this:

This question is mainly for Facebook Likes, but I guess I will hit similar problems when I start thinking about search engine crawlers.

What kind of URL would you expect as the canonical URL? I don't want to use the exact English URL, because I want people who click the link to be forwarded to their own language (based on browser settings or IP).

The IP lookup is not something that I want to do on every page hit. Besides that, I would need to incorporate more 'state' in my application, because I would have to check whether a user has already been forwarded to their own locale or is browsing the English version on purpose.
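As a rough sketch of how the browser setting could be used without an IP lookup, the function below picks a locale from the Accept-Language header and lets a previously stored choice (for example from a cookie) win, which covers the "browsing the English version on purpose" case. The names `pick_locale`, `parse_accept_language` and `SUPPORTED`, and the list of supported languages, are assumptions made for illustration only.

```python
SUPPORTED = ("en", "de", "nl")  # hypothetical list of site languages


def parse_accept_language(header):
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        # Sort by descending quality, then by original position.
        weighted.append((-quality, position, tag.strip().lower()))
    weighted.sort()
    return [tag for _, _, tag in weighted]


def pick_locale(accept_language, cookie_locale=None,
                supported=SUPPORTED, default="en"):
    """Choose a locale: explicit earlier choice first, then the browser."""
    if cookie_locale in supported:
        return cookie_locale
    for tag in parse_accept_language(accept_language or ""):
        primary = tag.split("-")[0]  # "de-de" -> "de"
        if primary in supported:
            return primary
    return default
```

With this approach the only state that has to be kept is the visitor's explicit choice; everything else is derived from the request on the fly.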

I guess it is going to be something like:

http://example.com/something/animals/elephant

or maybe without any language identifier at all:

http://example.com/animals/elephant

but that is a bit harder to implement, and there is a bigger chance of URL clashes in the future (in the rare case that I get a category called en or de).
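To make the clash concern concrete, here is a minimal sketch of how a router might tell a language prefix apart from a category, assuming a fixed set of locale codes. `SUPPORTED_LOCALES`, `split_locale` and `is_reserved_slug` are hypothetical names, not part of any framework.

```python
SUPPORTED_LOCALES = frozenset({"en", "de", "nl"})  # hypothetical locale codes


def split_locale(path):
    """Split '/en/animals/elephant' into ('en', '/animals/elephant').

    Returns (None, path) when the first segment is not a known locale,
    which is the language-neutral URL form.
    """
    segments = [segment for segment in path.split("/") if segment]
    if segments and segments[0] in SUPPORTED_LOCALES:
        return segments[0], "/" + "/".join(segments[1:])
    return None, "/" + "/".join(segments)


def is_reserved_slug(slug):
    """True if a category slug would collide with a locale prefix."""
    return slug.lower() in SUPPORTED_LOCALES
```

Rejecting reserved slugs when a category is created is one way to keep the language-neutral URL form free of clashes.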

Summary

What kind of URL would you expect as the canonical URL? Is there already a standard for this?

Solution

I know this question is a bit old, but I was facing the same issue. I found this:

Different language versions of a single page are considered duplicates only if the main content is in the same language (that is, if only the header, footer, and other non-critical text is translated, but the body remains the same, then the pages are considered to be duplicates).

That can be found here: https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls

From this I conclude that we should include the locale in canonical URLs.

OTHER TIPS

I did find one resource that recommends not using the canonical tag with localized addresses. However, Google's documentation does not address this case explicitly and only mentions subdomains in another context.

There is more than language that you need to think about.

It's typically a tuple of three: {region, language, property}. If you only have one website, then you have {region, language} only.

Every piece of content can either be different at each point in this three-dimensional space, or at least be presented differently. But it is still the same piece of content, so you'd like to centralize the management of editorial signals, promotions, tracking, etc. Think about search systems: you'd like page rank to be merged across all instances of the article, not spread thinly.

I think there is a standard solution: Canonical URL

Put language/region into the domain name

example.com
uk.example.com
fr.example.com

Now you can choose how to scope a cookie: to the subdomain (for language/region) or to the whole domain (for user tracking).

On every HTML page, add a link to the canonical URL

<link rel="canonical" href="http://example.com/awesome-article.html" />

Now you are done.

There certainly is no "standard" beyond the requirement that it has to be a URL. What you do see on many commercial websites is exactly what you describe:

  <protocol>://<server>/<language>/<more-path>

For the language tag you can follow the RFCs as well (language tags are defined in BCP 47 / RFC 5646). I guess your two-letter abbreviation is quite fine.

I only disagree on the <more-path> part of the URL. If I understand you correctly, you are thinking about translating each page's path into the local language? I would not do that. Maybe I am not the typical user, but I personally like to edit URLs by hand: if the URL shown is http://example.com/de/tiere/elefant but I don't trust the content to be translated well, I would manually try http://example.com/en/tiere/elefant, and that would not bring me to the expected page. And since I also dislike URLs like http://ex.com/with-the-whole-title-in-the-url-so-the-page-will-be-keyworded-by-search-engines, my favorite would be to exchange only the <language> part and use generic English (or any other single language) for <more-path>. E.g.:

If your site is something like Wikipedia, then I would agree with your scheme of translating the <more-path> as well.
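The idea of exchanging only the <language> part while keeping <more-path> in one language can be sketched like this. The function name `switch_language` and the list of languages are assumptions for illustration; the URL handling uses Python's standard `urllib.parse`.

```python
from urllib.parse import urlsplit, urlunsplit


def switch_language(url, new_language, languages=("en", "de")):
    """Replace (or add) the leading language segment of a URL path."""
    parts = urlsplit(url)
    # "/de/animals/elephant" -> ["", "de", "animals", "elephant"]
    segments = parts.path.split("/")
    if len(segments) > 1 and segments[1] in languages:
        segments[1] = new_language
    else:
        # No language prefix yet: add one in front of the path.
        segments.insert(1, new_language)
    return urlunsplit(parts._replace(path="/".join(segments)))


print(switch_language("http://example.com/de/animals/elephant", "en"))
```

Because the rest of the path is language-independent, the same page is reachable in every language by editing a single segment.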

Maybe Google's guidelines can help with your issue: https://support.google.com/webmasters/answer/189077?hl=en

They note that many websites serve users across the world with content targeted at a specific language or region, and advise using rel="alternate" hreflang="x" annotations so that the correct language or regional URL is shown in search results.
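As an illustration of what such annotations could look like for the /<language>/<more-path> scheme from the question, the sketch below generates one link element per language plus an optional x-default entry for visitors whose language is not covered. The helper name `hreflang_links`, the base URL and the language list are assumptions, not something taken from Google's page.

```python
from html import escape


def hreflang_links(base_url, path, languages, default_language=None):
    """Build <link rel="alternate" hreflang=...> tags for one page."""
    base = base_url.rstrip("/")
    links = []
    for language in languages:
        href = f"{base}/{language}{path}"
        links.append(
            f'<link rel="alternate" hreflang="{escape(language)}" '
            f'href="{escape(href)}" />'
        )
    if default_language is not None:
        # x-default marks the fallback page for unmatched languages.
        href = f"{base}/{default_language}{path}"
        links.append(
            f'<link rel="alternate" hreflang="x-default" '
            f'href="{escape(href)}" />'
        )
    return links


for tag in hreflang_links("http://example.com", "/animals/elephant",
                          ["en", "de"], default_language="en"):
    print(tag)
```

The same set of tags would go into the head of every language version of the page, so that each version lists all of its alternates, including itself.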

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow