Question

Would spiders treat the following two canonical link tags as pointing to the same URL?

<link rel="canonical" href="http://www.example.com/&#375;" /> - encoded
<link rel="canonical" href="http://www.example.com/ŷ" /> - unencoded

Solution

&#375; is an HTML numeric character reference for the Unicode character whose code point is 375 in decimal. In hexadecimal that is 0x177, so it refers to U+0177, which is ŷ.

That means both URLs are exactly the same if:

  1. They appear in an HTML document, where character references are decoded.
  2. The document declares a character set that can represent the character, and the editor you used to type it saved the file in that same encoding.

If the browser displays ŷ in both cases, the character set is probably correct, but you should verify it.
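The two conditions above can be checked with a short script. This is only a sketch: it uses Python's standard `html.unescape` to stand in for what an HTML parser does with the attribute value, and the variable names are made up for illustration. How a particular spider handles the tags is not something it can confirm.

```python
import html

# The two href values from the question, as they appear in the HTML source.
encoded_href = "http://www.example.com/&#375;"
unencoded_href = "http://www.example.com/ŷ"

# Condition 1: an HTML parser decodes character references in attribute values.
decoded_href = html.unescape(encoded_href)

# &#375; is decimal 375, i.e. 0x177, i.e. U+0177.
char = html.unescape("&#375;")
code_point = ord(char)

same_url = decoded_href == unencoded_href
print(hex(code_point), same_url)

# Condition 2: the unencoded form depends on the declared character set.
# If the file is saved as UTF-8 but read as Latin-1, the character is misread.
misread = "ŷ".encode("utf-8").decode("latin-1")
print(misread == "ŷ")
```

The encoded form does not depend on the document's character set, which is why it is the safer of the two when the encoding is in doubt.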

OTHER TIPS

I'm not 100% sure, but I think both would point to the same URL. Keep in mind, though, that W3C standards often recommend encoding links.

If you serve your HTML as UTF-8, the two URLs are seen as the same.

Even though you can expect it to work in modern browsers, http://www.example.com/ŷ is an invalid URL as written: RFC 3986 allows only ASCII characters in a URI.

You should always percent-encode non-ASCII characters. Encoded as UTF-8, ŷ becomes %C5%B7, giving http://www.example.com/%C5%B7.
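As a rough illustration of that advice, Python's standard `urllib.parse.quote` percent-encodes non-ASCII characters as UTF-8 bytes by default. Passing the whole URL with `safe=":/"` is a shortcut that works for this simple example; for URLs with query strings or other components it would be more careful to encode each part separately.

```python
from urllib.parse import quote, unquote

url = "http://www.example.com/ŷ"

# Keep ":" and "/" as-is so the scheme and path separators survive;
# everything outside the unreserved ASCII set is percent-encoded as UTF-8.
encoded_url = quote(url, safe=":/")
print(encoded_url)  # http://www.example.com/%C5%B7

# Decoding restores the original character.
round_trip = unquote(encoded_url)
print(round_trip == url)  # True
```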

Licensed under: CC-BY-SA with attribution