Canonical tags and UTF8
27-09-2019
Question
Would spiders treat the following two canonical link tags as pointing to the same URL?
<link rel="canonical" href="http://www.example.com/&#375;" />
- encoded
<link rel="canonical" href="http://www.example.com/ŷ" />
- unencoded
Solution
&#375; is an HTML numeric character reference that represents the Unicode character with code point 375 in decimal notation. In hexadecimal that is 0x177, so we are talking about U+0177, which is ŷ.
- http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
- http://inamidst.com/stuff/unidata/
- http://www.fileformat.info/info/unicode/char/0177/index.htm
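The code point arithmetic can be checked with a few lines of Python. This is only a sketch using the standard library's html.unescape; the variable names are illustrative and not part of the original answer.

```python
import html

# The numeric character reference from the "encoded" link tag.
entity = "&#375;"

# html.unescape resolves character references the way an HTML parser would.
decoded = html.unescape(entity)

# ord() gives the Unicode code point of the resulting single character.
code_point = ord(decoded)

# Show the character, its decimal and hexadecimal code point, and the U+ form.
print(decoded, code_point, hex(code_point), f"U+{code_point:04X}")
```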
That means that both URLs are exactly the same if:
- They're displayed in the context of an HTML document.
- The document declares a character set that supports that symbol (for example UTF-8), and the editor you used to type it actually saved the character in that encoding.
If the browser displays ŷ in both cases, the character set is probably correct, but you should make sure it is.
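One way to see the equivalence is to run both tags through an HTML parser and compare the href values it reports. The sketch below uses Python's standard html.parser; the class name CanonicalCollector and the function canonical_hrefs are hypothetical helpers, and a real spider may well apply further URL normalisation on top of this.

```python
from html.parser import HTMLParser


class CanonicalCollector(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already replaced character references in attribute values.
        attr_map = dict(attrs)
        if tag == "link" and attr_map.get("rel") == "canonical":
            self.hrefs.append(attr_map.get("href"))


def canonical_hrefs(markup):
    """Return the canonical hrefs found in an already decoded HTML string."""
    parser = CanonicalCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


# The two tags from the question; \u0177 is the literal character ŷ.
encoded = '<link rel="canonical" href="http://www.example.com/&#375;" />'
unencoded = '<link rel="canonical" href="http://www.example.com/\u0177" />'

print(canonical_hrefs(encoded) == canonical_hrefs(unencoded))
```

Note that the strings here are already Unicode text, so the sketch assumes the document bytes were decoded with the correct character set, which is exactly the second condition above.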
OTHER TIPS
Not 100% sure, but I think they would both point to the same URL. Keep in mind, though, that the W3C standards often recommend encoding links.
If you serve your HTML as UTF-8, the URL is seen as the same.
Even though you can expect it to work in modern browsers, http://www.example.com/ŷ is an invalid URL.
You should always percent-encode Unicode characters. For ŷ (U+0177, UTF-8 bytes C5 B7) that gives http://www.example.com/%C5%B7.
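A minimal sketch of that percent-encoding step, using urllib.parse from the Python standard library. The function name percent_encode_path is made up for illustration, and it only touches the path component; the host, query and fragment are passed through unchanged.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def percent_encode_path(url):
    """Percent-encode non-ASCII characters in the path of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    # quote() encodes as UTF-8 by default; "/" and "%" are kept as-is so that
    # path separators and existing escapes are not encoded a second time.
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, encoded_path, parts.query, parts.fragment)
    )


# \u0177 is the literal character ŷ from the question.
raw_url = "http://www.example.com/\u0177"
safe_url = percent_encode_path(raw_url)
print(safe_url)

# Decoding the escape sequence gives the original character back.
print(unquote("%C5%B7") == "\u0177")
```

Putting the percent-encoded form in the canonical tag avoids depending on the document's character set at all, since the href then contains only ASCII.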