Question

If I use a character that is not allowed in a URL, for example a space:

<a href="pa ge.php">link</a>

and click this link, the browser address bar shows mysite.com/pa%20ge

Now suppose I use letters from the Georgian (or, for example, Russian) alphabet:

<a href="აბცდ.php">link</a>

The browser address bar shows mysite/აბცდ.php

That is, these non-Latin letters are not changed; they appear in the URL in their original form.

Question: why? Are non-Latin letters also allowed in a URL?


Solution

No, a URL can only contain (a subset of) ASCII.

The browser converts "აბცდ" into percent-encoded UTF-8 bytes for the actual URL that is sent to the server. In fact, you should embed it as a percent-encoded string in your document to begin with; the browser is just covering that mistake for you.
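As a rough illustration of that conversion, the sketch below percent-encodes the two file names from the question with Python's standard `urllib.parse.quote`. The helper name `encode_path_segment` is made up for this example, and the code assumes UTF-8, which is what the encoded URL elsewhere in this thread corresponds to.

```python
from urllib.parse import quote


def encode_path_segment(segment: str) -> str:
    """Percent-encode one URL path segment.

    Hypothetical helper for illustration. quote() turns the text into
    UTF-8 bytes and writes every byte outside the unreserved ASCII set
    as %XX. safe="" means even "/" would be encoded, which is what you
    want for a single segment.
    """
    return quote(segment, safe="")


# A space becomes %20.
print(encode_path_segment("pa ge.php"))

# Each Georgian letter is three UTF-8 bytes, so it becomes three %XX groups.
print(encode_path_segment("აბცდ.php"))
```

The second result is the form that would go into the `href` attribute if you encode the link yourself instead of relying on the browser.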

What the browser shows in the address bar is something different. Modern browsers try to be as user-friendly as possible and decode some percent-encoded characters so that the address bar shows human-readable text. For anti-spoofing reasons, only some characters are decoded, not all. Georgian happens to be pretty safe, since its letters are hard to mistake for similar-looking characters from other scripts.

OTHER TIPS

Those characters are percent-encoded internally as well, but the browser displays them in their original form as a courtesy to the user. When you copy and paste the URL, you will see that the percent-encoding is in place:

http://domain.com/mysite.აბცდ.php

becomes

http://domain.com/mysite.%E1%83%90%E1%83%91%E1%83%AA%E1%83%93.php
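To check that the two forms really are the same URL, here is a small sketch that decodes the path of the encoded form back into readable text, roughly what the address bar does for display. The function name `display_form` is invented for this example, and it deliberately ignores query strings and fragments.

```python
from urllib.parse import unquote, urlsplit


def display_form(url: str) -> str:
    """Return the URL with its path percent-decoded for reading.

    Hypothetical helper: handles scheme, host and path only.
    unquote() interprets the %XX bytes as UTF-8 by default.
    """
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}{unquote(parts.path)}"


wire_url = "http://domain.com/mysite.%E1%83%90%E1%83%91%E1%83%AA%E1%83%93.php"
print(display_form(wire_url))
```

Only the encoded form goes over the wire; the decoded form is for human eyes.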

See this answer for background information.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow