Question

I tried to find this in the relevant RFC, IETF RFC 3986, but couldn't figure it.

Do URIs for HTTP allow Unicode, or non-ASCII of any kind?

Can you please cite the section and the RFC that supports your answer.

NB: For those who might think this is not programming related - it is. It's related to an ISAPI filter I'm building.


Addendum

I've read section 2.5 of RFC 3986. But RFC 2616, which I believe is the current HTTP protocol, predates 3986, and for that reason I'd suppose it cannot be compliant with 3986. Furthermore, even if or when the HTTP RFC is updated, there still will be the issue of rationalization - in other words, does an HTTP URI support ALL of the RFC3986 provisos, including whatever is appropriate to include non US-ASCII characters?

Was it helpful?

Solution

No, they are not allowed. Just check the ABNF in RFC 3986.

OTHER TIPS

Here is an example: ☃.net.

In terms of the relevant section of RFC 3986, I think you are looking at 2.5.

EDIT:

Apparently stack overflow doesn't detect this as a proper URL. You'll have to copy&paste into your browser.

Used to be that non english characters were not allowed in DNS and URL/URI. There was a hack to allow them by using % encoding in URI. However many countries such us russia and china are starting to implement DNS using non latin characters. Here is a reference to one of these standards

RFC 3986 is being replaced with RFC 3987, which fully supports Unicode, and provides mappings rules to/from RFC 3986 style URIs.

Many browsers are not support URIs with Unicode characters (I've implemented them on a website I've build called -- blogvani.com) and Google duly scans and keeps them intact. I don't think that works on top-level domains though, at least not with the registrar and not directly.

For top-level domains if you have a domain registered in Unicode (for example people can register domains in Hindi), it will be converted to a corresponding code in ASCII (something that may go like jdhfks3243-32434.com)...

It is quite funny to see how this is routed and to realize that you're not actually going to a unicode domain even though it seems like that.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top