I see that many sites (Amazon, Wikipedia, others) use UTF-8-encoded, URL-escaped Unicode in their URLs, and those URLs are prettified by (at least) Chrome.

For example, we would represent http://ja.wikipedia.org/wiki/メインページ as http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8 when writing our HTTP headers, and Chrome and Firefox seem to handle this gracefully. (I didn't test on IE.)

Is there a governing standard for this behavior? Or is it strictly a de facto standard? Or is it completely non-standard?

I'd really like to see a link to the defining paragraph of some RFC.

No correct solution

Other tips

The URI standard (RFC 3986, section 2.5) says:

When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded.

That seems pretty definitive.
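To make the quoted rule concrete, here is a minimal sketch in Python (my choice of language; the thread itself doesn't name one). It uses the standard library's urllib.parse.quote, which encodes text as UTF-8 and then percent-encodes every octet that isn't an unreserved character. The variable names are just for illustration.

```python
from urllib.parse import quote, unquote

# Page title from the Wikipedia URL in the question.
title = "メインページ"

# Step 1 and 2 of the quoted rule in one call: encode as UTF-8 octets,
# then percent-encode everything outside the unreserved set.
# (quote() also leaves "/" alone by default via its `safe` parameter.)
escaped = quote(title, encoding="utf-8")
print(escaped)

# Reversing the process, roughly what a browser does when it
# prettifies the address bar.
restored = unquote(escaped, encoding="utf-8")
print(restored)
```

The escaped form should match the path segment of the URL in the question, and decoding it gives back the original title.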

I'm still unsure about when it was ratified, or the current browser support.

RFC 3987 is the standard for handling internationalized URIs/URLs, known as IRIs. Plain RFC 3986 URIs are restricted to ASCII and cannot carry Unicode characters directly. Anyone not using IRIs has to come up with their own way of encoding unsupported characters for their own needs. Percent-encoding UTF-8 octets is one way, but it is certainly not the only one actually in use.
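For what it's worth, here is a rough sketch of how an IRI could be mapped to an ASCII-only URI, in Python. The helper name iri_to_uri is hypothetical, and this is a simplification rather than a full implementation of the RFC 3987 mapping: it drops any userinfo, percent-encodes UTF-8 in the path, query, and fragment, and converts the host with Python's built-in "idna" codec (which follows the older IDNA 2003 rules), since hosts are commonly converted to Punycode instead of being percent-encoded.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in each component. "%" is kept so that
# already-escaped input is not encoded a second time.
_PATH_SAFE = "/%:@!$&'()*+,;="
_QUERY_SAFE = "/?%:@!$&'()*+,;="


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: simplified IRI -> URI conversion (userinfo is dropped)."""
    parts = urlsplit(iri)

    # Host: IDNA/Punycode conversion rather than percent-encoding.
    host = parts.hostname or ""
    if host:
        host = host.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Everything else: UTF-8 octets, percent-encoded where needed.
    path = quote(parts.path, safe=_PATH_SAFE)
    query = quote(parts.query, safe=_QUERY_SAFE)
    fragment = quote(parts.fragment, safe=_QUERY_SAFE)

    return urlunsplit((parts.scheme, host, path, query, fragment))


print(iri_to_uri("http://ja.wikipedia.org/wiki/メインページ"))
```

For the Wikipedia address from the question, this should produce the same percent-encoded URL the asker is already sending.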

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow