Question

Clearly % needs to be encoded. The Wikipedia article on the standard says:

Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI.

Why isn't it also listed as a reserved character? Clearly it is reserved to signify something special in the context of a URI...

Solution

The "reserved" characters are intended to be available as delimiters between different parts of a URI. The percent sign isn't used for that, and can't be, because it already introduces percent-encoded octets.

It may help clarify matters to point out that there's a separate list of "unreserved" characters, and the percent-sign is not one of those, either:

      unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"

(from http://www.ietf.org/rfc/rfc3986.txt, Section 2.3, bottom of page 12). In other words, in the context of URIs, "reserved" has a more specific meaning than one might expect. :-)
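To make the distinction concrete, here is a minimal sketch in Python that sorts a character into the RFC 3986 classes. The sets are copied from the gen-delims, sub-delims, and unreserved rules of the RFC; the function name `classify` and the label strings are just illustrative choices, not anything the RFC defines.

```python
import string

# Character classes as listed in RFC 3986, Sections 2.2 and 2.3.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def classify(ch):
    """Return an informal label for a single character (hypothetical helper)."""
    if ch in RESERVED:
        return "reserved"
    if ch in UNRESERVED:
        return "unreserved"
    if ch == "%":
        # Neither reserved nor unreserved: it only introduces pct-encoded.
        return "percent-encoding introducer"
    return "must be percent-encoded"


for ch in "/~% ":
    print(repr(ch), "->", classify(ch))
```

The percent sign falls through both sets, which matches the point above: it is special, but not "reserved" in the RFC's sense of the word.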

Other answers

The reserved characters are ones that have some special meaning in a URI and therefore need to be escaped in some way if they are used for something other than their special purpose.

The percent character has no special meaning as a delimiter in a URI, which is what makes it a good choice for an escape/encoding character.

The fact that it is being used to do encoding is the only reason that percent itself needs to be escaped, by percent-encoding it.

This is similar to character escaping in string literals, where the backslash \ must itself be escaped as \\ only because it was the character chosen to introduce escapes such as \t or \n.
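As a rough illustration of that analogy, the sketch below uses Python's standard `urllib.parse.quote` and `unquote`. The sample strings and variable names are made up for the example; the point is only that a literal "%" becomes "%25", so decoding once recovers the original data.

```python
from urllib.parse import quote, unquote

# A literal percent sign in the data is encoded as %25.
encoded = quote("100%")
print(encoded)  # 100%25

# Decoding once gives the original data back.
decoded = unquote(encoded)
print(decoded)  # 100%

# Data that already looks like an escape is protected the same way:
# the "%" in "a%20b" is encoded, so it is not mistaken for a space later.
protected = quote("a%20b")
print(protected)  # a%2520b

# Decoding twice would go one step too far and turn the data into "a b".
over_decoded = unquote(unquote(protected))
print(over_decoded)  # a b
```

The last two lines are the URI equivalent of writing \\t when a backslash followed by "t" is meant, rather than a tab.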

The percent sign is already reserved through its involvement in the grammar rule pct-encoded. Also, this paragraph seems enlightening on the subject:

A URI is composed from a limited set of characters consisting of digits, letters, and a few graphic symbols. A reserved subset of those characters may be used to delimit syntax components within a URI while the remaining characters, including both the unreserved set and those reserved characters not acting as delimiters, define each component's identifying data.

This suggests that the percent symbol itself is indeed reserved for percent encoding (as it does not delimit syntax components within a URI). Your original interpretation is correct; I think it's just a matter of semantics.
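One way to see the grammar rule at work: pct-encoded is defined in RFC 3986 as "%" followed by two hex digits, so a "%" that is not followed by two hex digits cannot appear in a valid URI. The check below is a minimal sketch under that reading; `stray_percents` is a hypothetical helper name, not part of any library.

```python
import re

# A "%" that is NOT followed by two hex digits does not match pct-encoded.
STRAY_PERCENT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def stray_percents(text):
    """Return the indexes of '%' characters that do not start a valid triplet."""
    return [m.start() for m in STRAY_PERCENT.finditer(text)]


print(stray_percents("100%25"))  # []
print(stray_percents("100%"))    # [3]
print(stray_percents("a%2Gb"))   # [1]
```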

Licensed under: CC-BY-SA with attribution