Question

Suppose my web application renders the following tag:

<object type="application/x-pdf" data="http://example.com/test%2Ctest.pdf">
     <param name="showTableOfContents" value="true" />
     <param name="hideThumbnails" value="false" />
</object>

Should the data attribute be escaped (percent-encoded path) or not? In my example it is. I haven't found any specification that covers this.

Addendum

Actually, I'm interested in a specification of what browser plugins consuming the data attribute should expect to see there. For example, the Adobe Acrobat plugin accepts both escaped and unescaped URIs. However, QWebPluginFactory treats the data attribute as a human-readable (unescaped) URI, and that leads to double percent-encoding. I'm wondering whether this is a bug in QWebPluginFactory or not.
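To illustrate what double percent-encoding means here, below is a minimal sketch using Python's urllib.parse. It is not QWebPluginFactory's code; it only assumes a consumer that treats an already-encoded value as raw text and encodes it again.

```python
from urllib.parse import quote, unquote

# The file name as it appears in the data attribute (already percent-encoded).
already_encoded = "test%2Ctest.pdf"

# Assumption: a consumer that believes the value is unescaped encodes it again.
# The "%" itself becomes "%25", so "%2C" turns into "%252C".
double_encoded = quote(already_encoded, safe="")

print(double_encoded)                    # test%252Ctest.pdf
print(unquote(double_encoded))           # test%2Ctest.pdf
print(unquote(unquote(double_encoded)))  # test,test.pdf
```

A server decoding the double-encoded name once would look for a file literally called test%2Ctest.pdf, not test,test.pdf.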


Solution

The data attribute expects the value to be a URI. So you should provide a value that is a syntactically valid URI.

The current specification of URIs is RFC 3986. To see whether the , in the URI’s path needs to be encoded, take a look at how the path production rule is defined:

path          = path-abempty    ; begins with "/" or is empty
              / path-absolute   ; begins with "/" but not "//"
              / path-noscheme   ; begins with a non-colon segment
              / path-rootless   ; begins with a segment
              / path-empty      ; zero characters

Since we have a URI with authority information, we need to take a look at path-abempty (see URI production rule):

path-abempty  = *( "/" segment )

A segment is zero or more pchar characters, where pchar is defined as follows (I’ve already expanded the production rules):

pchar         = ALPHA / DIGIT / "-" / "." / "_" / "~" / "%" HEXDIG HEXDIG / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" / ":" / "@"

And as you can see, pchar expands to a literal ,. So you don’t need to encode the , in the path component. But since you are allowed to encode any non-delimiting character using the percent-encoding without changing its meaning, it is fine to use %2C instead of ,.
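As a rough check of this conclusion, here is a small sketch using Python's urllib.parse. The names PCHAR_EXTRA and encode_segment are made up for this example, and the function assumes it is given a decoded segment (a literal "%" in the input would be encoded as "%25").

```python
from urllib.parse import quote, unquote

# Characters that pchar allows literally in addition to the unreserved set
# (ALPHA / DIGIT / "-" / "." / "_" / "~"): the sub-delims plus ":" and "@".
PCHAR_EXTRA = "!$&'()*+,;=:@"

def encode_segment(segment: str) -> str:
    """Percent-encode only the characters that are not valid pchar literals."""
    return quote(segment, safe=PCHAR_EXTRA)

# The comma is a valid pchar, so it is left alone.
literal = encode_segment("test,test.pdf")

# Encoding everything outside the unreserved set is also allowed.
over_encoded = quote("test,test.pdf", safe="")

print(literal)       # test,test.pdf
print(over_encoded)  # test%2Ctest.pdf

# Both forms decode to the same path segment.
print(unquote(literal) == unquote(over_encoded))  # True
```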

OTHER TIPS

URLs can only contain a limited set of characters. Unfortunately, different specifications define different sets of reserved characters, which cannot be used literally.

In your example the encoded character is a comma (,), which is a reserved character in some specifications, so it's not wrong to encode it.

Most web servers should handle unencoded and encoded commas equally; however, some may not, depending on their configuration. Because of that, it is generally a good idea to avoid special characters in filenames (as you have in your example) in the first place.

URL encoding is always needed when you have special characters in GET parameters. For example, a GET parameter that is supposed to take C&A as a value has to be written as:

http://example.com/somescript.php?value=C%26A
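A short sketch of why the encoding matters here, again using Python's urllib.parse as a stand-in for whatever the server side actually uses:

```python
from urllib.parse import urlencode, parse_qs

base = "http://example.com/somescript.php"

# urlencode percent-encodes "&" inside the value so it is not read
# as a separator between parameters.
query = urlencode({"value": "C&A"})
url = f"{base}?{query}"
print(url)  # http://example.com/somescript.php?value=C%26A

# Encoded: the value survives the round trip.
print(parse_qs(query))        # {'value': ['C&A']}

# Unencoded: "&" splits the query, so the value is cut off at "C".
print(parse_qs("value=C&A"))  # {'value': ['C']}
```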

EDIT:

Plugins (or even the browser) don't care either way. They don't try to (or need to) decode it or anything like that. They just request the URL from the server exactly as it was entered.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow