Question

The RFC 3986 URI: Generic Syntax spec lists a semicolon as a reserved (sub-delim) character:

reserved    = gen-delims / sub-delims

gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

What is the reserved purpose of the semicolon (";") in URIs? For that matter, what is the purpose of the other sub-delims (I'm only aware of purposes for "&", "+", and "=")?

Solution

There is an explanation at the end of section 3.3.

Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents. For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same. Parameter types may be defined by scheme-specific semantics, but in most cases the syntax of a parameter is specific to the implementation of the URI's dereferencing algorithm.

In other words, it is reserved so that anyone who wants a delimited list in a URL can safely use ; as the delimiter even when the parts themselves contain ;, as long as the contents are percent-encoded. For example, you can write:

foo;bar;baz%3bqux

and interpret it as three parts: foo, bar, baz;qux. If the semicolon were not a reserved character, ; and %3b would be equivalent, so the URI would be incorrectly interpreted as four parts: foo, bar, baz, qux.
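The split-then-decode order described above can be sketched in Python. This is only an illustration: the helper names join_parts and split_parts are made up for this example, and urllib.parse happens to emit upper-case hex (%3B), which is equivalent to %3b.

```python
from urllib.parse import quote, unquote

def join_parts(parts):
    # Percent-encode each part (safe="" also escapes ";" and "/"),
    # then join with literal semicolons as delimiters.
    return ";".join(quote(p, safe="") for p in parts)

def split_parts(segment):
    # Split on the literal delimiter FIRST, then decode each piece.
    return [unquote(p) for p in segment.split(";")]

segment = join_parts(["foo", "bar", "baz;qux"])
print(segment)                           # foo;bar;baz%3Bqux
print(split_parts("foo;bar;baz%3bqux"))  # ['foo', 'bar', 'baz;qux']

# Decoding before splitting loses the distinction and yields four parts.
print(unquote("foo;bar;baz%3bqux").split(";"))
```

Decoding the whole segment before splitting is the mistake the reserved status is meant to prevent.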

OTHER TIPS

The intent is clearer if you go back to an older version of the specification (RFC 2396, section 3.3):

  path_segments = segment *( "/" segment )
  segment       = *pchar *( ";" param ) 

Each path segment may include a sequence of parameters, indicated by the semicolon ";" character.

I believe it has its origins in FTP URIs.

Section 3.3 covers this - it's an opaque delimiter a URI-producing application can use if convenient:

Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents. For example, the semicolon (";") and equals ("=") reserved characters are often used to delimit parameters and parameter values applicable to that segment. The comma (",") reserved character is often used for similar purposes. For example, one URI producer might use a segment such as "name;v=1.1" to indicate a reference to version 1.1 of "name", whereas another might use a segment such as "name,1.1" to indicate the same. Parameter types may be defined by scheme-specific semantics, but in most cases the syntax of a parameter is specific to the implementation of the URI's dereferencing algorithm.
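As a rough sketch of how a dereferencing application might take apart a segment such as "name;v=1.1", the following Python splits off the parameters and decodes them. The function parse_segment is hypothetical; the generic syntax does not define this parsing, so each application is free to do it differently.

```python
from urllib.parse import unquote

def parse_segment(segment):
    # The first piece is the segment name; the rest are ";"-delimited parameters.
    name, *raw_params = segment.split(";")
    params = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate empty parameters such as "name;;v=1"
        key, _, value = raw.partition("=")
        # Decode only after the delimiters have been consumed.
        params[unquote(key)] = unquote(value)
    return unquote(name), params

print(parse_segment("name;v=1.1"))  # ('name', {'v': '1.1'})
```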

There are some conventions around its current usage that are interesting. These speak to when to use a semicolon or comma. From the book "RESTful Web Services":

Use punctuation characters to separate multiple pieces of data at the same level of hierarchy. Use commas when the order of the items matters, ... Use semicolons when the order doesn't matter.
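One way to read that convention in code, assuming made-up segments such as "37.0,-95.2" and "red;blue": comma-separated items keep their order, while semicolon-separated items are compared as a set. The helper names are hypothetical.

```python
def comma_items(segment):
    # Order matters, so keep the items in a tuple.
    return tuple(segment.split(","))

def semicolon_items(segment):
    # Order does not matter, so compare the items as a set.
    return frozenset(segment.split(";"))

print(comma_items("37.0,-95.2") == comma_items("-95.2,37.0"))      # False
print(semicolon_items("red;blue") == semicolon_items("blue;red"))  # True
```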

Since 2014, path parameters have been known to contribute to Reflected File Download (RFD) attacks. Let's assume we have a vulnerable API that reflects whatever we send to it (the URL was apparently real, now fixed):

https://google.com/s?q=rfd%22||calc||

{"results":["q", "rfd\"||calc||","I love rfd"]}

This is harmless in a browser: because the response is JSON, the browser will not render it and will instead offer to download it as a file. This is where path parameters come in handy for the attacker:

https://google.com/s;/setup.bat;?q=rfd%22||calc||

Everything between the semicolons (;/setup.bat;) is ignored by the web service; instead, the browser interprets it as the file name under which to save the API response. Now a file called setup.bat is downloaded, and it runs without the usual warning about files downloaded from the Internet (because its name contains the word "setup"). The contents are interpreted as a Windows batch file, and the calc.exe command runs.

Prevention:

  • sanitize your API's input (in this case they should just allow alphanumerics); escaping is not sufficient
  • add Content-Disposition: attachment; filename="whatever.txt" on APIs that are not going to be rendered; Google was missing the filename part which actually made the attack easier
  • add X-Content-Type-Options: nosniff header to API responses
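A minimal, framework-neutral sketch of those three measures in Python follows. The names is_allowed_query and api_response_headers, the default file name, and the JSON content type are assumptions for illustration, not part of any particular API.

```python
import re

# Allow-list: alphanumerics only, as suggested in the first bullet.
SAFE_QUERY = re.compile(r"[A-Za-z0-9]+")

def is_allowed_query(q):
    # Reject anything containing quotes, pipes, semicolons, and so on.
    return SAFE_QUERY.fullmatch(q) is not None

def api_response_headers(filename="response.txt"):
    # Headers for an API response that is never meant to be rendered.
    return {
        "Content-Type": "application/json; charset=utf-8",
        # An explicit filename stops the browser from deriving one from the URL path.
        "Content-Disposition": f'attachment; filename="{filename}"',
        "X-Content-Type-Options": "nosniff",
    }

print(is_allowed_query('rfd"||calc||'))  # False
print(api_response_headers()["Content-Disposition"])
```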

I found the following use-cases:

It is the final character of an HTML entity:
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

To use one of these character entity references in an HTML or XML document, enter an ampersand followed by the entity name and a semicolon, e.g., &amp; for the ampersand ("&").

Apache Tomcat 7 (or newer versions?!) uses it as a path parameter delimiter:
https://superevr.com/blog/2011/three-semicolon-vulnerabilities

Apache Tomcat is one example of a web server that supports "Path Parameters". A path parameter is extra content after a file name, separated by a semicolon. Any arbitrary content after a semicolon does not affect the landing page of a web browser. This means that http://example.com/index.jsp;derp will still return index.jsp, and not some error page.
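The behavior described in that quote can be approximated as follows. This is a simplified sketch of the idea, not Tomcat's actual implementation, and strip_path_params is a hypothetical helper.

```python
def strip_path_params(path):
    # For each "/"-separated segment, drop everything from the first ";" onward.
    return "/".join(seg.split(";", 1)[0] for seg in path.split("/"))

print(strip_path_params("/index.jsp;derp"))  # /index.jsp
print(strip_path_params("/a;x=1/b;y=2"))     # /a/b
```

A server that resolves resources this way returns the same page for /index.jsp and /index.jsp;derp, which is why filters that inspect the raw path need to account for path parameters.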

The data URI scheme uses it to separate the MIME type from its parameters (such as the character set):
https://en.wikipedia.org/wiki/Data_URI_scheme

It can contain an optional character set parameter, separated from the preceding part by a semicolon (;) .

<img src="
AAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO
9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
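To show where the semicolons sit in a data URI, here is a small Python parser sketch. The function parse_data_uri and the sample URIs are made up for this example, and the handling is deliberately minimal (for instance, it reports a missing media type as plain "text/plain" and ignores the default charset).

```python
import base64
from urllib.parse import unquote_to_bytes

def parse_data_uri(uri):
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    # The first comma separates the header from the payload.
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("missing ',' before the data")
    # Semicolons separate the media type from its parameters.
    mediatype, *params = header.split(";")
    is_base64 = bool(params) and params[-1] == "base64"
    if is_base64:
        params = params[:-1]
    data = base64.b64decode(payload) if is_base64 else unquote_to_bytes(payload)
    return mediatype or "text/plain", params, data

print(parse_data_uri("data:text/plain;charset=utf-8;base64,SGVsbG8="))
# ('text/plain', ['charset=utf-8'], b'Hello')
```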

And there was a bug in IIS 5 and IIS 6 that allowed bypassing file upload restrictions:
https://www.owasp.org/index.php/Unrestricted_File_Upload

Blacklisting File Extensions This protection might be bypassed by: ... by adding a semi-colon character after the forbidden extension and before the permitted one (e.g. "file.asp;.jpg")

Conclusion:
Do not use semicolons in URLs, or they could accidentally produce an HTML entity or a URI scheme.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow