Question

I believe the definition and implementation of Java's URI.resolve method are incompatible with RFC 3986 section 5.2.2. I understand that the Java API defines how that method works, and that changing it now would break existing apps, but my question is this: can anyone confirm my understanding that this method is incompatible with RFC 3986?

I'm using the example from this question: java.net.URI resolve against only query string, which I will copy here:


I'm trying to build URIs using the JDK's java.net.URI. I want to append a query (as a String) to an absolute URI object. For example:

URI base = new URI("http://example.com/something/more/long");
String queryString = "query=http://local:282/rand&action=aaaa";
URI query = new URI(null, null, null, queryString, null);
URI result = base.resolve(query);

Theory (or what I think) is that resolve should return:

http://example.com/something/more/long?query=http://local:282/rand&action=aaaa

But what I got is:

http://example.com/something/more/?query=http://local:282/rand&action=aaaa

My understanding of RFC 3986 section 5.2.2 is that if the path of the relative URI is empty, then the entire path of the base URI is to be used:

        if (R.path == "") then
           T.path = Base.path;
           if defined(R.query) then
              T.query = R.query;
           else
              T.query = Base.query;
           endif;

and only if a path is specified is the relative path to be merged against the base path:

        else
           if (R.path starts-with "/") then
              T.path = remove_dot_segments(R.path);
           else
              T.path = merge(Base.path, R.path);
              T.path = remove_dot_segments(T.path);
           endif;
           T.query = R.query;
        endif;

but the Java implementation always does the merge, even if the path is empty:

    String cp = (child.path == null) ? "" : child.path;
    if ((cp.length() > 0) && (cp.charAt(0) == '/')) {
      // 5.2 (5): Child path is absolute
      ru.path = child.path;
    } else {
      // 5.2 (6): Resolve relative path
      ru.path = resolvePath(base.path, cp, base.isAbsolute());
    }

If my reading is correct, to get this behaviour from the RFC pseudocode, you could put a dot as the path in the relative URI, before the query string, which from my experience using relative URIs as links in web pages is what I would expect:

transform(Base="http://example.com/something/more/long", R=".?query")
    => T="http://example.com/something/more/?query"

But I would expect, in a web page, that a link on the page "http://example.com/something/more/long" to "?query" would go to "http://example.com/something/more/long?query", not "http://example.com/something/more/?query" - in other words, consistent with the RFC, but not with the Java implementation.

Is my reading of the RFC correct, and the Java method inconsistent with it, or am I missing something?


Solution

Yes, I agree that the URI.resolve(URI) method is incompatible with RFC 3986. The original question, on its own, presents a fantastic amount of research that contributes to this conclusion. First, let's clear up any confusion.

As Raedwald explained (in a now deleted answer), there is a distinction between base paths that end or do not end with /:

  • fizz relative to /foo/bar is: /foo/fizz
  • fizz relative to /foo/bar/ is: /foo/bar/fizz

While correct, it's not a complete answer because the original question is not asking about a path (i.e. "fizz", above). Instead, the question is concerned with the separate query component of the relative URI reference. The URI class constructor used in the example code accepts five distinct String arguments, and all but the queryString argument were passed as null. (Note that Java accepts a null String as the path parameter and this logically results in an "empty" path component because "the path component is never undefined" though it "may be empty (zero length)".) This will be important later.
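To make both points concrete, here is a minimal sketch using only java.net.URI. The class name TrailingSlashDemo and the helper isEmptyPath are made up for illustration. It shows the trailing-slash distinction from the list above, and that a reference built from a query alone has an empty path rather than a path such as "fizz".

```java
import java.net.URI;

public class TrailingSlashDemo {

    // Resolves a reference string against a base path using the JDK's own logic.
    static String resolveWithJdk(String basePath, String reference) {
        return URI.create(basePath).resolve(reference).toString();
    }

    // Hypothetical helper: true when the URI has no path characters at all.
    static boolean isEmptyPath(URI uri) {
        String path = uri.getRawPath();
        return path == null || path.isEmpty();
    }

    public static void main(String[] args) {
        // The last segment of the base is replaced when the base does not end with "/".
        System.out.println(resolveWithJdk("/foo/bar", "fizz"));   // /foo/fizz
        System.out.println(resolveWithJdk("/foo/bar/", "fizz"));  // /foo/bar/fizz

        // A query-only reference carries a query but no path characters.
        URI queryOnly = URI.create("?query");
        System.out.println(isEmptyPath(queryOnly));   // true
        System.out.println(queryOnly.getQuery());     // query
    }
}
```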

In an earlier comment, Sajan Chandran pointed out that the java.net.URI class is documented to implement RFC 2396 and not the subject of the question, RFC 3986. The former was obsoleted by the latter in 2005. That the URI class Javadoc does not mention the newer RFC could be interpreted as more evidence of its incompatibility. Let's pile on some more:

  • JDK-6791060 suggests this class "should be updated for RFC 3986". A comment there warns that "RFC3986 is not completely backwards compatible with 2396". It was closed in 2018 as a duplicate of JDK-8019345 (still open and unresolved as of October, 2022, with no notable activity since 2013).

  • Previous attempts were made to update parts of the URI class to be compliant with RFC 3986, such as JDK-6348622, but were then rolled back for breaking backwards compatibility. (Also see this discussion on the JDK mailing list.)

  • Although the path "merge" logic sounds similar, as noted by SubOptimal, the pseudocode specified in the newer RFC does not match the actual implementation. In the pseudocode, when the relative URI's path is empty, then the resulting target path is copied as-is from the base URI. The pseudocode's "merge" logic is not executed under those conditions. Contrary to that specification, Java's URI implementation trims the base path after the last / character, as observed in the question.
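
For comparison, here is a rough sketch of the RFC 3986 section 5.2.2 pseudocode translated into Java. It is not taken from the JDK or from any library. The class name Rfc3986ResolveSketch and its method names are invented for this illustration, and it assumes hierarchical URIs only (an opaque URI such as mailto:a@b has no path and is not handled). The key difference from URI.resolve is the refPath.isEmpty() branch, which copies the base path untouched.

```java
import java.net.URI;

public class Rfc3986ResolveSketch {

    // Section 5.2.2 "Transform References", assuming hierarchical URIs.
    static URI resolve(URI base, URI ref) {
        String scheme = base.getScheme();
        String authority = base.getRawAuthority();
        String query = ref.getRawQuery();
        String refPath = ref.getRawPath();
        String path;

        if (ref.getScheme() != null) {
            scheme = ref.getScheme();
            authority = ref.getRawAuthority();
            path = removeDotSegments(refPath);
        } else if (ref.getRawAuthority() != null) {
            authority = ref.getRawAuthority();
            path = removeDotSegments(refPath);
        } else if (refPath.isEmpty()) {
            // Empty reference path: keep the whole base path, no merge.
            path = base.getRawPath();
            if (query == null) query = base.getRawQuery();
        } else if (refPath.startsWith("/")) {
            path = removeDotSegments(refPath);
        } else {
            path = removeDotSegments(merge(base, refPath));
        }

        // Section 5.3: recompose the components into a string.
        StringBuilder sb = new StringBuilder();
        if (scheme != null) sb.append(scheme).append(':');
        if (authority != null) sb.append("//").append(authority);
        sb.append(path);
        if (query != null) sb.append('?').append(query);
        if (ref.getRawFragment() != null) sb.append('#').append(ref.getRawFragment());
        return URI.create(sb.toString());
    }

    // Section 5.2.3: append the reference path to all but the last base segment.
    static String merge(URI base, String refPath) {
        String basePath = base.getRawPath();
        if (base.getRawAuthority() != null && basePath.isEmpty()) return "/" + refPath;
        return basePath.substring(0, basePath.lastIndexOf('/') + 1) + refPath;
    }

    // Section 5.2.4: remove "." and ".." segments.
    static String removeDotSegments(String path) {
        StringBuilder in = new StringBuilder(path);
        StringBuilder out = new StringBuilder();
        while (in.length() > 0) {
            String s = in.toString();
            if (s.startsWith("../")) in.delete(0, 3);
            else if (s.startsWith("./")) in.delete(0, 2);
            else if (s.startsWith("/./")) in.delete(0, 2);
            else if (s.equals("/.")) in.replace(0, 2, "/");
            else if (s.startsWith("/../")) { in.delete(0, 3); removeLastSegment(out); }
            else if (s.equals("/..")) { in.replace(0, 3, "/"); removeLastSegment(out); }
            else if (s.equals(".") || s.equals("..")) in.setLength(0);
            else {
                // Move the first segment (with its leading "/", if any) to the output.
                int next = s.indexOf('/', 1);
                int end = (next < 0) ? s.length() : next;
                out.append(s, 0, end);
                in.delete(0, end);
            }
        }
        return out.toString();
    }

    // Drops the last output segment together with its preceding "/", if any.
    private static void removeLastSegment(StringBuilder out) {
        out.setLength(Math.max(out.lastIndexOf("/"), 0));
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/something/more/long");
        // prints http://example.com/something/more/long?query
        System.out.println(resolve(base, URI.create("?query")));
        // prints http://example.com/something/more/?query
        System.out.println(resolve(base, URI.create(".?query")));
    }
}
```

The sketch works on the raw (still percent-encoded) components so that nothing is decoded and re-encoded along the way.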

There are alternatives to the URI class, if you want RFC 3986 behavior. Java EE 6 through EE 8 implementations provide javax.ws.rs.core.UriBuilder, which (in Jersey 1.18) seems to behave as you expected (see below). It at least claims awareness of the RFC as far as encoding different URI components is concerned. With the switch from JavaEE to JakartaEE 9 (circa 2020), this class moved to jakarta.ws.rs.core.UriBuilder.

Outside of Java EE, Spring 3.0 introduced UriUtils, specifically documented for "encoding and decoding based on RFC 3986". Spring 3.1 deprecated some of that functionality and introduced UriComponentsBuilder, but unfortunately it does not document adherence to any specific RFC.


Test program, demonstrating different behaviors:

import java.net.*;
import java.util.*;
import java.util.function.*;
import javax.ws.rs.core.UriBuilder; // using Jersey 1.18

public class StackOverflow22203111 {

    private URI withResolveURI(URI base, String targetQuery) {
        URI reference = queryOnlyURI(targetQuery);
        return base.resolve(reference);
    }
 
    private URI withUriBuilderReplaceQuery(URI base, String targetQuery) {
        UriBuilder builder = UriBuilder.fromUri(base);
        return builder.replaceQuery(targetQuery).build();
    }

    private URI withUriBuilderMergeURI(URI base, String targetQuery) {
        URI reference = queryOnlyURI(targetQuery);
        UriBuilder builder = UriBuilder.fromUri(base);
        return builder.uri(reference).build();
    }

    public static void main(String... args) throws Exception {

        final URI base = new URI("http://example.com/something/more/long");
        final String queryString = "query=http://local:282/rand&action=aaaa";
        final String expected =
            "http://example.com/something/more/long?query=http://local:282/rand&action=aaaa";

        StackOverflow22203111 test = new StackOverflow22203111();
        Map<String, BiFunction<URI, String, URI>> strategies = new LinkedHashMap<>();
        strategies.put("URI.resolve(URI)", test::withResolveURI);
        strategies.put("UriBuilder.replaceQuery(String)", test::withUriBuilderReplaceQuery);
        strategies.put("UriBuilder.uri(URI)", test::withUriBuilderMergeURI);

        strategies.forEach((name, method) -> {
            System.out.println(name);
            URI result = method.apply(base, queryString);
            if (expected.equals(result.toString())) {
                System.out.println("   MATCHES: " + result);
            }
            else {
                System.out.println("  EXPECTED: " + expected);
                System.out.println("   but WAS: " + result);
            }
        });
    }

    private URI queryOnlyURI(String queryString)
    {
        try {
            String scheme = null;
            String authority = null;
            String path = null;
            String fragment = null;
            return new URI(scheme, authority, path, queryString, fragment);
        }
        catch (URISyntaxException syntaxError) {
            throw new IllegalStateException("unexpected", syntaxError);
        }
    }
}

Outputs:

URI.resolve(URI)
  EXPECTED: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
   but WAS: http://example.com/something/more/?query=http://local:282/rand&action=aaaa
UriBuilder.replaceQuery(String)
   MATCHES: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
UriBuilder.uri(URI)
   MATCHES: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa

OTHER TIPS

If you want better [1] behavior from URI.resolve() and do not want to include another large dependency [2] in your program, then I found the following code to work well within my requirements:

public URI resolve(URI base, URI uri) throws URISyntaxException {
    // An empty base path is treated as the root path "/".
    if (Strings.isNullOrEmpty(base.getPath()))
        base = new URI(base.getScheme(), base.getAuthority(), "/",
            base.getQuery(), base.getFragment());
    // An empty reference path inherits the base path, so resolve() has nothing to trim.
    if (Strings.isNullOrEmpty(uri.getPath()))
        uri = new URI(uri.getScheme(), uri.getAuthority(), base.getPath(),
            uri.getQuery(), uri.getFragment());
    return base.resolve(uri);
}

The only non-JDK thing there is Strings from Guava, for readability - replace with your own 1-line-method if you don't have Guava.

Footnotes:

  1. I cannot claim that the simple code sample here is RFC3986 compliant.
  2. Such as Spring, javax.ws or - as mentioned in this answer - Apache HTTPClient.

For me there is no discrepancy with the Java behaviour.

In RFC 2396, section 5.2, step 6a:

All but the last segment of the base URI's path component is copied to the buffer. In other words, any characters after the last (right-most) slash character, if any, are excluded.

In RFC 3986, section 5.2.3:

return a string consisting of the reference's path component appended to all but the last segment of the base URI's path (i.e., excluding any characters after the right-most "/" in the base URI path, or excluding the entire base URI path if it does not contain any "/" characters).

The solution proposed by @Guss is a good enough workaround, but unfortunately it has a Guava dependency and some minor errors.

This is a refactoring of his solution that removes the Guava dependency and the errors. I use it as a replacement for URI.resolve() and keep it in a helper class of mine called URIUtils, together with other methods that would be part of an extended URI class if URI were not final.

public static URI resolve(URI base, URI uri) throws URISyntaxException {
  if (base.getPath() == null || base.getPath().isEmpty())
    base = new URI(base.getScheme(), base.getAuthority(), "/", base.getQuery(), base.getFragment());
  if (uri.getPath() == null || uri.getPath().isEmpty())
    uri = new URI(uri.getScheme(), uri.getAuthority(), base.getPath(), uri.getQuery(), uri.getFragment());
  return base.resolve(uri);
}

It is easy to check that it works around the problems in URI.resolve() by comparing the two outputs for some common pitfalls:

public static void main(String[] args) throws URISyntaxException {
  URI host = new URI("https://www.test.com");

  URI uri = new URI("mypage.html");
  System.out.println(host.resolve(uri));
  System.out.println(URIUtils.resolve(host, uri));
  System.out.println();

  uri = new URI("./mypage.html");
  System.out.println(host.resolve(uri));
  System.out.println(URIUtils.resolve(host, uri));
  System.out.println();

  uri = new URI("#");
  System.out.println(host.resolve(uri));
  System.out.println(URIUtils.resolve(host, uri));
  System.out.println();

  uri = new URI("#second_block");
  System.out.println(host.resolve(uri));
  System.out.println(URIUtils.resolve(host, uri));
  System.out.println();
}

Outputs:

https://www.test.commypage.html
https://www.test.com/mypage.html

https://www.test.commypage.html
https://www.test.com/mypage.html

https://www.test.com#
https://www.test.com/#
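
The pitfalls above do not include the query-only reference from the original question, so here is a small, hedged check of that case. The class name QueryOnlyWorkaroundCheck is made up, and its resolve method copies the logic of the URIUtils.resolve helper, with URISyntaxException wrapped in an unchecked exception only to keep the demo short. Note two limits that follow from the code itself: it rebuilds the reference from decoded components (getQuery, getFragment), so a query containing percent-escapes such as %26 may come out altered, and it does not guard against references that carry their own scheme or authority.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class QueryOnlyWorkaroundCheck {

    // Same logic as the URIUtils.resolve helper, with the checked exception wrapped.
    static URI resolve(URI base, URI uri) {
        try {
            if (base.getPath() == null || base.getPath().isEmpty())
                base = new URI(base.getScheme(), base.getAuthority(), "/",
                        base.getQuery(), base.getFragment());
            if (uri.getPath() == null || uri.getPath().isEmpty())
                uri = new URI(uri.getScheme(), uri.getAuthority(), base.getPath(),
                        uri.getQuery(), uri.getFragment());
            return base.resolve(uri);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/something/more/long");
        URI query = URI.create("?query=http://local:282/rand&action=aaaa");

        // Plain JDK behaviour, for comparison with the result reported in the question.
        System.out.println(base.resolve(query));
        // Workaround: the reference inherits the full base path before resolving.
        System.out.println(resolve(base, query));
    }
}
```

With the workaround, the last segment "long" should be kept, because the reference no longer has an empty path by the time URI.resolve sees it.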
Licensed under: CC-BY-SA with attribution