One of my REST APIs is expecting a property "url" which expects a URL as input from the user. I am using ESAPI to prevent from XSS attacks. The problem is that the user supplied URL is something like

http://example.com/alpha?abc=def&phil=key%3dbdj

The cannonicalize method from the ESAPI encoder throws intrusion exception here claiming that the input has mixed encoding, since it is url encoded and the piece '&phi' is treated as HTML encoded and thus the exception.

I had a similar problem with sanitizing one of my application urls where the second query parameter started with 'pa' or 'pi' and was converted to delta or pi characters by HTML decoding. Please refer to my previous Stackoverflow question here

Now since the problem is that since the entire URL is coming as input from the user, I cannot simply parse out the Query parameters and sanitize them individually, since malicious input can be created combining the two query parameters and sanitizing them individually wont work in that case.

Example: &ltscr comes is last part of first query param value and ipt&gtalert(0); or something comes as first part of the next query param control context.

Has anyone faced a similar problem? I would really like to know what solutions you guys implemented. Thanks for any pointers.

EDIT: The below answer from 'avgvstvs' does not throw the intrusion exception (Thanks!!). However, the cannonicalize method now changes the original input string. ESAPI treats &phi of the query param to be some html encoded char and replaces it to '?' char. Something like my previous question which is linked here. The difference being that was a URL of my application whereas this is user input. Is my only option maintaining a white list here?

有帮助吗?

解决方案

The problem that you're facing here, is that there are different rules for encoding different parts of a URL--to memory there's 4 sections in a URL that have different encoding rules. First, understand why in Java, you need to build URLs using the UriBuilder class. The URL specification will help with nitty-gritty details.

Now since the problem is that since the entire URL is coming as input from the user, I cannot simply parse out the Query parameters and sanitize them individually, since malicious input can be created combining the two query parameters and sanitizing them individually wont work in that case.

The only real option here is java.net.URI.

Try this:

URI dirtyURI = new URI("http://example.com/alpha?abc=def&phil=key%3dbdj");

String cleanURIStr = enc.canonicalize( dirtyURI.getPath() );

The call to URI.getPath() should give you a non-percent encoded URL, and if enc.canonicalize() detects double-encoding after that stage then you really DO have a double-encoded string and should inform the caller that you will only accept single-encoded URL strings. The URI.getPath() is smart enough to use decoding rules for each part of the URL string.

If its still giving you some trouble, the API reference has other methods that will extract other parts of the URL, in the event that you need to do different things with different parts of the URL. IF you ever need to manually parse parameters on a GET request for example, you can actually just have it return the query string itself--and it will have done a decoding pass on it.

=============JUNIT Test Case============

package org.owasp.esapi;

import java.net.URI;
import java.net.URISyntaxException;

import org.junit.Test;

public class TestURLValidation {

    @Test
    public void test() throws URISyntaxException {
        Encoder enc = ESAPI.encoder();
        String input = "http://example.com/alpha?abc=def&phil=key%3dbdj";
        URI dirtyURI = new URI(input);
        enc.canonicalize(dirtyURI.getQuery());
        
    }

}

=================Answer for updated question=====================

There's no way around it: Encoder.canonicalize() is intended to reduce escaped character sequences into their reduced, native-to-Java form. URLs are most likely considered a special case so they were most likely deliberately excluded from consideration. Here's the way I would handle your case--without a whitelist, and it will guarantee that you are protected by Encoder.canonicalize().

Use the code above to get a URI representation of your input.

Step 1: Canonicalize all of the URI parts except URI.getQuery() Step 2: Use a library parser to parse the query string into a data structure. I would use httpclient-4.3.3.jar and httpcore-4.3.3.jar from commons. You'll then do something like this:

import java.net.URI;
import java.net.URISyntaxException;
import java.util.Iterator;
import java.util.List;

import javax.ws.rs.core.UriBuilder;

import org.apache.http.client.utils.URLEncodedUtils;
import org.junit.Test;
import org.owasp.esapi.ESAPI;
import org.owasp.esapi.Encoder;

public class TestURLValidation
{

  @Test
  public void test() throws URISyntaxException {
    Encoder enc = ESAPI.encoder();
    String input = "http://example.com/alpha?abc=def&phil=key%3dbdj";
    URI dirtyURI = new URI(input);
    UriBuilder uriData = UriBuilder.fromUri(enc.canonicalize(dirtyURI.getScheme()));
    uriData.path(enc.canonicalize(enc.canonicalize(dirtyURI.getAuthority() + dirtyURI.getPath())));
    println(uriData.build().toString());
    List<org.apache.http.NameValuePair> params = URLEncodedUtils.parse(dirtyURI, "UTF-8");
    Iterator<org.apache.http.NameValuePair> it = params.iterator();
    while(it.hasNext()) {
      org.apache.http.NameValuePair nValuePair = it.next();
      uriData.queryParam(enc.canonicalize(nValuePair.getName()), enc.canonicalize(nValuePair.getValue()));
    }
    String canonicalizedUrl = uriData.build().toString();
    println(canonicalizedUrl);
  }

  public static void println(String s) {
    System.out.println(s);
  }
  
}

What we're really doing here is using standard libraries to parse the inputURL (thus taking all the burden off of us) and then canonicalizing the parts after we've parsed each section.

Please note that the code I've listed won't work for all url types... there are more parts to a URL than scheme/authority/path/queries. (Missing is the possibility of userInfo or port, if you need those, modify this code accordingly.)

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top