One of my REST APIs has a query parameter named "partners", which is a List of Integers, so you can specify multiple values in the URL. To prevent XSS attacks, I am stripping malicious content out of the input using ESAPI. Here is the problem:

I noticed that the ESAPI encoder's canonicalize method (which uses the default codecs: HTMLEntityCodec, PercentCodec, JavaScriptCodec) changes the query parameter values, because it thinks that &p or &pa is some kind of encoding. See the examples below.

A request with a single value, such as

http://localhost:8080/product?partner=1

works as expected.

On the other hand, for a request like

http://localhost:8080/product/?pidentity=1&pidentity=2

the input after canonicalizing becomes

   `pidentity=1πdentity=2`

which the framework has trouble parsing, since it sees a single query parameter containing two `=` separators.

If the request URL is

http://localhost:8080/product?partner=1&partner=2

the input after canonicalizing becomes

partner=1∂rtner=2

so &pa has been changed to '∂'.

As you can probably guess, I tried changing the name of the query parameter and it worked fine (probably because there was no corresponding encoding). Has anyone seen this before, or can anyone explain what is causing this behavior? This may just be my inexperience, but since the goal is protection against XSS attacks, I am not sure whether I should remove any codecs from the default encoder.


Solution

The approach you are using is what we refer to as the "Big Hammer" approach: you are attempting to encode the entire URL, as opposed to encoding only the untrusted or tainted data supplied by an untrusted source (i.e., a user).

The best approach is to encode the value of each parameter individually rather than attempting to encode the entire parameter string as a single piece of data. The primary purpose of output encoding is to eliminate the possibility of a user breaking out of the "data" context into a "control" context with the data they provide.

In your example, the string partner=1&partner=2 looks like this to a parser:

partner=1&partner=2

Here the parameter names and the `=` and `&` delimiters are control, and the values 1 and 2 are data. You only want to encode the data context of the string, since the control context is not provided by the untrusted source.

If a user were to provide the data 1&partner=2, your encoded string should look like

partner=1%26partner=2&partner=2
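
As a rough illustration of encoding only the data context, here is a minimal sketch that uses the JDK's `java.net.URLEncoder` rather than ESAPI. The class name `QueryValueEncoding` and the helper `buildQuery` are hypothetical names chosen for this example. Note that `URLEncoder` also percent-encodes `=` as `%3D`, so its output is slightly stricter than the string shown above.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of ESAPI.
class QueryValueEncoding {

    // Builds "name=v1&name=v2...". Only the values (data) are encoded;
    // the name, '=' and '&' (control) are written by the application itself.
    static String buildQuery(String name, List<String> untrustedValues) {
        return untrustedValues.stream()
                .map(v -> name + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        // Benign values pass through unchanged.
        System.out.println(buildQuery("partner", List.of("1", "2")));
        // A value that tries to smuggle in an extra parameter stays inside the data context.
        System.out.println(buildQuery("partner", List.of("1&partner=2", "2")));
    }
}
```

Because the delimiters are added after encoding, nothing a user supplies can introduce a new `&` or `=` into the control context.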

Another important note here is that canonicalization is used to reduce a given string to its most basic form, so all encoding in the provided string will be decoded, which means double and mixed encoding attacks cannot be performed.
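
To make the idea concrete for incoming requests, the sketch below splits the raw query string on its control characters first and only then canonicalizes each value on its own. It is a simplified stand-in, not ESAPI's implementation: it handles percent-encoding only, via the JDK's `URLDecoder`, and the names `ValueCanonicalizer`, `canonicalize`, `valuesOf` and `MAX_PASSES` are assumptions made for this example.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for illustration; handles percent-encoding only, unlike ESAPI.
class ValueCanonicalizer {

    private static final int MAX_PASSES = 5; // arbitrary cap on decoding rounds

    // Repeatedly percent-decodes a single value until it stops changing,
    // so a double-encoded "%2526" ends up as "&". Note that URLDecoder
    // also turns '+' into a space.
    static String canonicalize(String value) {
        String current = value;
        for (int i = 0; i < MAX_PASSES; i++) {
            String decoded;
            try {
                decoded = URLDecoder.decode(current, StandardCharsets.UTF_8);
            } catch (IllegalArgumentException e) {
                return current; // malformed escape sequence: stop decoding
            }
            if (decoded.equals(current)) {
                return current;
            }
            current = decoded;
        }
        return current;
    }

    // Splits on the control characters first, then canonicalizes each value
    // individually, so "&partner" is never mistaken for an encoded entity.
    static List<String> valuesOf(String rawQuery, String name) {
        List<String> values = new ArrayList<>();
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            if (eq >= 0 && pair.substring(0, eq).equals(name)) {
                values.add(canonicalize(pair.substring(eq + 1)));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(valuesOf("partner=1&partner=2", "partner"));
        System.out.println(canonicalize("%2526"));
    }
}
```

With ESAPI itself, the same shape would apply: read each parameter value from the request and pass that single value to the encoder's canonicalize method, rather than passing the whole query string.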

The short answer to your question is to encode the values of the parameters individually as opposed to encoding the entire URL parameter string.


Licensed under: CC-BY-SA with attribution