Pattern to allow Percentage symbol while using ESAPI

Question 1

A good option would be to exclude the percent codec if it is not really needed.

To do that, write your own encoder implementation that extends the ESAPI-provided org.owasp.esapi.reference.DefaultEncoder and register it in ESAPI.properties like this:

ESAPI.Encoder=path.to.ESAPIDefaultEncoderImpl

See the example implementation below.

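package path.to;

import java.util.ArrayList;
import java.util.List;

import org.owasp.esapi.reference.DefaultEncoder;

/**
 * Custom encoder that registers only the codecs that are really needed.
 * PercentCodec is deliberately left out of the list.
 */
public class ESAPIDefaultEncoderImpl extends DefaultEncoder
{
    // Codec class names passed to the DefaultEncoder constructor.
    private static final List<String> codecs = new ArrayList<String>();
    private static final ESAPIDefaultEncoderImpl singletonInstance;

    static
    {
        // Trailing space removed so the name matches the codec class exactly.
        codecs.add("HTMLEntityCodec");
        codecs.add("JavaScriptCodec");
        singletonInstance = new ESAPIDefaultEncoderImpl();
    }

    // Singleton accessor.
    public static ESAPIDefaultEncoderImpl getInstance()
    {
        return singletonInstance;
    }

    private ESAPIDefaultEncoderImpl()
    {
        super(codecs);
    }
}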

In this customized encoder, register only the codecs that are really needed and leave out the percent codec (PercentCodec). To see all ESAPI codecs, visit the ESAPI documentation.

Question 2

As was pointed out, your problem isn't your regex; it's that the data you're sending to Validator.getValidInput(args...) contains some form of mixed encoding.

You don't give much more context, but generally speaking, the answer you posted and accepted is fatally flawed and should not be recommended to anybody.

Your input is failing because, as identified, ESAPI canonicalizes your input before passing it to the regex for validation. Canonicalization offers you two things, but the MOST important is that ESAPI's implementation will detect multiple-encoding attacks.

What is multiple encoding? It's attempting to defeat input validation by encoding a piece of data multiple times. With percent-encoding, it looks like this:

ORIGINAL INPUT:
<script>alert('xss');</script>

ENCODED ONCE:
%3Cscript%3Ealert(%27xss%27)%3B%3C%2Fscript%3E

ENCODED TWICE:
%253Cscript%253Ealert(%2527xss%2527)%253B%253C%252Fscript%253E
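
To make the layering concrete, here is a minimal Java sketch that uses only the standard library. It is not ESAPI's canonicalization code: the class and method names (MultipleEncodingDemo, decodeOnce, countEncodingLayers, looksMultiplyEncoded) are made up for illustration, and it assumes the input contains only well-formed percent escapes. It decodes repeatedly until the string stops changing and treats more than one layer as a sign of multiple encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class MultipleEncodingDemo {

    // One pass of percent-decoding with the JDK decoder.
    // Caveat: URLDecoder also turns '+' into a space and throws
    // IllegalArgumentException on malformed escapes such as "%zz".
    static String decodeOnce(String input) {
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be supported", e);
        }
    }

    // Counts how many decoding passes actually change the string,
    // giving up after maxPasses so the loop always terminates.
    static int countEncodingLayers(String input, int maxPasses) {
        int layers = 0;
        String current = input;
        while (layers < maxPasses) {
            String decoded = decodeOnce(current);
            if (decoded.equals(current)) {
                break; // nothing left to decode
            }
            current = decoded;
            layers++;
        }
        return layers;
    }

    // More than one layer means the data was encoded at least twice.
    static boolean looksMultiplyEncoded(String input) {
        return countEncodingLayers(input, 5) > 1;
    }

    public static void main(String[] args) {
        String twice = "%253Cscript%253Ealert(%2527xss%2527)%253B%253C%252Fscript%253E";
        System.out.println(decodeOnce(twice));             // the "encoded once" form
        System.out.println(decodeOnce(decodeOnce(twice))); // the original input
        System.out.println(looksMultiplyEncoded(twice));
    }
}
```

A validator that decodes only once would see harmless-looking text like %3Cscript%3E in the twice-encoded case, which is why detecting the extra layer matters.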

Your answer, where you recommend just turning the percent codec off, has introduced a massive security vulnerability into your application: you can no longer detect whether an attacker is attempting to defeat your input validation routines. Percent encoding is an extremely standard attack technique, and there are multiple methods of coercing code into an application that involve multiple-encoding techniques.

What you really need here is a better discussion of why your application requires the kind of inputs you're working with. What's the ACTUAL use case, with some example data showing the bigger picture? With only what is in front of us, the only thing I can do is state clearly that removing the percent codec leaves you vulnerable.

If you want to temporarily validate without canonicalization, ESAPI has

Validator.getValidInput(String context, String input, String type, int maxLength, boolean allowNull, boolean canonicalize);

which allows you to temporarily turn canonicalization off.

However, canonicalization is there so that you have some kind of assurance that the input you're handling will be safe to use against a regex.

Question 3

Thanks for your help, NaveedS and GauravM.

I was able to figure out the exact issue. It's a problem in ESAPI core when supporting %.

Before doing the actual pattern matching, ESAPI canonicalizes the input string.
This canonicalization uses various codecs, such as the JavaScript codec, the HTML entity codec, and the percent codec.
The percent codec scans the input string for the % symbol and treats it as an escape character. It reads the next two characters as a hexadecimal number; for example, in %123 it reads 12 as hex, i.e. 18 in decimal, and decodes it to the character with that code (a control character, shown as an arrow symbol in some character sets and called UPARROW here).
Thus, after canonicalization, the input string becomes UPARROW3, and because UPARROW is not allowed by the regex ^[\\p{L}\\p{N}:\\-.\\s_&.,$()\\*%]*$, validation fails.
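
The decoding step can be reproduced with a small Java sketch. The decoder below is a hypothetical, simplified stand-in written for this post, not ESAPI's PercentCodec, and the names PercentDecodeDemo and percentDecode are my own. It shows that %123 passes the regex as raw text but fails once %12 has been turned into the character with code 18.

```java
import java.util.regex.Pattern;

final class PercentDecodeDemo {

    // The whitelist regex from the post, written as a Java string literal.
    static final Pattern ALLOWED =
            Pattern.compile("^[\\p{L}\\p{N}:\\-.\\s_&.,$()\\*%]*$");

    // Simplified decoder: '%' followed by two hex digits becomes the
    // character with that code; everything else is copied unchanged.
    static String percentDecode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '%' && i + 2 < input.length()) {
                int hi = Character.digit(input.charAt(i + 1), 16);
                int lo = Character.digit(input.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.append((char) (hi * 16 + lo));
                    i += 3; // skip '%' and both hex digits
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "%123";
        String decoded = percentDecode(raw);
        System.out.println((int) decoded.charAt(0));            // code of the first character
        System.out.println(ALLOWED.matcher(raw).matches());     // raw text against the regex
        System.out.println(ALLOWED.matcher(decoded).matches()); // decoded text against the regex
    }
}
```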

As a workaround, before passing the string to ESAPI for validation, we can remove all the percent signs from the string and append a single % at the end. This performs the same validation.

However, for a regex like Validator.Email=^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\\.[a-zA-Z]{2,4}$ this workaround won't work.

As an alternative in such exceptional cases, you can write your own regex (explicitly allowing a percent sign in the final segment), like Validator.own.Email=^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\\.[a-zA-Z%]{3,5}$
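
Here is a rough Java sketch of that workaround, using only java.util.regex. The class and method names (PercentWorkaroundDemo, movePercentToEnd) are hypothetical, and leaving strings without any % untouched is my assumption, since the post does not say what to do in that case. The transformed copy is meant for validation only; the original string would still be the one you store or process.

```java
import java.util.regex.Pattern;

final class PercentWorkaroundDemo {

    // The stock email regex quoted above.
    static final Pattern DEFAULT_EMAIL =
            Pattern.compile("^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\\.[a-zA-Z]{2,4}$");

    // The custom email regex that tolerates a trailing percent sign.
    static final Pattern OWN_EMAIL =
            Pattern.compile("^[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\\.[a-zA-Z%]{3,5}$");

    // Removes every '%' and appends a single '%' at the end.
    // Assumption: input without any '%' is returned unchanged.
    static String movePercentToEnd(String input) {
        if (input.indexOf('%') < 0) {
            return input;
        }
        return input.replace("%", "") + "%";
    }

    public static void main(String[] args) {
        String forValidation = movePercentToEnd("user%12name@example.com");
        System.out.println(forValidation);
        System.out.println(DEFAULT_EMAIL.matcher(forValidation).matches());
        System.out.println(OWN_EMAIL.matcher(forValidation).matches());
    }
}
```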

Hope this helps.

Question 4

Nirav, if you're trying with digits, please try the regex below: (\d*%+\d*)+

It will match your pattern, which includes % followed or preceded by digits.
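
A quick way to check what this regex accepts is a short Java sketch with java.util.regex. The class and method names (PercentDigitsDemo, isPercentExpression) are only illustrative, and it uses matches(), so the whole string has to fit the pattern.

```java
import java.util.regex.Pattern;

final class PercentDigitsDemo {

    // The regex from this answer, escaped for a Java string literal.
    static final Pattern PERCENT_WITH_DIGITS = Pattern.compile("(\\d*%+\\d*)+");

    // True when the entire string consists of digits and at least one '%'.
    static boolean isPercentExpression(String s) {
        return PERCENT_WITH_DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"50%", "%50", "12%34", "50", "abc%"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isPercentExpression(sample));
        }
    }
}
```

Because %+ requires at least one percent sign, a string of digits alone is rejected, and any character other than a digit or % makes the whole match fail.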