Question

I'm trying to create a regex that blocks all < and > in a String except when they are part of <select>. Can anyone suggest one? I'll be using it with java.util.regex.Pattern.

I'm writing a filter to block injection attacks and XSS attempts made through the request and the URL. To do that, I'll block special characters and character sequences, with some exceptions. One exception is that I have to allow <select> (angle brackets with select between them), because it is legitimately passed in the request in some cases. All other combinations of angle brackets have to be blocked, hence my question.

Solution

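import java.util.regex.Pattern;
// Matches any > not preceded by <select, or any < not followed by select>
// (whitespace is tolerated only in the lookahead; backslashes are doubled for Java)
Pattern p = Pattern.compile(
  "(?<!\\<select)>|<(?!\\s*select\\s*>)",
  Pattern.CASE_INSENSITIVE);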

This finds any > not preceded by <select and any < not followed by select>, matching case-insensitively.

Normally I'd also allow the (legal) whitespace around the element name ("< select >" is valid). The lookahead handles that with \s*, but the lookbehind has problems with variable-length whitespace that I'm not really sure how to get around.
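
As a rough illustration of how this pattern could be used to reject input, here is a minimal sketch. The class name AngleBracketCheck and the method hasIllegalBracket are hypothetical names chosen for this example, not part of any library, and the inputs are made up.

```java
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class AngleBracketCheck {
    // '>' not preceded by "<select", or '<' not followed by "select>"
    // (whitespace is tolerated after '<' but not before '>').
    private static final Pattern ILLEGAL_BRACKET = Pattern.compile(
        "(?<!\\<select)>|<(?!\\s*select\\s*>)",
        Pattern.CASE_INSENSITIVE);

    // Returns true if the input contains an angle bracket that is not part of <select>.
    static boolean hasIllegalBracket(String input) {
        return ILLEGAL_BRACKET.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasIllegalBracket("pick <select> here"));        // false
        System.out.println(hasIllegalBracket("pick <SELECT> here"));        // false
        System.out.println(hasIllegalBracket("<script>alert(1)</script>")); // true
        System.out.println(hasIllegalBracket("1 > 0"));                     // true
        System.out.println(hasIllegalBracket("< select >"));                // true
    }
}
```

The last call shows the whitespace limitation: the < passes the lookahead, but the > is still flagged because the lookbehind expects exactly <select immediately before it.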

OTHER TIPS

This removes < and > characters from a string unless they are part of <select>, as you described:

someString.replaceAll("<(?!select>)|(?<!\\<select)>", "");
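
To make the behaviour of that call concrete, here is a small sketch that wraps it in a method. The class name StripBrackets and the method strip are hypothetical, and the sample inputs are made up for illustration.

```java
// Hypothetical wrapper around the replaceAll call, for illustration only.
class StripBrackets {
    // Removes every '<' not followed by "select>" and every '>' not preceded by "<select".
    static String strip(String s) {
        return s.replaceAll("<(?!select>)|(?<!\\<select)>", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("a <b> c <select> d")); // a b c <select> d
        System.out.println(strip("<SELECT>"));           // SELECT
    }
}
```

Note that this regex is case-sensitive, so <SELECT> loses its brackets. If mixed case should be preserved, one option is to prefix the pattern with the (?i) flag.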

I suspect it can be done with a single regex, but it may be easier to split it into stages, e.g.:

  1. "@" => "@0"
  2. "<select>" => "@1"
  3. "<" => ""
  4. ">" => ""
  5. "@1" => "<select>"
  6. "@0" => "@"

Note: these are all literal strings, not regex patterns. I have arbitrarily chosen "@" as the escape character, but it can be anything.
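
The six steps above can be sketched in Java roughly as follows. The class name StagedSanitizer and the method sanitize are hypothetical; the sketch relies on String.replace(CharSequence, CharSequence), which treats both arguments as literal text rather than as regex patterns.

```java
// Hypothetical helper, for illustration only.
class StagedSanitizer {
    static String sanitize(String s) {
        return s.replace("@", "@0")         // 1. escape the escape character
                .replace("<select>", "@1")  // 2. protect the allowed tag
                .replace("<", "")           // 3. drop every remaining '<'
                .replace(">", "")           // 4. drop every remaining '>'
                .replace("@1", "<select>")  // 5. restore the allowed tag
                .replace("@0", "@");        // 6. restore the escape character
    }

    public static void main(String[] args) {
        System.out.println(sanitize("a <b> c <select> @ d")); // a b c <select> @ d
        System.out.println(sanitize("@1 <x>"));               // @1 x
    }
}
```

The second call shows why step 1 matters: a literal "@1" in the input becomes "@01" first, so step 5 cannot mistake it for a protected <select>.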

Example: "a <b> c <select> @ d"
step 1
"a <b> c <select> @0 d"
step 2
"a <b> c @1 @0 d"
step 3
"a b> c @1 @0 d"
step 4
"a b c @1 @0 d"
step 5
"a b c <select> @0 d"
step 6
"a b c <select> @ d"

Licensed under: CC-BY-SA with attribution