Question

I'm new to OWASP and its HTML sanitizer, and I find that with any policy I use, it unescapes some entities back into characters.

For example this string:

@ test !

gets turned into this:

@ test !

I'd like to leave the entities "as is" as much as possible. I could even understand it if the sanitizer were escaping characters, but not unescaping them.

So is this possible with the sanitizer? It seems to unescape no matter which policy I try.

Here's the code I'm running for my simple test:

package com.my.company.test;

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

public class OwaspSanitizerTest {
  // Policy under test: allows <img> elements; the test only feeds it plain text.
  public static final PolicyFactory POLICY = Sanitizers.IMAGES;

  @Test
  public void testTextFilter() throws Exception {
      String data = "@ test !";
      String result = POLICY.sanitize(data);

      System.out.println(result);

      assertEquals("@ test !", result);
  }
}

EDIT: The reason I ask is that I want our output to match my users' input as closely as possible. I understand that this won't be possible in some situations, but I would have expected it to be possible in this case.


Solution

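The sanitizer decodes text nodes and then re-encodes them. It does this to foil encoding-level attacks, and to keep the output as close as possible to the intersection of HTML and XML, which minimizes the chance that naive post-processors re-introduce vulnerabilities. As a consequence, the output does not preserve how a character was originally spelled in the input: an entity that does not need escaping comes back as the plain character.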
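
To make the decode-then-re-encode idea concrete, here is a minimal sketch using only the Java standard library. It is not the OWASP implementation: the class name EntityNormalizationSketch, the handful of entities it decodes, and the set of characters it escapes are all assumptions chosen for illustration, and the real library's escaped set may well differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a hand-rolled decode/re-encode pass, NOT the OWASP code.
class EntityNormalizationSketch {

    // One pattern for hex refs (&#x40;), decimal refs (&#64;) and a few named
    // entities, so every reference is decoded exactly once in a single pass.
    private static final Pattern REF = Pattern.compile(
            "&(?:#[xX]([0-9a-fA-F]{1,6})|#([0-9]{1,7})|(amp|lt|gt|quot));");

    /** Turns character references into the characters they stand for. */
    static String decode(String html) {
        Matcher m = REF.matcher(html);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(html, last, m.start());
            if (m.group(3) != null) {
                switch (m.group(3)) {
                    case "amp": sb.append('&'); break;
                    case "lt": sb.append('<'); break;
                    case "gt": sb.append('>'); break;
                    default: sb.append('"'); break; // "quot"
                }
            } else {
                int cp = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
                if (Character.isValidCodePoint(cp)) {
                    sb.appendCodePoint(cp);
                } else {
                    sb.append(m.group()); // leave out-of-range references alone
                }
            }
            last = m.end();
        }
        sb.append(html, last, html.length());
        return sb.toString();
    }

    /** Escapes a fixed set of characters (assumed set, for illustration). */
    static String encode(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&#34;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Decode, then re-encode: the input's original entity spelling is lost. */
    static String normalize(String html) {
        return encode(decode(html));
    }

    public static void main(String[] args) {
        System.out.println(normalize("&#64; test &#33;"));
        System.out.println(normalize("&#60;script&#62;"));
    }
}
```

In this sketch, characters outside the escaped set come out as plain characters no matter how the input spelled them, while characters inside the set always come out in one canonical form. If the goal is to check that sanitizing did not change what the user meant, one option is to compare the decoded forms of input and output rather than the raw strings.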

Licensed under: CC-BY-SA with attribution