Question

I have an HTML email message that I parse using Jsoup:

Jsoup.parse(bizmsg.getMessageBody()).text()

But this does not remove script tags such as:

<script>
document.write("Bazinga!")
</script>

I have been using a regex like this:

String(v).replace(/(?:<script.*?>)((\n|\r|.)*?)(?:<\/script>)/ig, "");

to successfully remove scripts. But then I came across the question "JSoup to parse <script> tag".

How do I use Rhino to parse scripts? A code sample would be very helpful, thanks.


Solution

You don't need Rhino to remove <script> tags. Use a simple CSS selector in Jsoup and remove the matched nodes. Here is a minimal example using www.google.com:

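import java.io.IOException;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Place this method inside any class.
public static void main(String[] args) throws IOException {
    // Fetch and parse the page with a 5-second timeout.
    Document doc = Jsoup.parse(new URL("http://www.google.com"), 5000);
    // Select every <script> element and detach it from the document.
    Elements elems = doc.select("script");
    for (Element elem : elems) {
        elem.remove();
    }
    System.out.println(doc);
}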
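
The question starts from an HTML string (the email body) rather than a URL. Below is a minimal sketch of the same selector-based approach for that case, assuming jsoup is on the classpath. The class name ScriptStripper and the method stripScripts are hypothetical names chosen for illustration, and the sample HTML is made up.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ScriptStripper {

    // Hypothetical helper: returns the text of an HTML string after
    // detaching every <script> element from the parsed document.
    static String stripScripts(String html) {
        Document doc = Jsoup.parse(html);
        doc.select("script").remove(); // Elements.remove() detaches each match
        return doc.text();
    }

    public static void main(String[] args) {
        // Made-up sample input standing in for the email body.
        String html = "<p>Hello</p>"
                + "<script>document.write(\"Bazinga!\")</script>"
                + "<p>world</p>";
        System.out.println(stripScripts(html));
    }
}
```

In the question's code, this would presumably be called as stripScripts(bizmsg.getMessageBody()) in place of the direct Jsoup.parse(...).text() call.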
Licensed under: CC-BY-SA with attribution