You could use
replaceAll("(?s)(<Text>.*?</Text>)|\\p{C}", "$1")
The idea is to skip the contents of Text tags and leave them alone (replace them with themselves). So if we encounter a \\p{C}, we know it's not inside one.
Explanation:
- (?s) activates "dot match all", so . will match newlines as well.
- (<Text>.*?</Text>) captures the text node in the first group. We replace it with the result of this capture through $1.
- If we match \\p{C}, this means we are not in a Text node. So we replace with $1, which is empty since (<Text>.*?</Text>) didn't match in the alternation.
Ideone illustration: http://ideone.com/xKZgsn
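For a self-contained version to try locally, here is a minimal sketch of the same replaceAll call. The class name StripControlChars, the helper strip and the sample input are made up for illustration; only the pattern and the "$1" replacement come from the answer above.

```java
import java.util.regex.Pattern;

class StripControlChars {
    // Same pattern as in the answer: group 1 is a whole <Text>...</Text> node,
    // the second alternative is any single character in Unicode category C.
    private static final Pattern PATTERN =
            Pattern.compile("(?s)(<Text>.*?</Text>)|\\p{C}");

    // Hypothetical helper: keeps Text nodes as they are and drops \p{C}
    // characters everywhere else.
    static String strip(String input) {
        // When \p{C} matched, group 1 did not participate, so "$1" is empty.
        return PATTERN.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        // Made-up sample: control characters both inside and outside a Text node.
        String input = "<Doc>\u0001a\u0002<Text>keep\u0003\nthis</Text>b\u0004</Doc>";
        String expected = "<Doc>a<Text>keep\u0003\nthis</Text>b</Doc>";
        System.out.println(strip(input).equals(expected)); // prints "true"
    }
}
```

Note that \p{C} also covers line breaks and tabs, so any of those outside a Text node will be removed as well.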