- Don't forget to use Pattern.quote(token) (unless kw is guaranteed to contain no regex metacharacters).
- If you're bound to use replaceAll() (instead of tokenizing the input into tag|text|tag|text|... and applying the replacement to the text parts only, which would have been much simpler and faster), the code below should help.
Note that it's not efficient: it matches some empty or already-highlighted spots and thus requires "curing" after substitution, but it should treat XML/HTML tags (except CDATA) properly.
Here's a "curing" function (no null checks):
// Requires: import java.util.regex.Pattern;
private static final Pattern cureDoubleB = Pattern.compile("<b><b>([^<>]*)</b></b>");
private static final Pattern cureEmptyB = Pattern.compile("<b></b>");
private static String cure(String input) {
    // Unwrap doubled <b> tags first, then remove empty <b></b> pairs.
    return cureEmptyB.matcher(cureDoubleB.matcher(input).replaceAll("<b>$1</b>")).replaceAll("");
}
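To make the effect of curing concrete, here is a minimal runnable sketch. The wrapper class CureDemo and the sample string are made up for illustration; the two patterns are the ones from the function above.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class, only here to make the snippet runnable.
class CureDemo {
    private static final Pattern CURE_DOUBLE_B = Pattern.compile("<b><b>([^<>]*)</b></b>");
    private static final Pattern CURE_EMPTY_B = Pattern.compile("<b></b>");

    static String cure(String input) {
        // Unwrap doubled <b> tags first, then remove empty <b></b> pairs.
        return CURE_EMPTY_B.matcher(CURE_DOUBLE_B.matcher(input).replaceAll("<b>$1</b>")).replaceAll("");
    }

    public static void main(String[] args) {
        // Made-up input showing both kinds of leftovers.
        System.out.println(cure("<b></b>say <b><b>hello</b></b><b></b>"));
        // prints: say <b>hello</b>
    }
}
```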
Here's what the replaceAll line should look like:
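// txt: a run of text containing neither a tag bracket nor the token's first letter (in either case)
String txt = "[^<>" + Pattern.quote(token.substring(0, 1).toLowerCase()) + Pattern.quote(token.substring(0, 1).toUpperCase()) + "]*";
// $1 = leading tags and text, $4 = the token (matched case-insensitively), $5 = fallback text when the token does not match here
highlighted = cure(highlighted.replaceAll("((<[^>]*>)*" + txt + ")(((?i)" + Pattern.quote(token) + ")|(" + txt + "))", "$1<b>$4</b>$5"));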
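For comparison, the tokenizing alternative mentioned at the top (split the input into tag|text|tag|text|... and replace in the text parts only) could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names are hypothetical, a "tag" is taken to be anything from < to the next >, CDATA is not handled, and the token is assumed to be non-empty.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original answer's code.
class TokenizingHighlighter {
    // Assumption: a tag is anything from '<' to the next '>' (CDATA and comments are not handled).
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String highlight(String html, String token) {
        // Quote the token so regex metacharacters in it are matched literally.
        Pattern word = Pattern.compile(Pattern.quote(token), Pattern.CASE_INSENSITIVE);
        StringBuilder out = new StringBuilder();
        Matcher tags = TAG.matcher(html);
        int pos = 0;
        while (tags.find()) {
            // Text between the previous tag and this one: highlight inside it.
            out.append(highlightText(word, html.substring(pos, tags.start())));
            // The tag itself is copied through untouched.
            out.append(tags.group());
            pos = tags.end();
        }
        // Trailing text after the last tag.
        out.append(highlightText(word, html.substring(pos)));
        return out.toString();
    }

    private static String highlightText(Pattern word, String text) {
        // $0 is the whole match, so the original casing is preserved.
        return word.matcher(text).replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        System.out.println(highlight("Hello <a href='hello'>hello world</a>", "hello"));
        // prints: <b>Hello</b> <a href='hello'><b>hello</b> world</a>
    }
}
```

This version never produces empty <b></b> pairs, but running it on already-highlighted input would still nest <b> tags, so a cure step may still be wanted in that case.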