Question
I have a string:
0000000000<table blalba>blaalb<tr>gfdg<td>kgdfkg</td></tr>fkkkkk</table>5555
I want to replace the text between <table and /table> with "" (that is, delete it) so that only 00000000005555 is displayed.
When it is on one line, it works:
chaineHtml = chaineHtml.replaceFirst("[^<title>](.*)[</title>$", "");
But the same approach with table fails.
Solution 2
Try this:
s = s.replaceAll("<table.+/table>", "");
OTHER TIPS
This regex should work:
html = html.replaceAll("(?is)<table.+?/table>", "");
Where (?is)
enables two flags: s (DOTALL) lets . match line terminators, so the match can span multiple lines, and i makes the match case-insensitive.
However, I suggest not manipulating HTML with regex, as it can be error-prone.
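As a minimal sketch of what the two flags do, the example below runs the inline (?is) form from this answer next to an equivalent precompiled Pattern using the flag constants Pattern.DOTALL and Pattern.CASE_INSENSITIVE. The class name FlagDemo, the method names, and the sample input with mixed-case tags are illustrative assumptions, not part of the original answer.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // Same pattern as in the answer, with the flags given as constants
    // instead of the inline (?is) prefix.
    private static final Pattern TABLE =
            Pattern.compile("<table.+?/table>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Inline-flag form, exactly as written in the answer.
    static String stripInline(String html) {
        return html.replaceAll("(?is)<table.+?/table>", "");
    }

    // Precompiled form; useful when the same pattern is applied many times.
    static String stripCompiled(String html) {
        return TABLE.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical input: upper-case tags and a table spanning three lines.
        String html = "0000000000<TABLE blalba>\nblaalb<tr>gfdg</tr>\n</Table>5555";
        System.out.println(stripInline(html));   // prints 00000000005555
        System.out.println(stripCompiled(html)); // prints 00000000005555
    }
}
```

Both forms should behave the same; the precompiled variant simply avoids recompiling the pattern on every call.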
[^<table>]
I don't think that means what you think it means.
It is not "a string not equal to <table>". Rather, it means "a character not equal to < or t or a or b or l or e or >". "[^...]" is called a negative character class.
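To make the point about negative character classes concrete, here is a small sketch that deletes every character matched by [^<table>]. The class name NegatedClassDemo and the sample strings are made up for illustration.

```java
class NegatedClassDemo {
    // Deletes every single character that is NOT one of: < t a b l e >
    static String dropNegated(String s) {
        return s.replaceAll("[^<table>]", "");
    }

    // The same class written with its members in a different order;
    // order and repetition inside [...] do not matter.
    static String dropNegatedReordered(String s) {
        return s.replaceAll("[^abelt<>]", "");
    }

    public static void main(String[] args) {
        // 'x', 'y', 'z' and '/' are removed one character at a time.
        System.out.println(dropNegated("<table>xyz</table>"));          // prints <table><table>
        // Only 's' is outside the class, so only 's' is removed.
        System.out.println(dropNegated("stable"));                      // prints table
        System.out.println(dropNegatedReordered("<table>xyz</table>")); // prints <table><table>
    }
}
```

The class operates on individual characters, which is why the word "stable" loses only its first letter.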
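Change your regex to
(.*?)<table[^>]*>.*?</table>(.*?)
replace it with
$1$2
and it will give you the result you wish. Note that <table[^>]*> (rather than a literal <table>) is needed so that opening tags with attributes, such as <table blalba>, also match. Prefix the pattern with (?s) if the table spans several lines.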
Please consider bookmarking The Stack Overflow Regular Expression FAQ for future reference. The bottom section contains a list of online regex testers where you can try things out yourself. You may also want to check out the sections named "Character Classes" and, as mentioned by @anubhava: "General Information > Do not use regex to parse HTML"
Don't use regex if you are not familiar with its concepts!
There is a simple plain Java solution for your problem:
String begin = "<table";
String end = "</table>";
String s = "0000000001<table blalba>blaalb<tr>gfdg<td>kgdfkg</td></tr>fkkkkk</table>4555";
int tableIndex = s.indexOf(begin);
int tableEndIndex = s.indexOf(end, tableIndex);
// Stop when no opening tag is left, or when an opening tag has no closing tag.
while (tableIndex > -1 && tableEndIndex > -1) {
    // Keep the text before the opening tag and the text after the closing tag.
    s = s.substring(0, tableIndex) + s.substring(tableEndIndex + end.length());
    tableIndex = s.indexOf(begin);
    tableEndIndex = s.indexOf(end, tableIndex);
}
String resultString = subjectString.replaceAll("<table.*?table>", "");
Explanation:
Match the characters “<table” literally «<table»
Match any single character that is not a line break character «.»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the characters “table>” literally «table>»
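The effect of the lazy quantifier is easiest to see on input with more than one table. The sketch below compares the lazy pattern from this answer with a greedy variant. The class name LazyVsGreedyDemo and the two-table sample string are assumptions made for the example.

```java
class LazyVsGreedyDemo {
    // Greedy: .* grabs as much as possible, so one match runs from the
    // first "<table" to the LAST "table>" on the line.
    static String stripGreedy(String s) {
        return s.replaceAll("<table.*table>", "");
    }

    // Lazy: .*? stops at the first "table>" it can reach, so each table
    // is matched and removed separately.
    static String stripLazy(String s) {
        return s.replaceAll("<table.*?table>", "");
    }

    public static void main(String[] args) {
        String s = "0<table>a</table>1<table>b</table>2";
        System.out.println(stripGreedy(s)); // prints 02  (the "1" between the tables is lost)
        System.out.println(stripLazy(s));   // prints 012
    }
}
```

Neither variant crosses a line break, because . does not match line terminators unless the DOTALL flag is set.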
Here is a brilliant solution I found somewhere: use the regex
[\s\S]
to match any character, including newlines, because it matches any whitespace or non-whitespace character. In your case that would give:
s = s.replaceAll("<table[\\s\\S]+/table>", "");
The double backslashes are needed because the backslash itself must be escaped inside a Java string literal.
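The difference between . and [\s\S] can be checked with a short sketch on multi-line input. One deliberate change from the line above: the example uses the lazy form [\s\S]+? rather than the greedy [\s\S]+, so that several tables in one string would be removed separately. The class name AnyCharDemo and the sample input are assumptions for illustration.

```java
class AnyCharDemo {
    // Plain dot, no flags: . does not match '\n', so a table that spans
    // several lines is not matched at all.
    static String stripWithDot(String s) {
        return s.replaceAll("<table.+?/table>", "");
    }

    // [\s\S] matches whitespace or non-whitespace, i.e. every character,
    // including line terminators. Backslashes are doubled for the Java literal.
    static String stripWithClass(String s) {
        return s.replaceAll("<table[\\s\\S]+?/table>", "");
    }

    public static void main(String[] args) {
        String s = "0000000000<table blalba>\nblaalb\n</table>5555";
        System.out.println(stripWithDot(s).equals(s)); // prints true (nothing was removed)
        System.out.println(stripWithClass(s));         // prints 00000000005555
    }
}
```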
Another possibility is
(.|\n)
which matches either any character except a newline, or a newline. That gives:
s = s.replaceAll("<table(.|\n)+/table>", "");
For some reason, on my computer, in certain combinations, using (.|\n)+
makes the regex engine recurse until the stack overflows:
Exception in thread "main" java.lang.StackOverflowError
    at java.lang.Character.codePointAt(Character.java:4668)
    at java.util.regex.Pattern$CharProperty.match(Pattern.java:3693)
This doesn't happen when I use [\s\S]+
instead. I have no idea why though.
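A likely explanation, offered as an assumption rather than a confirmed diagnosis of the trace above: java.util.regex matches a repeated group that contains an alternation, such as (.|\n)+, recursively, using roughly one level of call stack per repetition, so a long enough input can exhaust the thread's stack. A repeated character class such as [\s\S]+ is matched in a loop instead. The sketch below only exercises the character-class form on a large multi-line input. The class name LargeInputDemo, the helper buildLargeInput, and the row count are hypothetical.

```java
class LargeInputDemo {
    // Character-class version: repetition of [\s\S] does not need one
    // stack frame per matched character.
    static String stripTables(String html) {
        return html.replaceAll("<table[\\s\\S]+?</table>", "");
    }

    // Builds a hypothetical table with the given number of one-line rows.
    static String buildLargeInput(int rows) {
        StringBuilder sb = new StringBuilder("0000000000<table blalba>");
        for (int i = 0; i < rows; i++) {
            sb.append("<tr><td>row</td></tr>\n");
        }
        return sb.append("</table>5555").toString();
    }

    public static void main(String[] args) {
        // About 1.1 million characters between the table tags.
        String html = buildLargeInput(50_000);
        System.out.println(stripTables(html)); // prints 00000000005555
    }
}
```

Whether the (.|\n)+ form overflows on the same input depends on the JVM version and the thread stack size, so it is left out of the example.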