Question
I'm beginning to write my own DXF file parser and I've run into a regex problem. Consider the following text file (a snippet of a particular DXF file I'm working on):
http://www.filedropper.com/test_104
I read this file in as a String with:
String s = FileUtils.readFileToString(file);
I then want to use regex to split this string so I get an array of Strings of size two with the LINE entity as the first element and the MTEXT entity as the second. My first thought was to use:
String[] tokens = s.split("\\s{2,2}0");
The problem with this, however (which you can test for yourself), is that it returns:
{"", "\nLINE...", "\nMTEXT...", "\n100...", "\n"}
Of course the first and last strings could easily be removed from the array, but if you look at the text file, you'll see that in the MTEXT entity there is
"    0"
i.e., four whitespace characters and a 0. Unfortunately, I don't want to split on this. So my question is: how can I parse this using split and regex to simply obtain the array:
String[] tokens = {"\nLINE...", "\nMTEXT..."}
Solution
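You can use a positive look-ahead, so that the split only happens where the two whitespace characters and the 0 are immediately followed by a line containing LINE or MTEXT. The look-ahead is not consumed, so the entity name stays in the token.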
s.split("\\s\\s0(?=\\nLINE|\\nMTEXT)");
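Here is a minimal sketch of how this could be wired up and checked. The sample string is a hand-made stand-in for the linked file (which is not reproduced here), and the class name DxfSplitDemo, the method splitEntities and the constant ENTITY_START are made up for illustration. It also assumes the file uses plain \n line endings; with \r\n the look-ahead would need adjusting.

```java
import java.util.Arrays;

class DxfSplitDemo {
    // Two whitespace characters and a 0, but only where the next line
    // is an entity name. The look-ahead checks this without consuming it.
    static final String ENTITY_START = "\\s\\s0(?=\\nLINE|\\nMTEXT)";

    // Hypothetical helper: splits on the entity delimiter and drops
    // empty tokens (the text begins with a delimiter, so split()
    // returns an empty first element).
    static String[] splitEntities(String s) {
        return Arrays.stream(s.split(ENTITY_START))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Made-up sample, NOT the real file: a LINE entity followed by
        // an MTEXT entity that contains "    0" (four spaces and a 0).
        String sample = "  0\nLINE\n  8\nWalls\n"
                + "  0\nMTEXT\n  1\nHello\n    0\n100\nAcDbMText\n";

        String[] tokens = splitEntities(sample);
        System.out.println(tokens.length);
        for (String token : tokens) {
            // Show newlines as \n so each token prints on one line.
            System.out.println(token.replace("\n", "\\n"));
        }
    }
}
```

Note that split on its own still returns an empty first element when the text starts with the delimiter, which is why the sketch filters empty tokens. If the snippet ends with a trailing "  0" line that is not followed by LINE or MTEXT, it will remain at the end of the last token.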