Question

I'm beginning to write my own DXF file parser and I've run into a regex problem. Consider the following text file (a snippet of a particular DXF file I'm working on):

http://www.filedropper.com/test_104

I read this file in as a String with:

String s = FileUtils.readFileToString(file);

I then want to use regex to split this string so I get an array of Strings of size two with the LINE entity as the first element and the MTEXT entity as the second. My first thought was to use:

String[] tokens = s.split("\\s{2,2}0");

The problem with this, however (which you can test for yourself), is that it returns:

{"", "\nLINE...", "\nMTEXT...", "\n100...", "\n"}

Of course the first and last strings could easily be removed from the array, but if you look at the text file, you'll see that in the MTEXT entity there is

"    0"

i.e., four whitespace characters and a 0. Unfortunately, I don't want to split on this. So my question is: how can I use split and a regex to obtain just the array:

String[] tokens = {"\nLINE...", "\nMTEXT..."}

Solution

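You can use a positive look-ahead, so that the split happens only where the two whitespace characters and the 0 are immediately followed by a LINE or MTEXT line. The look-ahead is zero-width, so the "\nLINE" and "\nMTEXT" text stays in the resulting tokens.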

s.split("\\s\\s0(?=\\nLINE|\\nMTEXT)");
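To see how this behaves, here is a minimal sketch. The linked file is not reproduced here, so the sample text is a made-up DXF-like snippet, and the class and method names (DxfSplitSketch, splitEntities) are only illustrative. It assumes "\n" line endings. Note that split still returns an empty first element, because the string starts with a match; dropping empty tokens is an addition in this sketch, not part of the one-liner above.

```java
import java.util.ArrayList;
import java.util.List;

class DxfSplitSketch {
    // Two whitespace characters and a 0, but only when the next line
    // is LINE or MTEXT. The look-ahead consumes nothing.
    static final String ENTITY_START = "\\s\\s0(?=\\nLINE|\\nMTEXT)";

    // Hypothetical helper: split on entity starts and drop empty tokens
    // (split yields a leading "" when the input begins with a match).
    static List<String> splitEntities(String s) {
        List<String> tokens = new ArrayList<>();
        for (String token : s.split(ENTITY_START)) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Made-up sample: a LINE entity, then an MTEXT entity that
        // contains "    0" (four spaces and a 0) as a value line.
        String sample =
              "  0\nLINE\n  8\n0\n 10\n1.0\n"
            + "  0\nMTEXT\n  8\n0\n 71\n    0\n100\nAcDbMText\n"
            + "  0\n";

        for (String token : splitEntities(sample)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

If the file uses Windows line endings, the look-ahead would probably need to be written as (?=\\r?\\nLINE|\\r?\\nMTEXT) instead, since "\r" would otherwise sit between the 0 and the "\n".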

Licensed under: CC-BY-SA with attribution