Question

I'm writing a DXF file parser and I want to use String.split() to separate the file (represented as a String) into its individual DXF entities. A new DXF entity is marked by two leading whitespaces followed by 0. Unfortunately, some entities (such as MTEXT) have properties that are marked by four leading whitespaces followed by 0. I want to use split, but how do I distinguish these two cases using regex? A simple split("\\s\\s0") or split("\\s{2,2}0") still matches the four-whitespace case, because the last two of the four whitespaces satisfy the pattern. How can I use regex to specify that I want to split on exactly two leading whitespaces, no less, no more?


Solution

If the whitespace characters follow a word character, you can use a word boundary anchor \b, like this:

String[] tokens = text.split("\\b\\s{2,2}0");
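
As a rough illustration of how the word-boundary version behaves, here is a small sketch. The class name, method name, and sample strings are made up for the demo and are not real DXF content; {2} is used as a shorter equivalent of {2,2}.

```java
import java.util.Arrays;

class WordBoundarySplitDemo {
    // Splits on two whitespace characters plus "0", but only where the
    // whitespace directly follows a word character (letter, digit, underscore).
    static String[] splitAfterWord(String text) {
        return text.split("\\b\\s{2}0");
    }

    public static void main(String[] args) {
        // Hypothetical one-line sample: two spaces, then four, then two.
        String sample = "LINE  0CIRCLE    0MTEXT  0ARC";
        // prints [LINE, CIRCLE    0MTEXT, ARC]
        System.out.println(Arrays.toString(splitAfterWord(sample)));

        // ')' is not a word character, so there is no \b before the spaces
        // and the string is returned unsplit.
        // prints [A)  0B]
        System.out.println(Arrays.toString(splitAfterWord("A)  0B")));
    }
}
```

The second call shows the limitation: when the character before the two whitespaces is not a word character, this pattern does not split at all.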

You can also use a negative lookbehind, which requires that the character just before the two whitespaces is not itself whitespace. It works even when that character is a non-word character:

String[] tokens = text.split("(?<!\\s)\\s{2,2}0");
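
A minimal sketch of the lookbehind version, again with made-up class, method, and sample names. One thing to watch: in Java, \s also matches line breaks, so if the String still contains the file's newlines and the two whitespaces sit at the start of a line, the lookbehind sees the preceding newline and rejects the match. The second method is an assumed alternative for that situation, not part of the original answer: it anchors to the start of a line with (?m)^ and matches literal spaces.

```java
import java.util.Arrays;

class LookbehindSplitDemo {
    // Exactly two whitespace characters plus "0": the lookbehind rejects
    // any match whose preceding character is also whitespace.
    static String[] splitEntities(String text) {
        return text.split("(?<!\\s)\\s{2}0");
    }

    // Assumed variant for input that keeps its line breaks: match exactly
    // two literal spaces at the start of a line, followed by "0".
    static String[] splitAtLineStart(String text) {
        return text.split("(?m)^ {2}0");
    }

    public static void main(String[] args) {
        // Hypothetical one-line sample; the four-space run is left alone.
        // prints [A), B    0C, D]
        System.out.println(Arrays.toString(splitEntities("A)  0B    0C  0D")));

        // With a newline before the two spaces, the lookbehind blocks the split.
        System.out.println(splitEntities("A\n  0B").length); // prints 1

        // The line-anchored variant splits here and ignores the four-space line.
        System.out.println(splitAtLineStart("A\n  0\nB\n    0\nC").length); // prints 2
    }
}
```

Which variant fits depends on whether the line breaks were kept when the file was read into the String, so it is worth checking against a real sample file.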
Licensed under: CC-BY-SA with attribution