Question

I've got a string like:

create Person +fname : String, +lname: String, -age:int;

Is there any way to split it using a regex or EBNF? I mean, can all the parts matched by things like [a-zA-Z0-9] (the values we don't know in advance) be stored in an array?

In other words, by using this regexp:

^create [a-zA-Z][a-zA-Z0-9]* [s|b]?[+|[-]|=][a-zA-Z][a-zA-Z0-9]*[ ]?:[ ]?[a-zA-Z][a-zA-Z0-9]*(, [s|b]?[+|[-]|=][a-zA-Z][a-zA-Z0-9]*[ ]?:[ ]?[a-zA-Z][a-zA-Z0-9]*)*;

I want to obtain this array:

  • Person
  • +
  • fname
  • String
  • +
  • lname
  • String
  • -
  • age
  • int
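Why a single match won't hand you that list directly: in Java's java.util.regex, a capturing group inside a repetition (like the (, ...)* part of the regex above) only remembers its last iteration. A common workaround is to validate the whole line once and then pull the attributes out one at a time with Matcher.find(). This sketch assumes the intent of [s|b] and [+|[-]|=] is "optional s or b" and "one of +, - or =" (as written, both classes also accept a literal |); the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureDemo {
    // Simplified form of the question's regex (assumed intent):
    // [sb]? = optional modifier, [+\-=] = visibility sign
    private static final String NAME = "[a-zA-Z][a-zA-Z0-9]*";
    private static final String ATTR_SRC =
            "([sb]?)([+\\-=])(" + NAME + ") ?: ?(" + NAME + ")";
    // Whole statement: used only to validate the line and capture the class name (group 1)
    private static final Pattern FULL =
            Pattern.compile("create (" + NAME + ") " + ATTR_SRC + "(?:, " + ATTR_SRC + ")*;");
    // A single attribute, applied repeatedly with find()
    private static final Pattern ATTR = Pattern.compile(ATTR_SRC);

    public static List<String> tokens(String input) {
        Matcher full = FULL.matcher(input);
        if (!full.matches()) {
            throw new IllegalArgumentException("not a valid create statement: " + input);
        }
        List<String> out = new ArrayList<>();
        out.add(full.group(1));                       // class name
        Matcher attr = ATTR.matcher(input);
        attr.region(full.end(1), input.length());     // scan only after the class name
        while (attr.find()) {
            if (!attr.group(1).isEmpty()) {
                out.add(attr.group(1));               // s/b modifier, only when present
            }
            out.add(attr.group(2));                   // +, - or =
            out.add(attr.group(3));                   // attribute name
            out.add(attr.group(4));                   // attribute type
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("create Person +fname : String, +lname: String, -age:int;"));
        // prints [Person, +, fname, String, +, lname, String, -, age, int]
    }
}
```

The full-line check matters because find() on its own would silently skip over anything malformed between attributes.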

Solution

You can try splitting it this way:

public class SplitDemo {
    public static void main(String[] args) {
        // Split on a run of whitespace, ':', ';' or ',' OR at the empty position right after '+' or '-'
        String[] tokens = "create Person +fname : String, +lname: String, -age:int;"
                .split("[\\s:;,]+|(?<=[+\\-])");
        for (String s : tokens) {
            System.out.println("=> " + s);
        }
    }
}

output:

=> create
=> Person
=> +
=> fname
=> String
=> +
=> lname
=> String
=> -
=> age
=> int
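To see what each half of the split pattern does on its own, here is a small sketch (class and method names are illustrative) that applies each alternative separately, plus a contrast case that splits on the sign itself instead of after it:

```java
import java.util.Arrays;

public class SplitParts {
    // First alternative: a run of separators is consumed and never appears in the result
    public static String[] onSeparators(String s) {
        return s.split("[\\s:;,]+");
    }

    // Second alternative: zero-width lookbehind, splits right AFTER + or - and keeps the sign
    public static String[] afterSign(String s) {
        return s.split("(?<=[+\\-])");
    }

    // For contrast: splitting ON the sign consumes it
    public static String[] onSign(String s) {
        return s.split("[+\\-]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(onSeparators("fname : String, "))); // [fname, String]
        System.out.println(Arrays.toString(afterSign("-age")));                // [-, age]
        System.out.println(Arrays.toString(onSign("-age")));                   // [, age]
    }
}
```

The lookbehind is what keeps the + and - in the result; splitting on [+\\-] directly would throw the sign away and leave an empty string in its place.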

Splitting this way also keeps create as the first element of the array, so just start iterating from tokens[1].
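If you'd rather get an array that already excludes create, one option is to split as before and copy everything after index 0 with Arrays.copyOfRange. A sketch; the helper name and the sanity check on tokens[0] are illustrative additions:

```java
import java.util.Arrays;

public class SkipCreate {
    // Same split rule as in the answer
    private static final String RULE = "[\\s:;,]+|(?<=[+\\-])";

    public static String[] attributeTokens(String input) {
        String[] tokens = input.split(RULE);
        if (tokens.length == 0 || !tokens[0].equals("create")) {
            throw new IllegalArgumentException("expected a line starting with create: " + input);
        }
        // Drop tokens[0] ("create") and keep the rest
        return Arrays.copyOfRange(tokens, 1, tokens.length);
    }

    public static void main(String[] args) {
        String[] rest = attributeTokens("create Person +fname : String, +lname: String, -age:int;");
        System.out.println(Arrays.toString(rest));
        // prints [Person, +, fname, String, +, lname, String, -, age, int]
    }
}
```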

You could try to add ^create\\s as part of the splitting rule, but this will produce an empty string at the start of the tokens array, so you would still have to skip tokens[0].
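A quick way to confirm the empty-string behaviour: since Java 8, String.split drops a leading empty string only when the match at index 0 is zero-width, and ^create\s is not zero-width. Sketch (class name illustrative):

```java
import java.util.Arrays;

public class CreatePrefix {
    // The answer's rule with ^create\s added as an extra alternative
    public static String[] split(String input) {
        return input.split("^create\\s|[\\s:;,]+|(?<=[+\\-])");
    }

    public static void main(String[] args) {
        String[] tokens = split("create Person +fname : String, +lname: String, -age:int;");
        // "create " is a non-empty match at index 0, so split() keeps the empty string before it
        System.out.println(tokens.length + " " + Arrays.toString(tokens));
        // prints 11 [, Person, +, fname, String, +, lname, String, -, age, int]
    }
}
```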

OTHER TIPS

Regex is fine for lots of things, but sometimes you need a real lexer. JFlex is great: it generates a Java lexer from a specification of regular-expression rules, and it can handle practically any tokenization task. If you need to go a little further and build a parse tree, JavaCC or ANTLR are good choices.
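To give an idea of what a lexer does differently from split (each token comes back tagged with a kind, and unexpected characters are reported instead of silently becoming separators), here is a minimal hand-written one in plain Java using java.util.regex. This is not JFlex code; JFlex would generate a scanner from a specification file instead of you writing the loop by hand. The token kind names are made up for this sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MiniLexer {
    // One named group per token kind (kind names are hypothetical).
    // Order matters: the keyword rule is tried before the general identifier rule.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<WS>\\s+)"
            + "|(?<KEYWORD>create\\b)"
            + "|(?<IDENT>[a-zA-Z][a-zA-Z0-9]*)"
            + "|(?<VIS>[+\\-=])"
            + "|(?<PUNCT>[:,;])");
    private static final String[] KINDS = {"KEYWORD", "IDENT", "VIS", "PUNCT"};

    public static List<String> lex(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {             // the next token must start exactly at pos
                throw new IllegalArgumentException(
                        "unexpected '" + input.charAt(pos) + "' at index " + pos);
            }
            for (String kind : KINDS) {       // whitespace (WS) is matched but not emitted
                String text = m.group(kind);
                if (text != null) {
                    out.add(kind + "(" + text + ")");
                }
            }
            pos = m.end();
        }
        return out;
    }

    public static void main(String[] args) {
        for (String t : lex("create Person +fname : String, +lname: String, -age:int;")) {
            System.out.println(t);
        }
    }
}
```

For the example line this yields KEYWORD(create), IDENT(Person), VIS(+), IDENT(fname), PUNCT(:) and so on, which a parser can then consume; a generated lexer gives you the same kind of stream with far less hand-written code.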

Licensed under: CC-BY-SA with attribution