I've got a string like:

create Person +fname : String, +lname: String, -age:int;

Is it possible to split it with a regex or an EBNF grammar? I mean that all the parts matching something like [a-zA-Z0-9] (the parts we don't know in advance) should be stored in an array.

In other words, by using this regexp:

^create [a-zA-Z][a-zA-Z0-9]* [s|b]?[+|[-]|=][a-zA-Z][a-zA-Z0-9]*[ ]?:[ ]?[a-zA-Z][a-zA-Z0-9]*(, [s|b]?[+|[-]|=][a-zA-Z][a-zA-Z0-9]*[ ]?:[ ]?[a-zA-Z][a-zA-Z0-9]*)*;

I want to obtain this array:

  • Person
  • +
  • fname
  • String
  • +
  • lname
  • String
  • -
  • age
  • int

Solution

You can try splitting it this way:

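// Split on runs of whitespace, ':', ';' or ',' OR at the position right after a '+' or '-'.
// (Extend the lookbehind class to [+\\-=] if '=' can appear as a sign, as in the question's pattern.)
String[] tokens = "create Person +fname : String, +lname: String, -age:int;"
        .split("[\\s:;,]+|(?<=[+\\-])");
for (String s : tokens)
    System.out.println("=> " + s);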

output:

=> create
=> Person
=> +
=> fname
=> String
=> +
=> lname
=> String
=> -
=> age
=> int

As you can see, create ends up as the first element of the array, so just start iterating from tokens[1].
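
If a separate array without the keyword is more convenient than skipping index 0 in the loop, one option is to copy the remaining elements out. This is only a minimal sketch, assuming the input always starts with create; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class DropCreate {
    // Hypothetical helper: splits the statement and drops the leading keyword token.
    static String[] tokensWithoutKeyword(String statement) {
        String[] tokens = statement.split("[\\s:;,]+|(?<=[+\\-])");
        if (tokens.length == 0) {
            return tokens; // nothing to drop (input consisted only of separators)
        }
        // copyOfRange takes an inclusive start and an exclusive end, so index 0 is skipped.
        return Arrays.copyOfRange(tokens, 1, tokens.length);
    }

    public static void main(String[] args) {
        String input = "create Person +fname : String, +lname: String, -age:int;";
        // prints [Person, +, fname, String, +, lname, String, -, age, int]
        System.out.println(Arrays.toString(tokensWithoutKeyword(input)));
    }
}
```

Note that this does not check that the first token really is create; it drops whatever comes first.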

You could try adding ^create\\s as an extra alternative in the splitting rule, but the match at the very start of the string then produces an empty string as the first element of the tokens array, so it doesn't solve anything.
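
To see the empty leading element for yourself, here is a small sketch of that variant. The class and method names are hypothetical; the only change from the splitting rule above is the extra ^create\\s alternative.

```java
import java.util.Arrays;

class LeadingEmptyToken {
    // Same splitting rule, with "^create\\s" added as a further alternative.
    static String[] splitWithKeyword(String statement) {
        return statement.split("^create\\s|[\\s:;,]+|(?<=[+\\-])");
    }

    public static void main(String[] args) {
        String input = "create Person +fname : String, +lname: String, -age:int;";
        String[] tokens = splitWithKeyword(input);
        // The separator "create " matches at index 0, so the text before it is "".
        // prints [, Person, +, fname, String, +, lname, String, -, age, int]
        System.out.println(Arrays.toString(tokens));
    }
}
```

String.split only drops trailing empty strings, not a leading one produced by a non-empty match, so you would still have to skip index 0.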

Other tips

Regex is fine for lots of things, but sometimes you need a real lexer. JFlex is a great one and handles just about any tokenization task. If you need to go a little further and build a parse tree, JavaCC or ANTLR are good choices.
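
If a generated lexer feels like too much for this input, a middle ground is to describe the tokens themselves and collect them with Matcher.find(), instead of describing the separators for split. This is not JFlex, just a hand-rolled sketch with java.util.regex; the token pattern (an identifier, or one of the signs +, -, =) is my assumption based on the regex in the question, and the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindTokens {
    // Assumed token shapes: an identifier, or a single sign character (+, - or =).
    private static final Pattern TOKEN =
            Pattern.compile("[a-zA-Z][a-zA-Z0-9]*|[+\\-=]");

    static List<String> tokenize(String statement) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(statement);
        // find() moves from one match to the next; text between matches is skipped.
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "create Person +fname : String, +lname: String, -age:int;";
        // prints [create, Person, +, fname, String, +, lname, String, -, age, int]
        System.out.println(tokenize(input));
    }
}
```

Unlike a real lexer, this silently ignores any character that fits no token, so it will not report malformed input.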

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow