Splitting by regex or EBNF

https://stackoverflow.com/questions/16533505

29-05-2022

Question

I've got a string like:

create Person +fname : String, +lname: String, -age:int;

Is it possible to split it using a regex or an EBNF grammar? I mean that every part matching something like [a-zA-Z0-9] (the parts we don't know in advance) would be stored in an array.

In other words, by using this regexp:

^create [a-zA-Z][a-zA-Z0-9]* [s|b]?[+|[-]|=][a-zA-Z][a-zA-Z0-9]*[ ]?:[ ]?[a-zA-Z][a-zA-Z0-9]*(, [s|b]?[+|[-]|=][a-zA-Z][a-zA-Z0-9]*[ ]?:[ ]?[a-zA-Z][a-zA-Z0-9]*)*;

I want to obtain this array:

Person
+
fname
String
+
lname
String
-
age
int

Solution

You can try splitting it this way:

// Split on runs of whitespace, ':', ';' or ',' OR at the empty position right after '+' or '-'
String[] tokens = "create Person +fname : String, +lname: String, -age:int;"
        .split("[\\s:;,]+|(?<=[+\\-])");
for (String s : tokens)
    System.out.println("=> " + s);

output:

=> create
=> Person
=> +
=> fname
=> String
=> +
=> lname
=> String
=> -
=> age
=> int

As you can see, create ends up at the start of the array, so just start iterating from tokens[1].
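If you want an array that looks exactly like the one in the question instead of just skipping an index, one option is to copy everything after the keyword with Arrays.copyOfRange. This is only a small sketch; the class name SkipKeyword and the method withoutKeyword are made up for illustration, and it assumes the statement always starts with create.

```java
import java.util.Arrays;

public class SkipKeyword {

    // Hypothetical helper: drops the leading "create" keyword left in tokens[0] by the split.
    static String[] withoutKeyword(String[] tokens) {
        if (tokens.length == 0 || !tokens[0].equals("create")) {
            throw new IllegalArgumentException("expected 'create' as the first token");
        }
        return Arrays.copyOfRange(tokens, 1, tokens.length);
    }

    public static void main(String[] args) {
        String[] tokens = "create Person +fname : String, +lname: String, -age:int;"
                .split("[\\s:;,]+|(?<=[+\\-])");
        // prints [Person, +, fname, String, +, lname, String, -, age, int]
        System.out.println(Arrays.toString(withoutKeyword(tokens)));
    }
}
```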

You could try to add ^create\\s to the splitting rule, but because that match starts at the very beginning of the string, split produces an empty string as tokens[0], so it doesn't solve anything.
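Rather than folding ^create\\s into the split regex, another option is to stay closer to the regex from the question and use Pattern/Matcher with capture groups. Note that a repeated group such as (, ...)* only remembers its last repetition in Java, so the sketch below first validates the whole statement and then collects each attribute with a find() loop. It is a minimal sketch under stated assumptions: the optional s/b prefix from the question's regex is left out, +, - and = are assumed to be the only visibility markers, and the class name CreateStatementExtractor is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CreateStatementExtractor {

    // One attribute without captures: visibility, name, optional spaces around ':', type.
    // (Simplified from the question's regex: the optional s/b prefix is assumed away.)
    private static final String ATTR =
            "[+\\-=][a-zA-Z][a-zA-Z0-9]* ?: ?[a-zA-Z][a-zA-Z0-9]*";

    // Whole statement: "create <Name> <attr>(, <attr>)*;" with the name and list captured.
    private static final Pattern STATEMENT = Pattern.compile(
            "create ([a-zA-Z][a-zA-Z0-9]*) (" + ATTR + "(?:, " + ATTR + ")*);");

    // The same attribute shape, but with groups for visibility, name and type.
    private static final Pattern ATTRIBUTE = Pattern.compile(
            "([+\\-=])([a-zA-Z][a-zA-Z0-9]*) ?: ?([a-zA-Z][a-zA-Z0-9]*)");

    static List<String> extract(String input) {
        Matcher stmt = STATEMENT.matcher(input);
        if (!stmt.matches()) { // matches() requires the entire input to fit the pattern
            throw new IllegalArgumentException("not a valid create statement: " + input);
        }
        List<String> parts = new ArrayList<>();
        parts.add(stmt.group(1)); // class name, e.g. "Person"

        // A repeated group keeps only its last capture, so walk the attributes with find().
        Matcher attr = ATTRIBUTE.matcher(stmt.group(2));
        while (attr.find()) {
            parts.add(attr.group(1)); // visibility marker
            parts.add(attr.group(2)); // attribute name
            parts.add(attr.group(3)); // type
        }
        return parts;
    }

    public static void main(String[] args) {
        for (String part : extract("create Person +fname : String, +lname: String, -age:int;")) {
            System.out.println(part);
        }
    }
}
```

Unlike the split approach, this rejects input that does not have the expected shape instead of returning whatever pieces happen to fall out.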

Other tips

Regex is fine for lots of things, but sometimes you need a real lexer. JFlex is great; it can handle just about any tokenization task. If you need to go a step further and build a parse tree, JavaCC or ANTLR are good choices.
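To give an idea of what a lexer adds over split, here is a tiny hand-written one in plain java.util.regex. It is not JFlex (JFlex generates lexer classes from a specification file) and uses no JFlex API; the class TinyLexer, the Token class and the token type names are all hypothetical. It walks the input position by position, skips whitespace, and emits typed tokens, failing on any character it does not recognise.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyLexer {

    // Hypothetical token representation: a type name plus the matched text.
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // One alternative per token kind; the named group tells us which one matched.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<ident>[a-zA-Z][a-zA-Z0-9]*)|(?<vis>[+\\-=])|(?<punct>[:,;])");

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher ws = WHITESPACE.matcher(input);
        Matcher tok = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            ws.region(pos, input.length());
            if (ws.lookingAt()) { // skip whitespace between tokens
                pos = ws.end();
                continue;
            }
            tok.region(pos, input.length());
            if (!tok.lookingAt()) { // nothing matches at this position: report it
                throw new IllegalArgumentException(
                        "unexpected '" + input.charAt(pos) + "' at index " + pos);
            }
            if (tok.group("ident") != null) {
                String word = tok.group("ident");
                tokens.add(new Token(word.equals("create") ? "KEYWORD" : "IDENT", word));
            } else if (tok.group("vis") != null) {
                tokens.add(new Token("VISIBILITY", tok.group("vis")));
            } else {
                tokens.add(new Token("PUNCT", tok.group("punct")));
            }
            pos = tok.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("create Person +fname : String, +lname: String, -age:int;")) {
            System.out.println(t);
        }
    }
}
```

A parser (hand-written, or generated by JavaCC or ANTLR) would then consume these typed tokens according to the grammar, which is where the EBNF from the question would come in.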

License: CC-BY-SA with attribution

Not affiliated with StackOverflow