Can I use Unicode escapes in the regex engine dk.brics.automaton?

https://stackoverflow.com/questions/10026301

29-05-2021

Question

I want to use Unicode escapes in my regular expressions.

For example, RegExp="\u0061" should match "a", but dk.brics.automaton does not seem to support this: the expression ends up matching the literal text "u0061" instead. I also tried RegExp="\u0061" and RegExp="\\u0061". Neither worked.

If you have any experience with this tool, could you suggest a solution?

Thanks!

Solution

I eventually found a way to work around this issue.

First, we can use Unicode escapes in the Java code, but each escaped character has to be written as its own string literal. For example, String str = "\u0061"+"b"; works, while String str = "\u0061b"; does not work well.
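For comparison, the Java compiler translates \uXXXX escapes before it tokenizes the source, so in standard Java the split and the joined literal should denote the same string. The following is a minimal check of that, not a statement about dk.brics.automaton; the class name CompileTimeEscapes is made up for illustration.

```java
// Hypothetical demo class; it uses nothing beyond the Java language itself.
final class CompileTimeEscapes {
    // The escape and the plain letter in separate literals.
    static final String SPLIT = "\u0061" + "b";
    // The escape and the plain letter in one literal.
    static final String JOINED = "\u0061b";

    public static void main(String[] args) {
        System.out.println(SPLIT.equals(JOINED)); // prints true
        System.out.println(JOINED);               // prints ab
    }
}
```

If the two forms behave differently in a real program, one possible cause is that the backslash reached the string as a literal character, for example through a doubled backslash or because the text came from a file instead of from Java source.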

Second, if we want to read the strings from a text file, such as test.txt containing "\u0061b\u0063", we have to (as far as I know) replace the Unicode escapes with the corresponding characters manually, because they are mixed with ordinary characters. Then we get a String str with the value "abc".
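The replacement step for text read from a file could be sketched roughly as follows. This is only an illustration built on java.util.regex: the class UnicodeEscapeDecoder and its decode method are hypothetical names, not part of dk.brics.automaton, and the sketch assumes every escape has exactly four hex digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns literal backslash-u escapes found in text data
// into the characters they denote, leaving all other characters untouched.
final class UnicodeEscapeDecoder {

    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String raw) {
        Matcher m = ESCAPE.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the ordinary text before this escape.
            out.append(raw, last, m.start());
            // Convert the four hex digits into a single char.
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the final escape.
        out.append(raw, last, raw.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // In Java source, a doubled backslash yields one literal backslash,
        // which is what a line read from test.txt would contain.
        String fromFile = "\\u0061b\\u0063";
        System.out.println(decode(fromFile)); // prints abc
    }
}
```

The decoded string can then be handed to the regex engine as ordinary characters.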

OTHER TIPS

I have no experience with dk.brics.automaton, but I guess everything is covered in the FAQ and the JavaDoc for the RegExp class.

As I understand it, you can use Unicode characters, but you have to write them as the character itself ("a") and not in the \u0061 notation.

Licensed under: CC-BY-SA with attribution