Can I use Unicode escapes in the regex engine dk.brics.automaton?
Question
I want to use Unicode escapes in my regular expressions.
For example, RegExp="\u0061" should match "a". But it seems dk.brics.automaton does not support this: it ended up matching "u0061". I also tried RegExp="\u0061" and RegExp="\\u0061". Neither worked.
If you have any experience with this tool, could you suggest a solution?
Thanks!
Solution
I eventually found a way to work around this issue.
First, we can use Unicode escapes in the Java code, but each escaped character has to be written as its own string literal, e.g. String str = "\u0061"+"b";
whereas String str = "\u0061b";
did not work well for me.
Second, if we want to read the strings from a text file, like test.txt containing "\u0061b\u0063", we have to (as far as I know) replace the Unicode escapes with the corresponding characters ourselves, because escapes and plain characters are mixed. Then we get String str with the value "abc".
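The replacement step for text read from a file could be automated rather than done by hand. The following is only a minimal sketch using java.util.regex from the standard library; the class name UnicodeEscapeDecoder and the method decode are made up for illustration, and it assumes every escape is a backslash, the letter u, and exactly four hex digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of dk.brics.automaton.
class UnicodeEscapeDecoder {
    // Regex for: one literal backslash, the letter u, then four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each backslash-u escape in the text with the character it names.
    static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and backslash from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Same characters a line of test.txt would contain: a backslash, 'u', digits.
        String fromFile = "\\u0061b\\u0063";
        System.out.println(decode(fromFile)); // prints abc
    }
}
```

The decoded string can then be handed to the regex engine as ordinary characters. This sketch does not treat a doubled backslash before u as an escaped backslash, so input that relies on that convention would need extra handling.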
OTHER TIPS
I have no experience with dk.brics.automaton, but I guess everything is said in the FAQ and the JavaDoc for the RegExp class.
As I understand it, you can use Unicode characters, but you have to express them as the character itself ("a") and not with the \u0061 notation.
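The difference between the two spellings can be illustrated in plain Java without the library. This is a minimal sketch, assuming the expression is written as a Java string literal; the class and method names are made up for illustration. The compiler translates a \u0061 escape inside a literal into the character a before any library sees the string, while doubling the backslash passes six literal characters through. That would be consistent with the match on "u0061" reported in the question, if the engine treats a backslash as quoting the next character.

```java
// Hypothetical demo class; it only inspects the strings the compiler produces.
class EscapeSpellings {
    // The compiler has already turned the escape into the single character 'a'.
    static String compilerDecoded() {
        return "\u0061";
    }

    // The doubled backslash yields a literal backslash followed by u0061.
    static String literalBackslash() {
        return "\\u0061";
    }

    public static void main(String[] args) {
        System.out.println(compilerDecoded());           // prints a
        System.out.println(literalBackslash().length()); // prints 6
    }
}
```

So a regex library called from Java source never receives the escape in the first case; it receives the character itself.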