Lexer predicate for XPath 3 comments

https://stackoverflow.com/questions/23118157

04-07-2023

Question

I am trying to implement an XPath 3 parser in ANTLR 4. The EBNF given in the XPath specification uses - to indicate that something should be excluded. If I understand correctly, in ANTLR I can use a predicate instead to achieve the same behaviour.

I am struggling with implementing CommentContents from the EBNF, as I am not quite sure how to construct the predicate. This is what I have so far:

/** [2] Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 *
 * //any Unicode character, excluding the surrogate blocks, FFFE, and FFFF
 */
 Char : '\u0001'..'\uD7FF' | '\uE000'..'\uFFFD' | '\u10000'..'\u10FFFF' ;


/** [108] CommentContents ::= (Char+ - (Char* ('(:' | ':)') Char*)) */
CommentContents : Char+ { $Char+.text.indexOf("(:") + $Char+.text.indexOf(":)") == 0  } ;

Can someone confirm if I have the predicate for CommentContents correct so that it matches the intention of the EBNF?

Solution

You need {...}? (a semantic predicate), not {...} (a plain action). Also, $Char+.text won't work. I suggest writing a Java function that performs the test and returns a boolean, then just calling it from the predicate.
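A minimal sketch of such a helper, assuming the Java target. The class name CommentContentsCheck and the method name isCommentContents are hypothetical, chosen only for illustration. Note that String.indexOf returns -1 when the substring is absent, so the sum in the question's test would be -2 rather than 0 when neither delimiter occurs; checking each delimiter separately avoids that.

```java
// Hypothetical helper: the names are illustrative and do not come from the original answer.
final class CommentContentsCheck {

    private CommentContentsCheck() {
        // Utility class, not meant to be instantiated.
    }

    /**
     * Returns true when the text is non-empty (Char+ needs at least one
     * character) and contains neither "(:" nor ":)".
     * Whether each character is a legal Char is left to the lexer rule.
     */
    static boolean isCommentContents(String text) {
        return !text.isEmpty()
                && !text.contains("(:")
                && !text.contains(":)");
    }
}
```

In the grammar this could then be called as something like Char+ { CommentContentsCheck.isCommentContents(getText()) }? where getText() is the lexer's method for the text matched so far. This is untested against the rest of the grammar, so treat it as a starting point.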

Licensed under: CC-BY-SA with attribution
