Using ANTLR, how do I handle specific repeats without using language specific semantic predicates?

https://stackoverflow.com/questions/11870878

25-06-2021

Question

I am trying to model mqsi commands using ANTLR and have come across the following problem. The documentation for mqsicreateconfigurableservice says of the queuePrefix: "The prefix can contain any characters that are valid in a WebSphere® MQ queue name, but must be no longer than eight characters and must not begin or end with a period (.). For example, SET.1 is valid, but .SET1 and SET1. are invalid. Multiple configurable services can use the same queue prefix."

I've used the following as a stopgap, but this technique means the name must be at least two characters long, and it seems a very wasteful and non-scalable solution. Is there a better method?

See 'queuePrefixValue', below...

Thanks :o)

parser grammar mqsicreateconfigurableservice;
mqsicreateconfigurableservice
:   'mqsicreateconfigurableservice' ' '+ params
;
params  :   (broker ' '+ switches+)
;
broker  :   validChar+
;
switches
:   AggregationConfigurableService
;
AggregationConfigurableService
:    (objectName ' '+ AggregationNameValuePropertyPair)
;

objectName
:   (' '+ '-o' ' '+ validChar+)
;

AggregationNameValuePropertyPair
:   (' '+ '-n' ' '+ 'queuePrefix' ' '+ '-v' ' '+ queuePrefixValue)?
    (' '+ '-n' ' '+ 'timeoutSeconds' ' '+ '-v' ' '+  timeoutSecondsValue)?
;

// I'm not satisfied with this rule as it means at least two characters are mandatory
//Couldn't see how to use regex or semantic predicates which appear to offer a solution
queuePrefixValue
:   validChar (validChar | '.')? (validChar | '.')? (validChar | '.')? (validChar | '.')? (validChar | '.')? (validChar | '.')? validChar
;
timeoutSecondsValue //a positive integer
:   ('0'..'9')+
;

//This char list is just a temporary subset which eventually needs to reflect all the WebSphere acceptable characters, apart from the dot '.'
validChar
:   (('a'..'z')|('A'..'Z')|('0'..'9'))
;
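To make the two-character minimum concrete, here is a rough Python sketch that models what queuePrefixValue is meant to match at the character level. It is only an approximation: it assumes the question's temporary a-z, A-Z, 0-9 subset for validChar, and it ignores the fact that queuePrefixValue, being a parser rule, really matches tokens rather than characters. The helper name matches_question_rule is hypothetical.

```python
import re

# Character-level model of the question's rule (an approximation, not ANTLR):
#   validChar (validChar | '.')? ... six optional slots ... validChar
# validChar is the question's temporary subset: a-z, A-Z, 0-9.
QUESTION_RULE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9.]{0,6}[a-zA-Z0-9]")


def matches_question_rule(value: str) -> bool:
    """Return True if the whole value fits the modelled queuePrefixValue."""
    return QUESTION_RULE.fullmatch(value) is not None


for candidate in ["A", "AB", "SET.1", ".SET1", "SET1.", "ABCDEFGHI"]:
    # "A" is rejected: the rule demands a validChar at both ends.
    print(f"{candidate!r:12} {matches_question_rule(candidate)}")
```

Running it prints True only for AB and SET.1; the single-character A is rejected along with .SET1, SET1. and the nine-character ABCDEFGHI.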

Solution

You're using parser rules where you should be using lexer rules instead¹. The . (dot meta-char) and .. (range meta-char) behave differently in parser rules than in lexer rules. In a parser rule, . matches any token (in a lexer rule it matches any character), and .. matches token ranges, not the character ranges you're expecting!

So, make queuePrefixValue a lexer rule (give it an uppercase first letter: QueuePrefixValue), and use fragment rules² where appropriate. Your QueuePrefixValue could look like this:

// 1 to 8 characters in total; the first and last may not be '.'
QueuePrefixValue
 : StartEndCH ((((((CH? CH)? CH)? CH)? CH)? CH)? StartEndCH)?
 ;

fragment StartEndCH
 : 'a'..'z'
 | 'A'..'Z'
 | '0'..'9'
 ;

fragment CH
 : '.'
 | StartEndCH
 ;

So, that more or less answers your question: no, there is no short way to restrict a token to a certain number of characters without a language-specific predicate. Note that my suggestion above is not ambiguous, whereas your queuePrefixValue is (in ABC, for instance, the B could be matched by any one of its six optional slots), and mine also accepts single-character values.
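As an illustration only, the Python sketch below translates the QueuePrefixValue rule into a regular expression by building the nested optional groups literally, then cross-checks it against an equivalent pattern that uses counted repetition ({0,6}). ANTLR's grammar notation has no counted-repetition operator, which is why the nested form is needed there. The sketch assumes the same a-z, A-Z, 0-9 set as StartEndCH; the helper nested_optional and the other Python names are hypothetical.

```python
import itertools
import re

# Character classes taken from the fragment rules:
#   fragment StartEndCH : 'a'..'z' | 'A'..'Z' | '0'..'9' ;
#   fragment CH         : '.' | StartEndCH ;
START_END_CH = "[a-zA-Z0-9]"
CH = "[a-zA-Z0-9.]"


def nested_optional(ch: str, levels: int) -> str:
    """Build ((...(ch? ch)? ch)? ... ch)? with `levels` nested optional groups."""
    pattern = f"{ch}?"
    for _ in range(levels):
        # Each level wraps the previous one and adds one mandatory ch,
        # so `levels` groups allow between 0 and levels + 1 copies of ch.
        pattern = f"(?:{pattern}{ch})?"
    return pattern


# Literal translation of:
#   QueuePrefixValue : StartEndCH ((((((CH? CH)? CH)? CH)? CH)? CH)? StartEndCH)? ;
NESTED_RULE = re.compile(
    f"{START_END_CH}(?:{nested_optional(CH, 5)}{START_END_CH})?"
)

# The same language written with counted repetition: 1 to 8 characters in total.
COUNTED_RULE = re.compile(f"{START_END_CH}(?:{CH}{{0,6}}{START_END_CH})?")


def is_queue_prefix(value: str) -> bool:
    """Return True if the whole value matches the modelled QueuePrefixValue."""
    return NESTED_RULE.fullmatch(value) is not None


# Examples from the documentation, plus single-character and over-long values.
for candidate in ["SET.1", ".SET1", "SET1.", "A", "ABCDEFGH", "ABCDEFGHI"]:
    print(f"{candidate!r:12} {is_queue_prefix(candidate)}")

# Exhaustive cross-check over a tiny alphabet: both forms accept the same strings.
for length in range(1, 10):
    for chars in itertools.product("a.", repeat=length):
        s = "".join(chars)
        assert (NESTED_RULE.fullmatch(s) is None) == (COUNTED_RULE.fullmatch(s) is None)
```

With the documentation's examples, SET.1 is accepted while .SET1 and SET1. are rejected; a single A is accepted, and the nine-character ABCDEFGHI is rejected.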

HTH

¹ Practical difference between parser rules and lexer rules in ANTLR?

² What does "fragment" mean in ANTLR?

Licensed under: CC-BY-SA with attribution
