Question

Problem:

I cannot define a range of characters in a ParseKit grammar
(for example, the letters A to Z, upper- or lowercase).


Context:

I am generating a parser using ParseKit's ParserGenApp.
The parser is for a .framework and serves as a parser for a custom language.
It is meant to mimic a working parser that was generated with ANTLR4 for Java.


Examples:

ANTLR4: CHAR : [a-zA-Z];
Regular expression: "[a-zA-Z]", which matches any single letter from 'a' to 'z', upper- or lowercase.
Alternate regular expression: "[c-xB-Y]"


Questions:

How can I define a range of characters in a grammar for the ParserGenApp?
Is there a built-in ParseKit terminal like 'Word' or 'Number' for single characters/letters?
Is there an alternative way to declare a range?


Solution

Creator of ParseKit here.

Context:

ParseKit is currently undergoing a bit of a redesign. There are two ways to use ParseKit: the old way and the new way.

OLD WAY (dynamic): Previously, ParseKit produced dynamic, non-deterministic parsers at runtime (this code is still available in the library). Producing these parsers was slow, and the parsers produced were very slow as well (although they had some interesting properties which are useful in very rare circumstances).

NEW WAY (static): Now, using the ParserGenApp (as you've described here), ParseKit produces static ObjC source code for deterministic (PEG), memoizing parsers. The source code is generated at design time, and you then compile it into your project. The resulting parsers are fast. This new option is now the preferred way to use ParseKit, and the old method will eventually be deprecated in some form.

I will assume you are using the new static method (from your question, it sounds like you already are).

Answer:

Here are two ways to match "words" in your ParseKit grammars:

  1. Use the built-in Word rule reference, which is equivalent to [_a-zA-Z][-'_a-zA-Z0-9]* and will usually do what you want:

    username = Word;
  2. If the built-in Word terminal does not match exactly what you want, prefix it with a Syntactic Predicate (idea/syntax stolen from ANTLR3) that passes a regex to the MATCHES() macro:

    username = { MATCHES(@"[c-xB-Y]", LS(1)) }? Word;

    OR:

    username = { MATCHES_IGNORE_CASE(@"[a-z]", LS(1)) }? Word;

Notes:

  1. The Syntactic Predicate syntax is { ... }? placed just before a rule reference (Word in this case).
  2. You may use any ObjC code inside the Syntactic Predicate, but it must return a boolean value. MATCHES(), MATCHES_IGNORE_CASE(), and LS() are just C macros I have made available for convenience.
  3. If the ObjC code inside the Predicate is longer than a single expression, you must use semicolons to terminate statements as normal. Remember to return a boolean value.
  4. The MATCHES() and MATCHES_IGNORE_CASE() macros are a shortcut for using NSRegularExpression.
  5. The LS() macro stands for Lookahead String. LS(1) means fetch the NSString value of the first lookahead token. In this case the first lookahead token will be the token matched by Word. To look ahead two or three tokens, you would use LS(2), LS(3) and so forth.
  6. The handy Regex literal syntax of the old dynamic ParseKit is not (yet) available in grammars used with the new static ParseKit (aka ParserGenApp). I would like to add that eventually.
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow