Question

From Parsekit: how to match individual quote characters?

If you define a parser:

@start = int;
int = /[+-]?[0-9]+/;

Unfortunately, this will not parse any integers prefixed with a "+" unless you include the following directive at the top of the grammar:

@numberState = "+";

In the number parser above, the default "Symbol" parser wasn't even mentioned, yet it is still active and overrides user-defined parsers.

Okay, so with numbers you can still fix it by adding the directive. But what if you're trying to create a parser for "++"? I haven't found any directive that makes the following parser work.

@start = plusplus;
plusplus = "++";

The effects of default parsers on the user parser seem so arbitrary. Why can't I parse "++"?

Is it possible to just turn off default Parsers altogether? They seem to get in the way if I'm not doing something common.

Or maybe I've got it all wrong.

EDIT:

I've found a parser that does parse "++":

@start = plusplus;
plusplus = plus plus;
plus = "+";

I am guessing the answer is: a literal symbol defined in your parser cannot straddle more than one default parser; it must be contained completely by at least one of them.


Solution

Developer of ParseKit here.

I have a few responses.

  1. I think you'll find the ParseKit API highly elegant and sensible the more you learn about it. Keep in mind that I'm not tooting my own horn by saying that. Although I built ParseKit, I did not design the ParseKit API. Rather, the design of ParseKit is based almost entirely on the designs found in Steven Metsker's Building Parsers with Java. I highly recommend you check out the book if you want to deeply understand ParseKit. Plus, it's a fantastic book about parsing in general.

  2. You're confusing Tokenizer States with Parsers. They are two distinct things, but the details are more complex than I can answer here. Again, I recommend Metsker's book.

  3. In the course of answering your question, I did find a small bug in ParseKit. Thanks! However, it was not affecting the outcome you described above, because you were not using the correct grammar for the result it seems you were looking for. You'll need to update your source code from the Google Code project now, or else my advice below will not work for you.
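
To illustrate the distinction in point 2, here is a toy model in Python. It is not ParseKit code and does not use ParseKit's API; `toy_tokenize` and `parse_sequence` are hypothetical names. It assumes a tokenizer that, by default, emits every "+" as its own single-character symbol token, which is consistent with the behavior described in the question. The parser only ever sees the tokens, never the raw characters.

```python
def toy_tokenize(text):
    """Toy tokenizer: digit runs become number tokens, every other
    non-space character becomes its own one-character symbol token."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            tokens.append(ch)
            i += 1
    return tokens


def parse_sequence(tokens, literals):
    """Toy parser: succeeds only if the tokens equal the expected
    literals, compared token by token."""
    return tokens == literals


print(toy_tokenize("++"))                          # ['+', '+']
print(toy_tokenize("+1"))                          # ['+', '1']
print(parse_sequence(toy_tokenize("++"), ["++"]))      # False
print(parse_sequence(toy_tokenize("++"), ["+", "+"]))  # True
```

Under this model, a grammar literal "++" can never match because no single token is ever "++", while the sequence plus plus matches the two separate "+" tokens. The same reasoning explains why a pattern expecting "+1" in one token fails when the tokenizer hands over "+" and "1" separately.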


Now to answer your question.

I think you are looking for a grammar which both recognizes ++ as a single multi-char Symbol token and also recognizes numbers with leading + chars as explicitly-positive numbers rather than a + Symbol token followed by a Number token.

The correct grammar I believe you are looking for is something like this:

@symbols = '++';    // declare ++ as a multi-char symbol
@numberState = '+'; // allow explicitly-positive numbers
@start = (Number|Symbol)*;

Input like this:

++ +1 -2 + 3 ++

Will be tokenized like so:

[++, +1, -2, +, 3, ++]++/+1/-2/+/3/++^
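
The effect of the two directives can be sketched with a small Python tokenizer. This is only an illustration of the idea, not ParseKit's implementation: the function `tokenize` and its parameters `symbols` and `number_prefixes` are hypothetical stand-ins for `@symbols` and `@numberState`. It assumes a sign character starts a number only when a digit follows immediately, and that multi-char symbols are tried longest first.

```python
def tokenize(text, symbols=(), number_prefixes="-"):
    """Toy tokenizer with configurable multi-char symbols and
    number sign prefixes (an assumption-laden sketch)."""
    tokens, i = [], 0
    # Try longer symbols first so "++" wins over a lone "+".
    multi = sorted(symbols, key=len, reverse=True)
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        # Number state: optional sign prefix, then at least one digit.
        j = i + 1 if ch in number_prefixes else i
        if j < len(text) and text[j].isdigit():
            while j < len(text) and text[j].isdigit():
                j += 1
            tokens.append(text[i:j])
            i = j
            continue
        # Symbol state: declared multi-char symbols, else one character.
        for sym in multi:
            if text.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            tokens.append(ch)
            i += 1
    return tokens


print(tokenize("++ +1 -2 + 3 ++", symbols=["++"], number_prefixes="+-"))
# ['++', '+1', '-2', '+', '3', '++']
```

Calling `tokenize("++ +1")` without the two extra arguments yields `['+', '+', '+', '1']` in this sketch, which mirrors the behavior the question ran into before the directives were added.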

Two reminders:

  1. Again, you will need to update your source code now to see this work correctly. I had to fix a bug in this case.
  2. This stuff is tricky, and I recommend reading Metsker's book to fully understand how ParseKit works.
Licensed under: CC-BY-SA with attribution