Where can I read detailed documentation on defining a grammar for ParseKit?

https://stackoverflow.com//questions/9589803

08-12-2019

Question

I'm just getting to grips with ParseKit. I've read the "Basic Grammar Syntax", but it's only a very basic introduction, and I'm quickly out of my depth now that I want to define my own grammar. Where do I go from here?

For example, I want to parse a log file in a very custom format. Breaking it down into header, body and footer, this would be my BNF for the first line of the header:

<header-line-1> ::= <log-format> <log-id> "," <category> <EOL> 
<log-format> ::= "Type A Logfile" | "Logfile II" | "Some Other Format" 
<log-id> ::= "#" <long-int> 
<category> ::= <some unknown string>

How do I define that so that ParseKit understands it? I've got this far:

@start = header-line-1;
header-line-1 = log-format log-id "," category EOL;
log-format = 'Type A Logfile';
log-id = '#' ; // and then how to specify a long-int?!?
category = char+;
char = 'A' | 'a' | 'B' | 'b' | 'C'; //..etc...   Surely not?!?

I suspect there must at least be a way to define a range of characters?

For sure, the book quoted by the author of ParseKit will probably help me, but it would be nice if somebody could help me get going with my own small example before I dig deeper into the subject. I'm only just investigating an idea, as a proof of concept.

Solution

Developer of ParseKit here.

Unfortunately, there is no further (good) documentation on ParseKit's grammar syntax. Currently the best resources are:

Steven Metsker's book Building Parsers with Java. The good news: this will teach you about the design and innards of ParseKit. The bad news: the "grammar syntax" feature of ParseKit is an additional layer on top of ParseKit which I designed and added myself, so it is not described in Metsker's book, as his Java library does not have this feature.
The .grammar files in the Test target of the ParseKit Xcode project. There are lots of real-world example grammars here, and you can learn a lot by example.
The ParseKit tag here on StackOverflow. I've answered a lot of questions there which may be helpful to you.

As for your specific example, here's how I'd probably define it in ParseKit syntax.

@symbolState = '\n'; // Tokenizer Directive
                     // tells tokenizer to treat new line chars as 
                     // individual Symbol tokens rather than whitespace
@start = headerLine*;
headerLine = logFormat logId comma category eol;
logFormat = ('Type' 'A' 'Logfile') | ('Logfile' 'II') | ('Some' 'Other' 'Format');
logId = hash Number;
category = Any+;

comma = ',';
hash = '#';
eol = '\n';

One important thing to keep in mind is that parsing in ParseKit is a two-phase process:

Tokenizing (done by PKTokenizer and altered by Tokenizer Directives in your grammar)
Parsing (done by the parser constructed by the Declarations in your grammar)

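So the parser created by your grammar works on tokens that the tokenizer has already produced. It does not work on individual chars, nor on long strings composed of multiple tokens. This is why logFormat above matches 'Type' 'A' 'Logfile' as three separate tokens rather than the single string 'Type A Logfile'.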
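
To make the two phases concrete, here is a minimal sketch in Python. It is not ParseKit's API or its real tokenizing rules: tokenize, parse_header_line, LOG_FORMATS and the three token kinds (word, number, symbol) are hypothetical names invented for this illustration. It only shows the idea that text is first cut into tokens, with the newline kept as its own symbol, and that the grammar rules then match whole tokens.

```python
import re

# Phase 1: tokenizing. Spaces and tabs are skipped, but "\n" is kept as its own
# symbol token (the role played by @symbolState = '\n' in the grammar above).
# Simplified, assumed token rules; ParseKit's real tokenizer is more elaborate.
TOKEN_RE = re.compile(r"(?P<number>\d+)|(?P<word>[^\W\d]\w*)|(?P<symbol>\n|[^\w\s])")

def tokenize(text):
    """Return a list of (kind, value) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

# Phase 2: parsing. Each log format is a sequence of word tokens, not one string.
LOG_FORMATS = [
    ["Type", "A", "Logfile"],
    ["Logfile", "II"],
    ["Some", "Other", "Format"],
]

def parse_header_line(tokens):
    """headerLine = logFormat logId comma category eol"""
    values = [value for _, value in tokens]
    fmt = next((f for f in LOG_FORMATS if values[:len(f)] == f), None)
    if fmt is None:
        raise ValueError("expected a known log format")
    pos = len(fmt)

    def expect(kind, value=None):
        # Consume one token of the given kind (and value, if one is given).
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError(f"expected {kind}, got end of input")
        tok_kind, tok_value = tokens[pos]
        if tok_kind != kind or (value is not None and tok_value != value):
            raise ValueError(f"expected {kind} {value!r}, got {tok_value!r}")
        pos += 1
        return tok_value

    expect("symbol", "#")            # hash
    log_id = int(expect("number"))   # Number
    expect("symbol", ",")            # comma
    # category = Any+ : in this sketch, every token up to the end-of-line symbol.
    category = []
    while pos < len(tokens) and tokens[pos] != ("symbol", "\n"):
        category.append(tokens[pos][1])
        pos += 1
    if not category:
        raise ValueError("expected at least one category token")
    expect("symbol", "\n")           # eol
    return {"format": " ".join(fmt), "id": log_id, "category": category}

tokens = tokenize("Logfile II #42, disk errors\n")
header = parse_header_line(tokens)
```

Note that the parser never looks at individual characters: "Logfile II" is recognised only because the tokenizer has already delivered the two word tokens 'Logfile' and 'II'.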

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow