Newick grammar for ParseKit

https://stackoverflow.com/questions/7994385

20-02-2021

Question

I'm building a grammar to parse Newick trees using ParseKit for a project I'm working on, and I've gotten this far. It's based on the grammar here: http://en.wikipedia.org/wiki/Newick_format. I'd like to use a grammar for this rather than the existing clunky recursive code I have working now.

However, I'm unsure how to specify the name and length nodes so that they accept either empty strings or general strings and numbers. I've gotten this far from the examples on the ParseKit site and from skimming the Building Parsers with Java book, but I've missed something. Can someone point me in the right direction, please?

Current grammar:

@start = tree+;
tree = subtree ';' | branch ';';
subtree = leaf | internal;
leaf = name;
internal = '(' branchset ')' name;
branchset = branch | branchset ',' branch;
branch = subtree length;
name = *;
length = * | ':' *

Thanks!

Possible answer:

Maybe these name and length nodes would work. Could anyone confirm?

name = Word | Quoted String;
length = ':' Number;

Solution

Developer of ParseKit here. Your proposed solution at the end is basically correct with one small fix: QuotedString is one word:

name = Word | QuotedString;
length = ':' Number;
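
To see what these rules accept, here is a minimal Python sketch of a hand-written parser that mirrors the grammar rule by rule. It is not ParseKit code, only an illustration. It assumes, following the Wikipedia grammar linked in the question, that both name and length may be empty, whereas the two rules above as written require them to be present. The names tokenize, Parser, and parse_newick are hypothetical, and the left-recursive branchset rule is rewritten as a loop.

```python
import re

PUNCT = set("(),:;")
# One token: a punctuation mark, a single-quoted string, or a bare run of characters.
TOKEN = re.compile(r"\s*([(),:;]|'(?:[^']|'')*'|[^\s(),:;']+)")


def tokenize(text):
    """Split Newick text into tokens; raise ValueError on unmatched input."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise ValueError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def tree(self):
        # tree = branch ';'  (covers subtree ';' because length may be empty)
        node = self.branch()
        self.expect(";")
        return node

    def branch(self):
        # branch = subtree length
        node = self.subtree()
        node["length"] = self.length()
        return node

    def subtree(self):
        # subtree = leaf | internal; internal = '(' branchset ')' name
        children = []
        if self.peek() == "(":
            self.pos += 1
            children.append(self.branch())
            while self.peek() == ",":  # branchset as a loop, not left recursion
                self.pos += 1
                children.append(self.branch())
            self.expect(")")
        return {"name": self.name(), "children": children, "length": None}

    def name(self):
        # name = Word | QuotedString, or empty (assumption, see lead-in)
        tok = self.peek()
        if tok is None or tok in PUNCT:
            return ""
        self.pos += 1
        if tok.startswith("'"):
            return tok[1:-1].replace("''", "'")
        return tok

    def length(self):
        # length = ':' Number, or empty (assumption, see lead-in)
        if self.peek() != ":":
            return None
        self.pos += 1
        tok = self.peek()
        if tok is None or tok in PUNCT:
            raise ValueError("expected a number after ':'")
        self.pos += 1
        return float(tok)  # raises ValueError if the token is not numeric


def parse_newick(text):
    """Parse one or more ';'-terminated trees (@start = tree+)."""
    parser = Parser(tokenize(text))
    trees = []
    while parser.peek() is not None:
        trees.append(parser.tree())
    if not trees:
        raise ValueError("expected at least one tree")
    return trees
```

Calling parse_newick("(A:0.1,B:0.2)root;") returns a list with one nested dictionary per tree, each holding a name, a length, and a list of children.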

Also, for future reference: if you would like a "wildcard" matcher (what you are trying to do with * above), you can use the built-in parser Any, which matches any token.

In ParseKit, * is a modifier meaning "zero or more" of the expression it follows, so it cannot stand on its own as a wildcard.
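
The difference between a wildcard matcher and a repetition modifier can be sketched with a few toy matchers in Python. This is an illustration only, not ParseKit's API: any_token, literal, and zero_or_more are hypothetical names, and each matcher takes a token list and a position and returns the new position, or None on failure.

```python
def any_token(tokens, pos):
    """Wildcard: consume exactly one token, whatever it is."""
    return pos + 1 if pos < len(tokens) else None


def literal(expected):
    """Build a matcher that consumes one token equal to `expected`."""
    def match(tokens, pos):
        if pos < len(tokens) and tokens[pos] == expected:
            return pos + 1
        return None
    return match


def zero_or_more(matcher):
    """Repetition modifier: wraps another matcher and applies it repeatedly.

    It never fails; with no matches it returns the starting position.
    """
    def match(tokens, pos):
        while True:
            nxt = matcher(tokens, pos)
            if nxt is None or nxt == pos:  # stop on failure or an empty match
                return pos
            pos = nxt
    return match


commas = zero_or_more(literal(","))
end_after_commas = commas([",", ",", "A"], 0)
```

Note that zero_or_more needs a matcher to wrap, which is why a bare * with nothing before it, as in the original name rule, does not describe anything to match.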

Licensed under: CC-BY-SA with attribution
