Question

I'm writing an application where I need to parse wine menus. From what I've seen so far, they all follow some structure; the trick will be defining all those structures. I'm currently exploring ParseKit and creating grammars, but the learning curve is pretty steep. Rather than spending the next couple of weeks figuring it out and then realizing this is not the right approach, I figured I'd ask.

Any insights or resources people would like to share on parsing this kind of thing? Thanks, olivier

Solution

Developer of ParseKit here.

(One thing to keep in mind with my answer: although I am the developer of ParseKit, I did not really design the framework or its API. It is based mostly on specific designs found in Steven Metsker's book Building Parsers With Java. I merely ported them to ObjC/Cocoa.)


ParseKit is composed of three parts:

  1. A highly-flexible, high-performance Objective-C Tokenizer (PKTokenizer, PKToken classes)
  2. A thoroughly dynamic Objective-C Parser toolkit for building backtracking, recursive-descent parsers with infinite lookahead (the PKParser class and subclasses). Due to its dynamism, performance of this parser toolkit is poor for large input.
  3. Objective-C Parser Generation via Grammars - Generate an Objective-C parser for your custom language using a BNF-style grammar syntax (similar to yacc or ANTLR). While parsing, the parser will provide callbacks to your Objective-C code. Due to #2's dynamism, writing grammars is relatively easy, and there are relatively few restrictions on what you can do in the grammars.

Each component above builds upon the prior components. So #3 - the Grammar toolkit - uses both #1 the tokenizer, and #2 the parser toolkit.
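To make "backtracking with infinite lookahead" a bit more concrete, here is a minimal conceptual sketch in Python. It is not ParseKit code and does not claim to mirror ParseKit's implementation; the names `literal`, `sequence`, `alternation`, `repetition` and `matches` are made up for this illustration. Each parser returns every position where a match could end, so no alternative is discarded early. That makes grammars easy to write, and it is also one reason this style of parser can get slow on large input.

```python
from typing import Callable, List, Sequence

# A parser maps (tokens, start) to every position where a match could end.
Parser = Callable[[Sequence[str], int], List[int]]

def literal(text: str) -> Parser:
    """Match exactly one token equal to `text`."""
    def parse(tokens: Sequence[str], pos: int) -> List[int]:
        return [pos + 1] if pos < len(tokens) and tokens[pos] == text else []
    return parse

def sequence(*parts: Parser) -> Parser:
    """Match each part in order, carrying along every possible end position."""
    def parse(tokens: Sequence[str], pos: int) -> List[int]:
        ends = [pos]
        for part in parts:
            ends = [e2 for e in ends for e2 in part(tokens, e)]
        return ends
    return parse

def alternation(*choices: Parser) -> Parser:
    """Try every choice and keep all results instead of committing to one."""
    def parse(tokens: Sequence[str], pos: int) -> List[int]:
        return [e for choice in choices for e in choice(tokens, pos)]
    return parse

def repetition(inner: Parser) -> Parser:
    """Match `inner` zero or more times, remembering every stopping point."""
    def parse(tokens: Sequence[str], pos: int) -> List[int]:
        ends = [pos]
        frontier = [pos]
        while frontier:
            next_frontier = []
            for start in frontier:
                for end in inner(tokens, start):
                    if end not in ends:
                        ends.append(end)
                        next_frontier.append(end)
            frontier = next_frontier
        return ends
    return parse

def matches(parser: Parser, tokens: Sequence[str]) -> bool:
    """True if some alternative consumes the whole token list."""
    return len(tokens) in parser(tokens, 0)

# Toy grammar: 'a'* 'a' 'b'. A greedy 'a'* with no backtracking would eat
# every 'a' and then fail on the mandatory 'a' that follows.
toy = sequence(repetition(literal("a")), literal("a"), literal("b"))
```

With this toy grammar, `matches(toy, ["a", "a", "b"])` succeeds because the repetition also reports the end positions where it consumed fewer `a` tokens, so the rest of the sequence can still match.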

If you are doing any serious parsing tasks, I would always recommend checking out #1 - the tokenizer - PKTokenizer. It is incredibly flexible and powerful and performance is very good. If it's easier for you to work on tokens rather than an input string (and it usually is), you'll probably want to check this out.
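As a rough illustration of why working on tokens is easier than working on the raw string, here is a small Python sketch. It is not PKTokenizer and does not reproduce its rules or API; the `Token` type, the three token kinds, and the regular expression are assumptions chosen for this example.

```python
import re
from typing import List, NamedTuple

class Token(NamedTuple):
    kind: str  # "number", "word", or "symbol" (hypothetical token kinds)
    text: str

# Hypothetical token rules for illustration only.
TOKEN_RE = re.compile(
    r"""
    (?P<number>\d+(?:\.\d+)?)                 # 2015, 45.50
  | (?P<word>[^\W\d_]+(?:['-][^\W\d_]+)*)     # Margaux, Saint-Julien
  | (?P<symbol>\S)                            # any other non-space character
    """,
    re.VERBOSE,
)

def tokenize(line: str) -> List[Token]:
    """Split one line into typed tokens; whitespace is skipped."""
    return [Token(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(line)]
```

Once a line is a list of typed tokens, later steps can ask "is the next token a number?" instead of re-scanning characters.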

As for #2 (ObjC Parser toolkit), you'll usually want to just skip it and move to #3, as building parsers via grammar is much nicer than building them via ObjC code.

For #3 (ObjC Parser toolkit via BNF Grammars), the most important consideration is performance. ParseKit's parser toolkit is suitable for parsing relatively small input strings. Some examples might be:

  1. XPath-style query languages
  2. SQL
  3. Relatively concise DSL or command languages
  4. Regular Expressions
  5. Menus (or things that can be broken up into a flat array of relatively small sentences)

ParseKit's parser toolkit is generally not suitable for parsing larger input strings due to performance concerns. Some examples might be:

  1. XML documents
  2. JSON documents

ParseKit can certainly (and does) parse these types of input, but again, performance is poor compared to a dedicated XML or JSON parser due to ParseKit's dynamism (backtracking, infinite lookahead).


For a "wine menu", I would say, yes - ParseKit is probably a good (possibly great) solution. Especially if you can break the individual lines of input into an array of strings and parse them one by one. Performance should be quite good, and once you get over the learning curve, ParseKit is incredibly powerful/convenient for these types of jobs.
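The line-by-line approach can be sketched like this in Python. This is not ParseKit code: the line format (`name vintage? price`), the `Entry` type, and the functions `parse_line` and `parse_menu` are all assumptions invented for the example, and real menus will need more rules than this.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

class Entry(NamedTuple):
    name: str
    vintage: Optional[int]
    price: float

# Numbers, words (with inner ' or -), or any other single non-space character.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[^\W\d_]+(?:['-][^\W\d_]+)*|\S")

def parse_line(line: str) -> Optional[Entry]:
    """Hypothetical grammar: entry = Word+ vintage? '$'? Number."""
    tokens = TOKEN_RE.findall(line)
    pos = 0

    # name = Word+
    words = []
    while pos < len(tokens) and tokens[pos][0].isalpha():
        words.append(tokens[pos])
        pos += 1
    if not words:
        return None

    # vintage = four-digit number, only if something follows it;
    # otherwise the number is taken to be the price.
    vintage = None
    if pos + 1 < len(tokens) and re.fullmatch(r"\d{4}", tokens[pos]):
        vintage = int(tokens[pos])
        pos += 1

    # price = '$'? Number
    if pos < len(tokens) and tokens[pos] == "$":
        pos += 1
    if pos >= len(tokens) or not tokens[pos][0].isdigit():
        return None
    price = float(tokens[pos])
    pos += 1

    # Reject lines with trailing tokens the grammar does not cover.
    return Entry(" ".join(words), vintage, price) if pos == len(tokens) else None

def parse_menu(text: str) -> Tuple[List[Entry], List[str]]:
    """Parse each non-empty line on its own; collect lines that did not match."""
    entries: List[Entry] = []
    rejected: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        entry = parse_line(line)
        if entry is not None:
            entries.append(entry)
        else:
            rejected.append(line)
    return entries, rejected
```

Because every line is parsed independently, each parse sees only a handful of tokens, and lines that do not fit any known structure end up in `rejected` where they can be inspected when adding a new rule.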

In fact, IIRC, Metsker's original book even uses something like this as an example of a good use of his toolkit.

Hope this helps.

Licensed under: CC-BY-SA with attribution