Question

In an effort to make a DSL I have written backwards-compatible with ruby 1.8, I need to do some (relatively straightforward) parsing on the source strings. I could probably do this directly with string munging, but in the interest of future maintainability I wanted to investigate first what it would take to use a proper parser generator.

The role of this DSL, however, puts an unusual constraint on what ruby gems I can use. The DSL is part of an Xcode project that's distributed with CocoaPods, and CocoaPods is not really about managing ruby dependencies in the build environment.

What this means is, my ruby DSL is effectively restricted to the gems that ship pre-installed on Mac OS X 10.8.

So, my question: Is there a ruby parser generator out there that generates "stand-alone" ruby code as its final output? That is, ruby code that does not require anything that's not part of core ruby?

I have looked at the (sparse) documentation for ANTLR for Ruby, but it (understandably) does not address my question. And from my quick glimpse at treetop, it does seem to use a support package bundled as a gem.

Solution

After further searching I came across the rexical gem, which is itself a renamed-and-slightly-maintained version of rex. This is an old-school lexer generator whose only dependency is racc/parser, which has been part of ruby-core for long enough that I don't have to worry about it.

The documentation is sparse, but there were enough blog posts touching on the topic that I was able to get what I needed working.

In case you're curious enough to have read this far, here is my example .rex specification:

require 'generator'

class OptionSpecsLexer
rules
  \d+(\.\d*)            { [:number, text] }
  \w+:                  { [:syntax_hash_key, ":#{text[0, text.length - 1]} =>"] }
  \:\w+                 { [:symbol, text] }
  \w+\(                 { [:funcall_open_paren, text] }
  \w+                   { [:identifier, text] }
  \"(\\.|[^\\"])*\"     { [:string, text] }
  =>                    { [:rocket, text] }
  ,                     { [:comma, text] }
  \{                    { [:open_curly, text] }
  \}                    { [:close_curly, text] }
  \(                    { [:open_paren, text] }
  \)                    { [:close_paren, text] }
  \[                    { [:open_square, text] }
  \]                    { [:close_square, text] }
  \\\s+                 { }
  \n                    { [:eol, text] }
  \s+                   { }

inner

  def enumerate_tokens
    Generator.new { |token|
      loop {
        t = next_token
        break if t.nil?
        token.yield(t)
      }
    }
  end

  def normalize(source)
    scan_setup source
    out = ""
    enumerate_tokens.each do |token|
      out += ' ' + token[1]
    end
    out
  end

end
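
One caveat on enumerate_tokens: the generator library it requires ships with ruby 1.8 but was dropped from the standard library in 1.9, so the inner section as written is tied to 1.8. If that ever matters, a plain loop should do the same job. The following is only a sketch: FakeLexer is a hypothetical stand-in for the generated class, each_token is a name made up for illustration, and it assumes (as the code above does) that next_token returns [type, text] pairs and nil at the end of input.

```ruby
# Hypothetical stand-in for the generated lexer. It only imitates the
# next_token contract that the inner section above relies on.
class FakeLexer
  def initialize(tokens)
    @tokens = tokens.dup
  end

  # Returns the next [type, text] pair, or nil when the input is exhausted.
  def next_token
    @tokens.shift
  end

  # Same job as enumerate_tokens, without the 1.8-only Generator class.
  def each_token
    while (t = next_token)
      yield t
    end
  end

  # Same joining logic as normalize above (scan_setup omitted in this stub).
  def normalize
    out = ""
    each_token { |token| out += ' ' + token[1] }
    out
  end
end

lexer = FakeLexer.new([[:identifier, "a"], [:comma, ","], [:identifier, "b"]])
puts lexer.normalize.inspect  # => " a , b"
```

In the real .rex file only each_token and the body of normalize would go in the inner section; the require 'generator' line could then be dropped.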

This lexer understands just enough ruby syntax to preprocess specifications written in my vMATCodeMonkey DSL, replacing the new keyword-style hash key syntax with the old rocket operator syntax. [This was done to allow vMATCodeMonkey to work on an un-updated Mac OS X 10.8, which still ships with a deprecated version of ruby.]
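
To make the rewrite concrete, here is a rough hand-written equivalent using StringScanner from strscan, which as far as I know is also part of the standard library on 1.8. It is not the generated lexer: rocketize is a hypothetical name, it mirrors only the string, symbol and hash-key rules from the spec above (including their limitations, so something like Foo::Bar would be mangled), and unlike normalize it keeps the original whitespace instead of re-joining tokens with single spaces.

```ruby
require 'strscan'

# Hypothetical helper, not part of the original lexer: rewrites `key:` hash
# keys to `:key =>` and copies everything else through unchanged.
def rocketize(source)
  ss = StringScanner.new(source)
  out = []
  until ss.eos?
    if ss.scan(/"(\\.|[^\\"])*"/)   # string literal: copy verbatim
      out << ss.matched
    elsif ss.scan(/:\w+/)            # existing symbol: copy verbatim
      out << ss.matched
    elsif ss.scan(/(\w+):/)          # keyword-style hash key: rewrite
      out << ":#{ss[1]} =>"
    elsif ss.scan(/\w+|\s+|./m)      # identifiers, whitespace, punctuation
      out << ss.matched
    end
  end
  out.join
end

puts rocketize('opt(name: "a: b", size: 3, :old => 1)')
# prints: opt(:name => "a: b", :size => 3, :old => 1)
```

Matching string literals before anything else is what keeps the "a: b" inside the quotes from being rewritten, which is the main thing a naive gsub would get wrong.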

Licensed under: CC-BY-SA with attribution