Question

Is there a way to backreference a previously matched string in Parslet, similar to the \1 functionality in typical regular expressions?

I want to extract the characters within a block such as:

Marker SomeName
 some random text, numbers123
 and symbols !#%
SomeName  

in which "Marker" is a known string but "SomeName" is not known a priori, so I believe I need something like:

rule(:name) { ( match('\w') >> match('\w\d') ).repeat(1) } 
rule(:text_within_the_block) {
 str('Marker') >>  name >> any.repeat.as(:text_block) >> backreference_to_name 
}  

What I don't know is how to write the backreference_to_name rule using Parslet and/or Ruby language.


Solution

From http://kschiess.github.io/parslet/parser.html

Capturing input

Sometimes a parser needs to match against something that was already matched against. Think about Ruby heredocs for example:

  str = <<-HERE
    This is part of the heredoc.
  HERE

The key to matching this kind of document is to capture part of the input first and then construct the rest of the parser based on the captured part. This is what it looks like in its simplest form:

  match['ab'].capture(:capt) >>               # create the capture
    dynamic { |s,c| str(c.captures[:capt]) }  # and match using the capture

The key here is that the dynamic block returns a lazy parser. It is only evaluated at the point where it is used, and it is passed its current context to reference at the point of execution.

-- Updated : To add a worked example --

So for your example:

require 'parslet'    
require 'parslet/convenience'

class Mini < Parslet::Parser
  # A name is a letter followed by any number of word characters.
  rule(:name) { match('[a-zA-Z]') >> match('\w').repeat }

  rule(:text_within_the_block) {
    str('Marker ') >>
      name.capture(:namez).as(:name) >>   # remember the name for later
      str(' ') >>
      # consume characters until the captured name appears again
      dynamic { |_, scope|
        (str(scope.captures[:namez]).absent? >> any).repeat
      }.as(:text_block) >>
      # finally, match the captured name itself
      dynamic { |_, scope| str(scope.captures[:namez]) }
  }

  root(:text_within_the_block)
end

puts Mini.new.parse_with_debug("Marker BOB some text BOB").inspect
 #=> {:name=>"BOB"@7, :text_block=>"some text "@11}

This required a couple of changes.

  • I changed rule(:name) to match a single word and added a str(" ") to detect where that word ends. (Note: \w is short for [A-Za-z0-9_], so it includes digits.)
  • I made the "any" match conditional on the text not being the captured :name text. Otherwise the repeat is greedy: it consumes the closing 'BOB' as well, and the final match then fails.

OTHER TIPS

I don't exactly want to support stackoverflow, but as you seem to be a parslet user, here goes: Try asking on the mailing list for a real nice answer. (http://dir.gmane.org/gmane.comp.lang.ruby.parslet)

What you call a back-reference here is called a 'capture' in parslet. Please see the example 'capture.rb' in parslet's source tree.
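
For readers coming from regular expressions, this is roughly what the \1 behaviour mentioned in the question looks like in plain Ruby, without parslet. It is only a comparison sketch; the pattern and the variable names are mine, not part of either answer.

```ruby
text = "Marker SomeName\n some random text, numbers123\n and symbols !#%\nSomeName  \n"

# Group 1 is the name, group 2 the block body; \1 must repeat group 1.
# The /m flag lets '.' match newlines, and the lazy '.*?' takes as little
# as possible before a final line that starts with the captured name.
pattern = /\AMarker ([A-Za-z]\w*)\n(.*?)\n\1\s*\z/m

m = pattern.match(text)
puts m[1]
puts m[2]
```

In parslet, capture(:name) plays the role of the parenthesised group and the dynamic block plays the role of \1.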

Licensed under: CC-BY-SA with attribution