Parslet : exclusion clause

https://stackoverflow.com/questions/8223830

06-03-2021

Question

I am currently writing a Ruby parser in Ruby, more precisely with Parslet, since I find it far easier to use than Treetop or Citrus. I create my rules from the official specification, but there are some statements I cannot write, since they "exclude" some syntax, and I do not know how to express that. An example should make this clearer.

Here is a basic rule:

foo::=
any-character+ BUT NOT (foo* escape_character barbar*)
# Knowing that (foo* escape_character barbar*) is included in any-character

How could I translate that using Parslet? Maybe with absent?/present?

Thank you very much, hope someone has an idea....

Have a nice day!

EDIT: I tried what you said, so here's my translation into Ruby language using parslet:

  rule(:line_comment){(source_character.repeat >> line_terminator >> source_character.repeat).absent? >> source_character.repeat(1)}

However, it does not seem to work (the sequence in parens). I did some tests, and came to the conclusion that what's written in my parens is wrong.

Here is a much simpler example. Consider these rules:

# Parslet rules
rule(:source_character) {any}
rule(:line_terminator){ str("\n") >> str("\r").maybe }  

rule(:not){source_character.repeat >> line_terminator }
# Which looks like what I try to "detect" up there

I test these rules with this code:

# Code to test : 
code = "test
"

But I get that:

Failed to match sequence (SOURCE_CHARACTER{0, } LINE_TERMINATOR) at line 2 char 1.
`- Failed to match sequence (SOURCE_CHARACTER{0, } LINE_TERMINATOR) at line 2 char 1.
   `- Failed to match sequence (' ' ' '?) at line 2 char 1.
      `- Premature end of input at line 2 char 1.
nil

If this sequence doesn't work, my 'complete' rule up there won't ever work... If anyone has an idea, it would be great.

Thank you !

Solution

You can do something like this:

rule(:word) { match['^")(\\s'].repeat(1) } # normal word
rule(:op) { str('AND') | str('OR') | str('NOT') }
rule(:keyword) { str('all:') | str('any:') }
rule(:searchterm) { keyword.absent? >> op.absent? >>  word }

In this case, absent? acts as a negative lookahead: it checks that the upcoming input is not a keyword, without consuming anything. If that holds, the second absent? checks that it is not an operator either, and only then does the rule try to match a valid word.

An equivalent rule would be:

rule(:searchterm) { (keyword | op).absent? >> word }

OTHER TIPS

Parslet matching is greedy by nature. This means that when you repeat something like

foo.repeat

parslet will match foo until it fails. If foo is

rule(:foo) { any }

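the parse is bound to fail whenever anything has to match after the repetition, since any.repeat always consumes the entire rest of the input and Parslet does not backtrack into a repetition to give characters back.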

What you're looking for is something like the string matcher in examples/string_parser.rb (parslet source tree):

rule :string do
  str('"') >> 
  (
    (str('\\') >> any) |
    (str('"').absent? >> any)
  ).repeat.as(:string) >> 
  str('"')
end

What this says is: 'match ", then match either a backslash followed by any character at all, or match any other character, as long as it is not the terminating ".'

So .absent? is really a way to exclude things from a match that follows:

str('foo').absent? >> (str('foo') | str('bar'))

will only match 'bar'. If you understand that, I assume you will be able to resolve your difficulties. Although those will not be the last on your way to a Ruby parser...

Licensed under: CC-BY-SA with attribution
