Problem making a BBCode parser with PEG

https://stackoverflow.com/questions/7473570

22-01-2021

Question

I am making a BBCode parser with PEG (the Citrus implementation for Ruby) and I am stuck on parsing this input: [b]sometext[anothertext[/b]

Here is the grammar:

grammar BBCodeParser
  rule document
    (open_tag | close_tag | new_line | text)*
  end
  rule open_tag
    ("[" tag_name "="? tag_data? "]")
  end

  rule close_tag
    ("[/" tag_name "]") 
  end

  rule text
    [^\n\[\]]+
  end

  rule new_line
    ("\r\n" | "\n")
  end

  rule tag_name
    # [p|br|b|i|u|hr|code|quote|list|url|img|\*|color]
    [a-zA-Z\*]+
  end

  rule tag_data
    ([^\[\]\n])+
  end
end

The problem is with the text rule: I don't know how to say that text can contain anything except \r, \n, open_tag or close_tag. With this implementation it fails on the example because [ and ] are excluded entirely (which is wrong).

So, finally, the question is how to write a rule that matches anything except \r, \n, or an exact match of open_tag or close_tag.

If you have a solution for another PEG implementation, post it too. I can switch :)

Solution

I encountered a similar problem a while ago. There is a trick to do this:
You need to say: match open_tag, followed by everything that is not the start of a tag or a newline, and then close_tag. This gives the following rule:

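rule tag
  # The predicates are sequenced rather than alternated: a choice of
  # negative lookaheads succeeds as soon as any one of them does, so it
  # would let "." consume the closing tag as well.
  open_tag (!open_tag !close_tag !new_line .)+ close_tag
end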
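
The lookahead idea can also be tried outside Citrus. Below is a minimal sketch in plain Ruby that approximates the rule with a regular expression: a character is consumed only if no open tag, close tag, or newline starts at that position. The constants and the match_tag helper are hypothetical names made up for illustration, and the patterns only approximate the grammar's rules (a regex backtracks, a PEG does not).

```ruby
# Hypothetical regex stand-ins for the grammar's open_tag, close_tag
# and new_line rules (assumptions for illustration, not Citrus API).
OPEN_TAG  = /\[[a-zA-Z*]+=?[^\[\]\n]*\]/
CLOSE_TAG = %r{\[/[a-zA-Z*]+\]}
NEW_LINE  = /\r\n|\n/

# open_tag, then one or more characters that do not begin an open tag,
# a close tag, or a newline, then close_tag.
TAG = /\A(#{OPEN_TAG})((?:(?!#{OPEN_TAG})(?!#{CLOSE_TAG})(?!#{NEW_LINE}).)+)(#{CLOSE_TAG})/

# Returns [open, body, close] when the input starts with a tag, else nil.
def match_tag(input)
  m = TAG.match(input)
  m && [m[1], m[2], m[3]]
end

p match_tag("[b]sometext[anothertext[/b]")
```

The stray [ ends up inside the body because neither tag pattern matches at that position, which is what the text rule in the question could not express.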

OTHER TIPS

This would parse any text and continue recursively when the [ wasn't the beginning of another tag.

rule text
    [^\n\[\]]+ (!open_tag text)?
end

This

rule text
    [^\n\[\]]+ (!open_tag text)?
end

ends up with a parse error.

I tried to continue with this idea and the result was ([^\n] (!open_tag | !close_tag) text*), but it fails too. It matches "sometext[anothertext[/b]".

I found a temporary solution: (!open_tag !close_tag !new_line .) It matches the text only one character at a time, but it skips all open and close tags. I can join these characters together later :)
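
Joining the single characters back together could look roughly like this. It is only a sketch using Ruby's StringScanner instead of Citrus: the tokenize method, the token symbols, and the regex patterns are hypothetical stand-ins for the grammar's rules. Tags and newlines are tried first, in the same order as in the document rule, and every other character is appended to the current text token.

```ruby
require "strscan"

# Hypothetical tokenizer: anything that is not a tag or a newline is
# consumed one character at a time and merged into one text token.
def tokenize(input)
  open_tag  = /\[[a-zA-Z*]+=?[^\[\]\n]*\]/
  close_tag = %r{\[/[a-zA-Z*]+\]}
  new_line  = /\r\n|\n/

  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    if (s = scanner.scan(open_tag))
      tokens << [:open_tag, s]
    elsif (s = scanner.scan(close_tag))
      tokens << [:close_tag, s]
    elsif (s = scanner.scan(new_line))
      tokens << [:new_line, s]
    else
      ch = scanner.getch
      if tokens.last && tokens.last[0] == :text
        tokens.last[1] << ch          # extend the running text token
      else
        tokens << [:text, ch.dup]     # start a new text token
      end
    end
  end
  tokens
end

p tokenize("[b]sometext[anothertext[/b]")
```

With the example from the question, the unmatched [ simply becomes part of the text token between the two tags.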

Licensed under: CC-BY-SA with attribution
