Problem making a BBCode parser with PEG
Question
I am writing a BBCode parser with a PEG (the Citrus implementation for Ruby), and I am stuck on parsing this input: [b]sometext[anothertext[/b]
Here is the code:
grammar BBCodeParser
  rule document
    (open_tag | close_tag | new_line | text)*
  end
  rule open_tag
    ("[" tag_name "="? tag_data? "]")
  end
  rule close_tag
    ("[/" tag_name "]")
  end
  rule text
    [^\n\[\]]+
  end
  rule new_line
    ("\r\n" | "\n")
  end
  rule tag_name
    # [p|br|b|i|u|hr|code|quote|list|url|img|\*|color]
    [a-zA-Z\*]+
  end
  rule tag_data
    ([^\[\]\n])+
  end
end
The problem is with the text rule.
I don't know how to express that text can contain everything except \r, \n, an open_tag, or a close_tag.
With this implementation it fails on the example because [ and ] are excluded entirely (which is wrong).
So, finally, the question is: how do I write a rule that matches anything except \r, \n, or an exact match of open_tag or close_tag?
If you have a solution for another PEG implementation, please share it too. I can switch :)
Solution
I encountered a similar problem a while ago, and there is a trick for it.
Match open_tag, then one or more characters that do not start an open_tag, a close_tag, or a new line, and then close_tag. Each ! is a negative lookahead that consumes no input, and . consumes a single character. This gives the following rule:
rule tag
  open_tag (!(open_tag | close_tag | new_line) .)+ close_tag
end
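To show what the tag rule does step by step, here is a minimal sketch in plain Ruby using only the standard library's StringScanner (not Citrus). It assumes regexes that approximate the grammar's open_tag, close_tag, and new_line rules; the constant and method names are hypothetical. The loop checks all three lookaheads without consuming input and only then takes one character, which is exactly what !(open_tag | close_tag | new_line) . means.

```ruby
require "strscan"

# Hypothetical regexes approximating the Citrus rules open_tag, close_tag
# and new_line. StringScanner only matches at the current position.
OPEN_TAG  = /\[([a-zA-Z*]+)=?[^\[\]\n]*\]/
CLOSE_TAG = /\[\/([a-zA-Z*]+)\]/
NEW_LINE  = /\r\n|\n/

# Mirrors: open_tag (!(open_tag | close_tag | new_line) .)+ close_tag
# Returns [tag name, body] on success, or nil (restoring the position) on failure.
# Like the PEG rule, it does not check that the closing tag name matches.
def parse_tag(scanner)
  start = scanner.pos
  return nil unless scanner.scan(OPEN_TAG)
  name = scanner[1]

  body = +""
  # Negative lookahead: match? tests each alternative without advancing.
  until scanner.eos? || scanner.match?(OPEN_TAG) ||
        scanner.match?(CLOSE_TAG) || scanner.match?(NEW_LINE)
    body << scanner.getch # the "." in the PEG rule
  end

  # "+" requires at least one character, then the closing tag must follow.
  return [name, body] if !body.empty? && scanner.scan(CLOSE_TAG)

  scanner.pos = start # a failed PEG rule consumes nothing
  nil
end

p parse_tag(StringScanner.new("[b]sometext[anothertext[/b]"))
# => ["b", "sometext[anothertext"]
```

The stray [ in the example is kept as body text because "[anothertext[" does not match the open_tag pattern (there is no ] before the next [).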
OTHER TIPS
This parses any text and continues recursively when the [ is not the beginning of another tag.
rule text
[^\n\[\]]+ (!open_tag text)?
end
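This rule ends up with a parse error. After the first run of text stops at "[anothertext", !open_tag succeeds, but the recursive text cannot consume the [ itself because its first part excludes [ and ]. The document rule then stops at that [ and the input is never fully consumed.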
I tried to build on this idea and ended up with ([^\n] (!open_tag | !close_tag) text*), but it fails too: it swallows the closing tag as well and matches all of "sometext[anothertext[/b]".
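One reason the attempt above cannot stop at a tag: in a PEG, | is ordered choice, so (!open_tag | !close_tag) succeeds whenever either lookahead succeeds. Since no position is both an open tag and a close tag, it effectively always succeeds. A sequence of lookaheads, !open_tag !close_tag (equivalently !(open_tag | close_tag)), requires both to succeed. The sketch below shows the difference in plain Ruby; the regexes and method names are hypothetical stand-ins for the grammar rules.

```ruby
require "strscan"

# Hypothetical regexes approximating the grammar's open_tag and close_tag.
OPEN_TAG  = /\[[a-zA-Z*]+=?[^\[\]\n]*\]/
CLOSE_TAG = /\[\/[a-zA-Z*]+\]/

# (!open_tag | !close_tag): succeeds if EITHER lookahead succeeds.
def choice_of_nots?(s)
  !s.match?(OPEN_TAG) || !s.match?(CLOSE_TAG)
end

# !open_tag !close_tag: a sequence, so BOTH lookaheads must succeed.
def sequence_of_nots?(s)
  !s.match?(OPEN_TAG) && !s.match?(CLOSE_TAG)
end

["[b]", "[/b]", "x"].each do |input|
  s = StringScanner.new(input)
  puts "#{input.inspect}: choice=#{choice_of_nots?(s)} sequence=#{sequence_of_nots?(s)}"
end
# "[b]": choice=true sequence=false
# "[/b]": choice=true sequence=false
# "x": choice=true sequence=true
```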
A temporary solution I found:
(!(open_tag | close_tag | new_line) .)
It matches one character at a time but never matches the start of an open or close tag. I can join these characters together later :)
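Joining the single characters back together can be done after parsing. The sketch below is a plain-Ruby stand-in (standard library only, not Citrus output) that follows the document rule's ordered choice with a one-character text alternative, then merges adjacent text tokens with Enumerable#chunk_while. The token format, regexes, and method names are hypothetical.

```ruby
require "strscan"

# Hypothetical regexes approximating the grammar's open_tag, close_tag, new_line.
OPEN_TAG  = /\[([a-zA-Z*]+)=?[^\[\]\n]*\]/
CLOSE_TAG = /\[\/([a-zA-Z*]+)\]/
NEW_LINE  = /\r\n|\n/

# Mimics: document <- (open_tag | close_tag | new_line | text)*
# where text is one character that does not start a tag or a new line.
def tokenize(input)
  s = StringScanner.new(input)
  tokens = []
  until s.eos?
    if s.scan(OPEN_TAG)
      tokens << [:open, s[1]]
    elsif s.scan(CLOSE_TAG)
      tokens << [:close, s[1]]
    elsif s.scan(NEW_LINE)
      tokens << [:new_line, s.matched]
    else
      # No alternative above matched here, so the text lookahead holds.
      tokens << [:text, s.getch]
    end
  end
  tokens
end

# Merge runs of consecutive :text tokens; other tokens stay in one-element chunks.
def join_text(tokens)
  tokens.chunk_while { |a, b| a[0] == :text && b[0] == :text }
        .map { |run| run[0][0] == :text ? [:text, run.map(&:last).join] : run[0] }
end

p join_text(tokenize("[b]sometext[anothertext[/b]"))
# => [[:open, "b"], [:text, "sometext[anothertext"], [:close, "b"]]
```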