Question

I am trying to write something to parse my Django template, however my parser fails if anything follows an {% endblock %}

Here is what I have so far

import Control.Monad
import Text.ParserCombinators.Parsec


data Piece = StaticPiece String 
           | BlockPiece String [Piece]
           | VarPiece String
  deriving (Show)

noWhitespace = many1 $ oneOf "_" <|> alphaNum

parseBlock = do
  blockName <- between (string "{% block" >> spaces) (spaces >> string "%}") noWhitespace <?> "block tag"
  blockContent <- many (parsePiece (void $ try $ (string "{% endblock %}")))
  return $ BlockPiece blockName blockContent

parseVar = do
  var <- between (string "{{" >> spaces) (spaces >> string "}}") noWhitespace <?> "variable"
  return $ VarPiece var

parseStatic end = do
  s <- manyTill (anyChar) $ end <|> (void $ lookAhead $ try $ parseNonStatic)
  return $ StaticPiece s 

parseNonStatic = try parseBlock <|> parseVar
parsePiece s = try parseNonStatic <|> (parseStatic s)

parsePieces = manyTill (parsePiece eof) eof

main :: IO ()
main = do
  putStrLn "1"
  print $ parse parsePieces "" "Blah blah blah"
  putStrLn "2"
  print $ parse parsePieces "" "{{ some_var }} string {{ other_var }} s"
  putStrLn "3"
  print $ parse parsePieces "" "{% block body %}{% endblock %}"
  putStrLn "4"
  print $ parse parsePieces "" "{% block body %}{{ hello }}{% endblock %}"
  putStrLn "5"
  print $ parse parsePieces "" "{% block body %}{% {% endblock %}"
  putStrLn "6"
  print $ parse parseBlock ""  "{% block body %}{% endblock %} "
  putStrLn "7"
  print $ parse parsePieces "" "{% block body %} {} { {{ }{ {{{}} cool } {% block inner_body %} Hello: {{ hello }}{% endblock %} {% endblock %}"
  putStrLn "8"
  print $ parse parsePieces "" "{% block body %} {} {{ cool }} {% block inner_body %} Hello: {{ hello }}{% endblock %}{% endblock %} ldsakjf"
  print ">>"
  --
  print $ parse parseBlock ""  "{% block body %}{% endblock %} "

I am thinking that somehow instead of looking at the string from beginning to end, it is looking at it from the end somehow. If you look at #7 StaticPiece " " is inside the innermost block when it should be in the body block. Any help would be appreciated.

Edit the above code outputs:

1
Right [StaticPiece "Blah blah blah"]
2
Right [VarPiece "some_var",StaticPiece " string ",VarPiece "other_var",StaticPiece " s"]
3
Right [BlockPiece "body" [StaticPiece ""]]
4
Right [BlockPiece "body" [VarPiece "hello",StaticPiece ""]]
5
Right [BlockPiece "body" [StaticPiece "{% "]]
6
Left (line 1, column 32):
unexpected end of input
expecting "{% endblock %}", block tag or variable
7
Right [BlockPiece "body" [StaticPiece " {} { {{ }{ {{{}} cool } ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello",StaticPiece "",StaticPiece " "]]]
8
Right [StaticPiece "{% block body %} {} ",VarPiece "cool",StaticPiece " {% block inner_body %} Hello: ",VarPiece "hello",StaticPiece "{% endblock %}{% endblock %} ldsakjf"]
">>"
Left (line 1, column 32):
unexpected end of input
expecting "{% endblock %}", block tag or variable
Was it helpful?

Solution

Let's rewrite some of the parsers to make things run smoothly.

Use manyTill to parse blocks with matching endblock tags

Firstly, we'll need to use parsers that match {% something or other %}, so let's make that a function:

tag p = between (string "{%" >> spaces) (spaces >> string "%}") p <?> "tag"
ghci> parse (tag $ string "any parser here") "" "{% any parser here %}"
Right "any parser here"

Let's use manyTill in parseBlock, to grab the endblock tag. I'm still using try, because tag (string "endblock") can fail having read some input, eg { at the start of a variable or other non-tag.

parseBlock = do
  blockName <- tag (string "block" >> spaces >> noWhitespace) <?> "block tag"
  blockContent <- manyTill parsePiece (try $ tag $ string "endblock") 
  return $ BlockPiece blockName blockContent

parseStatic mustn't match nothing, and should pause to check for tags/vars

parseStatic is the source of most of the problems with this parser - it allows anything except a tag or var, which is always problematic - parsers are much better at following rules than being liberal.

We need to stop parseStatic from just eating the remainder of the input, so that the nonstatic parsers get a chance to try again, so let's make a parser to peek at the next character without using it up in any way. Using a single character like this avoids lots of backtracking, although we'll see later there's some combining to do.

peekChar = void . try . lookAhead .char 

parseStatic also mustn't match the empty string - parsers that match the empty string aren't allowed to be used with any many combinator, because they would allow infinite parses like [StaticPiece "",StaticPiece "",StaticPiece ""..]. That's why we'll allow any character we like (including {) then as many characters as we like that aren't {. The only thing other than { that can terminate a StaticPiece is the end of the input, which is why eof is allowed here.

parseStatic = do
  c <- anyChar
  s <- manyTill anyChar (peekChar '{' <|> eof)
  return $ StaticPiece (c:s) 
ghci> parse parseStatic "" "some stuff not containing { other stuff"
Right (StaticPiece "some stuff not containing ")

So we get

parsePieces = manyTill parsePiece eof

Glue those statics together

We now get nice parses like

ghci> parse parsePieces "" "{{ some_var }} string {{ other_var }} s"
Right [VarPiece "some_var",StaticPiece " string ",VarPiece "other_var",StaticPiece " s"]

but also uglier ones like

ghci> parse parsePieces "" "{% block body %} {} { {{ }{ {{{}} cool } {% block inner_body %} Hello: {{ hello }}{% endblock %} {% endblock %}"
Right [BlockPiece "body" [StaticPiece " ",StaticPiece "{} ",StaticPiece "{ ",StaticPiece "{",StaticPiece "{ }",StaticPiece "{ ",StaticPiece "{",StaticPiece "{",StaticPiece "{}} cool } ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello"],StaticPiece " "]]

because parseStatic stops every time we hit {. Let's roll adjacent statics into one with a few helper functions:

isStatic :: Piece -> Bool
isStatic (StaticPiece _) = True
isStatic _ = False

unStatic :: Piece -> String
unStatic (StaticPiece s) = s
unStatic _ = error "unStatic: applied to something other than a StaticPiece"

We'll use span :: (a -> Bool) -> [a] -> ([a], [a]) to collect up the non-statics and concat the statics:

combineStatics :: [Piece] -> [Piece] 
combineStatics pieces = let (nonstatics,therest) = span (not.isStatic) pieces in
    nonstatics ++ combine therest where
      combine [] = []
      combine ps = let (statics,more) = span isStatic ps in
        (StaticPiece . concat . map unStatic) statics : combineStatics more

and rewrite parseBlock to combine any statics in its block content:

parseBlock = do
  blockName <- tag (string "block" >> spaces >> noWhitespace) <?> "block tag"
  blockContent <- manyTill parsePiece (try $ tag $ string "endblock")
  return $ BlockPiece blockName (combineStatics blockContent)

Now it works well

The tests now run as I imagine you'd hope:

1
Right [StaticPiece "Blah blah blah"]
2
Right [VarPiece "some_var",StaticPiece " string ",VarPiece "other_var",StaticPiece " s"]
3
Right [BlockPiece "body" []]
4
Right [BlockPiece "body" [VarPiece "hello"]]
5
Right [BlockPiece "body" [StaticPiece "{% "]]
6
Right (BlockPiece "body" [])
7
Right [BlockPiece "body" [StaticPiece " {} { {{ }{ {{{}} cool } ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello"],StaticPiece " "]]
8
Right [BlockPiece "body" [StaticPiece " {} ",VarPiece "cool",StaticPiece " ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello"]],StaticPiece " ldsakjf"]
">>"
Right (BlockPiece "body" [])

OTHER TIPS

I think I figured it out.

I changed the code so that the parseBlock is the one consuming {% endblock %} and not parseStatic.

parseBlockContent end = 
  manyTill (parsePiece (lookAhead $ try $ end)) (try $ end)

parseBlock = do
  blockName <- parseTemplateTag (string "block") wordString <?> "block tag"
  blockContent <- parseBlockContent (void $ string "{% endblock %}")
  return $ BlockPiece blockName blockContent

It would be nice to have it so it doesn't need to backtrack so much though, especially since parseStatic has to consume a whole {% block %} {% endblock %} to tell if it should continue.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top