Question

I am trying to write something to parse my Django template, however my parser fails if anything follows an {% endblock %}

Here is what I have so far

import Control.Monad
import Text.ParserCombinators.Parsec


data Piece = StaticPiece String 
           | BlockPiece String [Piece]
           | VarPiece String
  deriving (Show)

noWhitespace = many1 $ oneOf "_" <|> alphaNum

parseBlock = do
  blockName <- between (string "{% block" >> spaces) (spaces >> string "%}") noWhitespace <?> "block tag"
  blockContent <- many (parsePiece (void $ try $ (string "{% endblock %}")))
  return $ BlockPiece blockName blockContent

parseVar = do
  var <- between (string "{{" >> spaces) (spaces >> string "}}") noWhitespace <?> "variable"
  return $ VarPiece var

parseStatic end = do
  s <- manyTill (anyChar) $ end <|> (void $ lookAhead $ try $ parseNonStatic)
  return $ StaticPiece s 

parseNonStatic = try parseBlock <|> parseVar
parsePiece s = try parseNonStatic <|> (parseStatic s)

parsePieces = manyTill (parsePiece eof) eof

main :: IO ()
main = do
  putStrLn "1"
  print $ parse parsePieces "" "Blah blah blah"
  putStrLn "2"
  print $ parse parsePieces "" "{{ some_var }} string {{ other_var }} s"
  putStrLn "3"
  print $ parse parsePieces "" "{% block body %}{% endblock %}"
  putStrLn "4"
  print $ parse parsePieces "" "{% block body %}{{ hello }}{% endblock %}"
  putStrLn "5"
  print $ parse parsePieces "" "{% block body %}{% {% endblock %}"
  putStrLn "6"
  print $ parse parseBlock ""  "{% block body %}{% endblock %} "
  putStrLn "7"
  print $ parse parsePieces "" "{% block body %} {} { {{ }{ {{{}} cool } {% block inner_body %} Hello: {{ hello }}{% endblock %} {% endblock %}"
  putStrLn "8"
  print $ parse parsePieces "" "{% block body %} {} {{ cool }} {% block inner_body %} Hello: {{ hello }}{% endblock %}{% endblock %} ldsakjf"
  print ">>"
  --
  print $ parse parseBlock ""  "{% block body %}{% endblock %} "

I am thinking that somehow instead of looking at the string from beginning to end, it is looking at it from the end somehow. If you look at #7 StaticPiece " " is inside the innermost block when it should be in the body block. Any help would be appreciated.

Edit the above code outputs:

1
Right [StaticPiece "Blah blah blah"]
2
Right [VarPiece "some_var",StaticPiece " string ",VarPiece "other_var",StaticPiece " s"]
3
Right [BlockPiece "body" [StaticPiece ""]]
4
Right [BlockPiece "body" [VarPiece "hello",StaticPiece ""]]
5
Right [BlockPiece "body" [StaticPiece "{% "]]
6
Left (line 1, column 32):
unexpected end of input
expecting "{% endblock %}", block tag or variable
7
Right [BlockPiece "body" [StaticPiece " {} { {{ }{ {{{}} cool } ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello",StaticPiece "",StaticPiece " "]]]
8
Right [StaticPiece "{% block body %} {} ",VarPiece "cool",StaticPiece " {% block inner_body %} Hello: ",VarPiece "hello",StaticPiece "{% endblock %}{% endblock %} ldsakjf"]
">>"
Left (line 1, column 32):
unexpected end of input
expecting "{% endblock %}", block tag or variable
Était-ce utile?

La solution

Let's rewrite some of the parsers to make things run smoothly.

Use manyTill to parse blocks with matching endblock tags

Firstly, we'll need to use parsers that match {% something or other %}, so let's make that a function:

tag p = between (string "{%" >> spaces) (spaces >> string "%}") p <?> "tag"
ghci> parse (tag $ string "any parser here") "" "{% any parser here %}"
Right "any parser here"

Let's use manyTill in parseBlock, to grab the endblock tag. I'm still using try, because tag (string "endblock") can fail having read some input, eg { at the start of a variable or other non-tag.

parseBlock = do
  blockName <- tag (string "block" >> spaces >> noWhitespace) <?> "block tag"
  blockContent <- manyTill parsePiece (try $ tag $ string "endblock") 
  return $ BlockPiece blockName blockContent

parseStatic mustn't match nothing, and should pause to check for tags/vars

parseStatic is the source of most of the problems with this parser - it allows anything except a tag or var, which is always problematic - parsers are much better at following rules than being liberal.

We need to stop parseStatic from just eating the remainder of the input, so that the nonstatic parsers get a chance to try again, so let's make a parser to peek at the next character without using it up in any way. Using a single character like this avoids lots of backtracking, although we'll see later there's some combining to do.

peekChar = void . try . lookAhead .char 

parseStatic also mustn't match the empty string - parsers that match the empty string aren't allowed to be used with any many combinator, because they would allow infinite parses like [StaticPiece "",StaticPiece "",StaticPiece ""..]. That's why we'll allow any character we like (including {) then as many characters as we like that aren't {. The only thing other than { that can terminate a StaticPiece is the end of the input, which is why eof is allowed here.

parseStatic = do
  c <- anyChar
  s <- manyTill anyChar (peekChar '{' <|> eof)
  return $ StaticPiece (c:s) 
ghci> parse parseStatic "" "some stuff not containing { other stuff"
Right (StaticPiece "some stuff not containing ")

So we get

parsePieces = manyTill parsePiece eof

Glue those statics together

We now get nice parses like

ghci> parse parsePieces "" "{{ some_var }} string {{ other_var }} s"
Right [VarPiece "some_var",StaticPiece " string ",VarPiece "other_var",StaticPiece " s"]

but also uglier ones like

ghci> parse parsePieces "" "{% block body %} {} { {{ }{ {{{}} cool } {% block inner_body %} Hello: {{ hello }}{% endblock %} {% endblock %}"
Right [BlockPiece "body" [StaticPiece " ",StaticPiece "{} ",StaticPiece "{ ",StaticPiece "{",StaticPiece "{ }",StaticPiece "{ ",StaticPiece "{",StaticPiece "{",StaticPiece "{}} cool } ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello"],StaticPiece " "]]

because parseStatic stops every time we hit {. Let's roll adjacent statics into one with a few helper functions:

isStatic :: Piece -> Bool
isStatic (StaticPiece _) = True
isStatic _ = False

unStatic :: Piece -> String
unStatic (StaticPiece s) = s
unStatic _ = error "unStatic: applied to something other than a StaticPiece"

We'll use span :: (a -> Bool) -> [a] -> ([a], [a]) to collect up the non-statics and concat the statics:

combineStatics :: [Piece] -> [Piece] 
combineStatics pieces = let (nonstatics,therest) = span (not.isStatic) pieces in
    nonstatics ++ combine therest where
      combine [] = []
      combine ps = let (statics,more) = span isStatic ps in
        (StaticPiece . concat . map unStatic) statics : combineStatics more

and rewrite parseBlock to combine any statics in its block content:

parseBlock = do
  blockName <- tag (string "block" >> spaces >> noWhitespace) <?> "block tag"
  blockContent <- manyTill parsePiece (try $ tag $ string "endblock")
  return $ BlockPiece blockName (combineStatics blockContent)

Now it works well

The tests now run as I imagine you'd hope:

1
Right [StaticPiece "Blah blah blah"]
2
Right [VarPiece "some_var",StaticPiece " string ",VarPiece "other_var",StaticPiece " s"]
3
Right [BlockPiece "body" []]
4
Right [BlockPiece "body" [VarPiece "hello"]]
5
Right [BlockPiece "body" [StaticPiece "{% "]]
6
Right (BlockPiece "body" [])
7
Right [BlockPiece "body" [StaticPiece " {} { {{ }{ {{{}} cool } ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello"],StaticPiece " "]]
8
Right [BlockPiece "body" [StaticPiece " {} ",VarPiece "cool",StaticPiece " ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello"]],StaticPiece " ldsakjf"]
">>"
Right (BlockPiece "body" [])

Autres conseils

I think I figured it out.

I changed the code so that the parseBlock is the one consuming {% endblock %} and not parseStatic.

parseBlockContent end = 
  manyTill (parsePiece (lookAhead $ try $ end)) (try $ end)

parseBlock = do
  blockName <- parseTemplateTag (string "block") wordString <?> "block tag"
  blockContent <- parseBlockContent (void $ string "{% endblock %}")
  return $ BlockPiece blockName blockContent

It would be nice to have it so it doesn't need to backtrack so much though, especially since parseStatic has to consume a whole {% block %} {% endblock %} to tell if it should continue.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top