Here's a more direct version that doesn't use a custom lexer. It does use try
though, and I don't see how to avoid it here. The problem is that it seems we need a non-committing look ahead to distinguish double brackets from single brackets; try
is for non-committing look ahead.
The high level approach is that same as in
my first answer. I've been careful
to make the three node parsers commute -- making the code more robust
to change -- by using both try
and notFollowedBy
:
{-# LANGUAGE TupleSections #-}
import Text.Parsec hiding (string)
import qualified Text.Parsec
import Control.Applicative ((<$>) , (<*) , (<*>))
import Control.Monad (forM_)
import Data.List (find)
import Debug.Trace
----------------------------------------------------------------------
-- Token parsers.
llink , rlink , lbrace , rbrace :: Parsec String u String
[llink , rlink , lbrace , rbrace] = reserved
reserved = map (try . Text.Parsec.string) ["[[" , "]]" , "{{" , "}}"]
----------------------------------------------------------------------
-- Node parsers.
-- Link, braces, or string.
data Node = L [Node] | B [Node] | S String deriving Show
nodes :: Parsec String u [Node]
nodes = many node
node :: Parsec String u Node
node = link <|> braces <|> string
link , braces , string :: Parsec String u Node
link = L <$> between llink rlink nodes
braces = B <$> between lbrace rbrace nodes
string = S <$> many1 (notFollowedBy (choice reserved) >> anyChar)
----------------------------------------------------------------------
parseNodes :: String -> Either ParseError [Node]
parseNodes = parse (nodes <* eof) "<no file>"
----------------------------------------------------------------------
-- Tests.
getLink :: [Node] -> Maybe Node
getLink = find isLink where
isLink (L _) = True
isLink _ = False
parseLink :: String -> Either ParseError (Maybe Node)
parseLink = either Left (Right . getLink) . parseNodes
testList = [ " [[rightLink]] "
, " {{ [[Wrong_Link]] }} [[rightLink]]"
, " {{ {{ }} [[Wrong_Link]] }} [[rightLink]]"
, " [[{{[[someLink]]}}]] {{}} {{[[asdf]]}}"
-- Pathalogical example from comments.
, "{{ab}cd}}"
-- A more pathalogical example.
, "{ [ { {asf{[[[asdfa]]]}aasdff ] ] ] {{[[asdf]]}}asdf"
-- No top level link.
, "{{[[Wrong_Link]]asdf[[WRong_Link]]{{}}}}{{[[[[Wrong]]]]}}"
-- Too many '{{'.
, "{{ {{ {{ [[ asdf ]] }} }}"
-- Too many '}}'.
, "{{ {{ [[ asdf ]] }} }} }}"
-- Too many '[['.
, "[[ {{ [[{{[[asdf]]}}]]}}"
]
main =
forM_ testList $ \ t -> do
putStrLn $ "Test: ^" ++ t ++ "$"
let parses = ( , ) <$> parseNodes t <*> parseLink t
printParses (n , l) = do
putStrLn $ "Nodes: " ++ show n
putStrLn $ "Link: " ++ show l
printError = putStrLn . show
either printError printParses parses
putStrLn ""
The output is the same in the non-error cases:
Test: ^ [[rightLink]] $
Nodes: [S " ",L [S "rightLink"],S " "]
Link: Just (L [S "rightLink"])
Test: ^ {{ [[Wrong_Link]] }} [[rightLink]]$
Nodes: [S " ",B [S " ",L [S "Wrong_Link"],S " "],S " ",L [S "rightLink"]]
Link: Just (L [S "rightLink"])
Test: ^ {{ {{ }} [[Wrong_Link]] }} [[rightLink]]$
Nodes: [S " ",B [S " ",B [S " "],S " ",L [S "Wrong_Link"],S " "],S " ",L [S "rightLink"]]
Link: Just (L [S "rightLink"])
Test: ^ [[{{[[someLink]]}}]] {{}} {{[[asdf]]}}$
Nodes: [S " ",L [B [L [S "someLink"]]],S " ",B [],S " ",B [L [S "asdf"]]]
Link: Just (L [B [L [S "someLink"]]])
Test: ^{{ab}cd}}$
Nodes: [B [S "ab}cd"]]
Link: Nothing
Test: ^{ [ { {asf{[[[asdfa]]]}aasdff ] ] ] {{[[asdf]]}}asdf$
Nodes: [S "{ [ { {asf{",L [S "[asdfa"],S "]}aasdff ] ] ] ",B [L [S "asdf"]],S "asdf"]
Link: Just (L [S "[asdfa"])
Test: ^{{[[Wrong_Link]]asdf[[WRong_Link]]{{}}}}{{[[[[Wrong]]]]}}$
Nodes: [B [L [S "Wrong_Link"],S "asdf",L [S "WRong_Link"],B []],B [L [L [S "Wrong"]]]]
Link: Nothing
but the parse error messages are not as informative in the cases of unmatched openings:
Test: ^{{ {{ {{ [[ asdf ]] }} }}$
"<no file>" (line 1, column 26):
unexpected end of input
expecting "[[", "{{", "]]" or "}}"
Test: ^{{ {{ [[ asdf ]] }} }} }}$
"<no file>" (line 1, column 26):
unexpected "}}"
Test: ^[[ {{ [[{{[[asdf]]}}]]}}$
"<no file>" (line 1, column 25):
unexpected end of input
expecting "[[", "{{", "]]" or "}}"
and I couldn't figure out how to fix them.