I am having trouble figuring this out.

If a string is followed by one or more newline characters with no spaces after them, that is the end of a line and I return the line. If a string is followed by one or more newline characters and then one or more spaces, it is a line continuation, and I keep going until I encounter newlines that are not followed by spaces. Then I return the joined line.

This just totally locked my brain. Please help.

UPDATE

In case my explanation above is confusing, here is an example:

From: John Doe <j.doe@gmail.com>
To: dude@cooldomain.biz
Content-Type: multipart/alternative;
  boundary=047d7b2e4e3cdc627304eb094bfe

Given the above text, I should be able to parse 3 lines for further processing, like so:

["From: John Doe <j.doe@gmail.com>", "To: dude@cooldomain.biz", "Content-Type: multipart/alternative; boundary=047d7b2e4e3cdc627304eb094bfe"]

Solution

Something like this pseudocode, perhaps (written with Parsec-style combinators, and assuming you want to keep all the whitespace):

continuedLine = go "" where
    go s = do
        s'      <- many (noneOf "\n")
        empties <- many (char '\n')
        let soFar = s ++ s' ++ empties
        (char ' ' >> go (soFar ++ " ")) <|> return soFar

Apply your favorite transformation to eliminate the deeply-nested left-associated ++s.
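One possible transformation, as a minimal sketch: collect the pieces in a reversed list and concatenate once at the end. This assumes Parsec (Text.Parsec) is the parser library in use; the accumulator name acc is only for illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Same grammar as the pseudocode above, but the chunks are kept in a
-- reversed list and concatenated once, so no (++) is ever applied with a
-- long, repeatedly rebuilt left operand.
continuedLine :: Parser String
continuedLine = go []
  where
    go :: [String] -> Parser String
    go acc = do
      s'      <- many (noneOf "\n")   -- text up to the next newline
      empties <- many (char '\n')     -- the newline run that follows
      let acc' = empties : s' : acc   -- newest chunk first
      -- A space after the newlines means "continuation": keep it and loop.
      (char ' ' >> go (" " : acc')) <|> return (concat (reverse acc'))
```

As in the original sketch, the trailing newlines are consumed and kept in the result, so in GHCi `parse continuedLine "" "a;\n  b\nc"` should give `Right "a;\n  b\n"`, leaving `c` unparsed.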

EDIT: Hm, it just occurred to me that there's a subtlety I may have overlooked. In case this is not a continuation, are you hoping to leave the newlines "unparsed", so to speak? If so, you can use try to do something like this:

continuedLine = go "" where
    continuationHerald = do
        empties <- many (char '\n')
        char ' '
        return (empties ++ " ")

    go s = do
        s'   <- many (noneOf "\n")
        cont <- try (Just <$> continuationHerald) <|> return Nothing
        case cont of
            Nothing -> return (s ++ s')
            Just empties -> go (s ++ s' ++ empties)

Note that we go to some length to avoid putting the recursive call to go inside the try. This is an efficiency concern: doing so would cause the parser to refuse to give up on the alternate return Nothing branch, and prevent garbage collection of the beginning of the string being parsed.
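To see how the try-based version could be applied to the headers in the question, here is a rough sketch. It assumes Parsec (Text.Parsec); logicalLines and normalize are hypothetical helpers that are not part of the answer, and continuedLine is repeated (with local type signatures added) only so the block compiles on its own.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The try-based parser from above, repeated so this block stands alone.
continuedLine :: Parser String
continuedLine = go ""
  where
    continuationHerald :: Parser String
    continuationHerald = do
      empties <- many (char '\n')
      _ <- char ' '
      return (empties ++ " ")

    go :: String -> Parser String
    go s = do
      s'   <- many (noneOf "\n")
      cont <- try (Just <$> continuationHerald) <|> return Nothing
      case cont of
        Nothing      -> return (s ++ s')
        Just empties -> go (s ++ s' ++ empties)

-- Hypothetical wrapper: because continuedLine leaves the terminating
-- newlines unparsed, they are consumed here, between logical lines.
-- The eof guard stops the loop so no empty final line is produced.
logicalLines :: Parser [String]
logicalLines =
  many (notFollowedBy eof *> continuedLine <* many (char '\n')) <* eof

-- Hypothetical clean-up: collapse every whitespace run (including the
-- newline and indent kept from a continuation) into a single space.
normalize :: String -> String
normalize = unwords . words
```

With this, `fmap (map normalize) (parse logicalLines "" input)` on the sample headers should yield the three strings the question asks for. Note that normalize also collapses whitespace inside a line, which may or may not be what you want.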

Other tips

I suggest splitting your parser into multiple passes, so the code for parsing expressions isn't cluttered with whitespace handling. Example:

  • lex :: String -> [Token]

    Handle whitespace and split input into tokens.

  • parse :: Parsec [Token] Expr

    Convert a stream of tokens to an expression tree.

Here's a pretty straightforward way to join continued lines:

import Data.Char (isSpace)

-- | Append each line that starts with whitespace to the preceding line.
-- The continuation's leading whitespace is kept as-is.
joinContinuedLines :: [String] -> [String]
joinContinuedLines [] = []
joinContinuedLines (x0:xs0) = go x0 xs0
  where
    go joinedLine (x : xs)
      | startsWithSpace x = go (joinedLine ++ x) xs
      | otherwise         = joinedLine : go x xs
    go joinedLine [] = [joinedLine]

    startsWithSpace (x:_) = isSpace x
    startsWithSpace ""    = False
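As a rough sketch of how such a first pass might be wired up for the headers in the question: the variant below replaces a continuation's leading whitespace with a single space, so the output matches the list the asker expects. The names joinWithSingleSpace and logicalLines are hypothetical, and dropping blank lines is an assumption based on the "one or many newlines" rule in the question.

```haskell
import Data.Char (isSpace)

-- Variant of joinContinuedLines: the continuation's leading whitespace is
-- dropped and a single space is inserted where the lines are joined.
joinWithSingleSpace :: [String] -> [String]
joinWithSingleSpace [] = []
joinWithSingleSpace (x0:xs0) = go x0 xs0
  where
    go joined (x:xs)
      | startsWithSpace x = go (joined ++ " " ++ dropWhile isSpace x) xs
      | otherwise         = joined : go x xs
    go joined [] = [joined]

    startsWithSpace (c:_) = isSpace c
    startsWithSpace ""    = False

-- First pass: split on newlines, drop blank lines (assumption: runs of
-- newlines count as one separator), then join the continuations.
logicalLines :: String -> [String]
logicalLines = joinWithSingleSpace . filter (not . null) . lines
```

Each resulting string can then be handed to a second pass (for example, a per-header parser) that no longer has to think about line continuations.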
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow