Question

I am having trouble figuring this out.

If a string is followed by one or more newline characters with no spaces after them, it is the end of a line and I return the line. If a string is followed by one or more newline characters and then one or more spaces, it is a line continuation, and I keep going until I encounter newlines that are not followed by spaces. Then I return the whole thing.

This has completely locked my brain. Please help.

UPDATE

In case my explanation above is confusing, here is an example:

From: John Doe <j.doe@gmail.com>
To: dude@cooldomain.biz
Content-Type: multipart/alternative;
  boundary=047d7b2e4e3cdc627304eb094bfe

Given the text above, I should be able to parse three logical lines for further processing, like so:

["From: John Doe <j.doe@gmail.com>", "To: dude@cooldomain.biz", "Content-Type: multipart/alternative; boundary=047d7b2e4e3cdc627304eb094bfe"]

Solution

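Something like this, perhaps (a sketch that assumes Parsec and that you want to keep all the whitespace):

import Text.Parsec
import Text.Parsec.String (Parser)

-- Parse one logical line, following continuations (newlines, then a space).
continuedLine :: Parser String
continuedLine = go "" where
    go s = do
        s'      <- many (noneOf "\n")   -- text up to the next newline
        empties <- many (char '\n')     -- the run of newlines that follows
        let soFar = s ++ s' ++ empties
        (char ' ' >> go (soFar ++ " ")) <|> return soFar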

Apply your favorite transformation to eliminate the deeply-nested left-associated ++s.
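One such transformation, as a minimal sketch assuming Parsec: collect the pieces in a reversed list of chunks and concatenate once at the end. The name continuedLineChunks is made up for this illustration. The grammar is meant to be the same as in the parser above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Same grammar as the parser above, but the pieces are accumulated as a
-- reversed list of chunks (newest first) and concatenated once at the end,
-- so no left-nested chain of (++) builds up.
-- The name 'continuedLineChunks' is hypothetical.
continuedLineChunks :: Parser String
continuedLineChunks = go []
  where
    go acc = do
      s'      <- many (noneOf "\n")   -- text up to the next newline
      empties <- many (char '\n')     -- the run of newlines that follows
      let acc' = empties : s' : acc
      -- a space after the newlines means the line continues
      (char ' ' >> go (" " : acc')) <|> return (concat (reverse acc'))
```

A difference list would work just as well; the reversed list is simply the shortest thing to write down.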

EDIT: Hm, it just occurred to me that there's a subtlety I may have overlooked. In case this is not a continuation, are you hoping to leave the newlines "unparsed", so to speak? If so, you can use try to do something like this:

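import Text.Parsec
import Text.Parsec.String (Parser)

continuedLine :: Parser String
continuedLine = go "" where
    -- zero or more newlines followed by one space announce a continuation
    continuationHerald = do
        empties <- many (char '\n')
        _ <- char ' '
        return (empties ++ " ")

    go s = do
        s'   <- many (noneOf "\n")
        -- try rewinds the newlines if no space follows them
        cont <- try (Just <$> continuationHerald) <|> return Nothing
        case cont of
            Nothing -> return (s ++ s')
            Just empties -> go (s ++ s' ++ empties)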

Note that we go to some length to avoid putting the recursive call to go inside the try. This is an efficiency concern: doing so would cause the parser to refuse to give up on the alternate return Nothing branch, and prevent garbage collection of the beginning of the string being parsed.
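Because this version leaves the terminating newlines unparsed, the caller decides what to do with them. Here is a rough sketch of how it might be used to collect every logical line of a header block, again assuming Parsec. The headerLines and logicalLine names are invented for the example, and the try-based parser is repeated so the snippet stands alone.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Newlines followed by one space announce a continuation.
continuationHerald :: Parser String
continuationHerald = do
  empties <- many (char '\n')
  _ <- char ' '
  return (empties ++ " ")

-- The try-based parser: stops *before* the newlines that end a logical line.
continuedLine :: Parser String
continuedLine = go ""
  where
    go s = do
      s'   <- many (noneOf "\n")
      cont <- try (Just <$> continuationHerald) <|> return Nothing
      case cont of
        Nothing      -> return (s ++ s')
        Just empties -> go (s ++ s' ++ empties)

-- Hypothetical driver: each logical line must start with a non-newline
-- character (so the loop cannot spin on empty matches); the newlines that
-- end it are skipped here, by the caller.
headerLines :: Parser [String]
headerLines = many (logicalLine <* skipMany (char '\n'))
  where
    logicalLine = lookAhead (noneOf "\n") >> continuedLine
```

Note that the whitespace inside a continued line is kept as-is (newline and spaces included), so collapsing it to a single space would be a separate step.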

Other tips

I suggest splitting your parser into multiple passes, so the code for parsing expressions isn't cluttered with whitespace handling. Example:

  • lex :: String -> [Token]

    Handle whitespace and split input into tokens.

  • parse :: Parsec [Token] () Expr

    Convert a stream of tokens to an expression tree.
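To make the two passes concrete, here is a toy sketch assuming Parsec. The Token and Expr constructors, the tiny "numbers and +" grammar, and the helper tok are all invented for illustration, and the first pass is called lexer only because Prelude already exports a lex.

```haskell
import Data.Char (isDigit, isSpace)
import Text.Parsec
import Text.Parsec.Pos (incSourceColumn)

-- Hypothetical token and expression types, for illustration only.
data Token = TNum Int | TPlus deriving (Show, Eq)
data Expr  = Num Int | Add Expr Expr deriving (Show, Eq)

-- Pass 1: drop whitespace and split the input into tokens.
lexer :: String -> [Token]
lexer [] = []
lexer (c : cs)
  | isSpace c = lexer cs
  | c == '+'  = TPlus : lexer cs
  | isDigit c = let (ds, rest) = span isDigit (c : cs)
                in TNum (read ds) : lexer rest
  | otherwise = error ("unexpected character: " ++ [c])

type TokenParser = Parsec [Token] ()

-- Accept one token if the given function maps it to a value.
-- Positions are only counted in tokens, not in source columns.
tok :: (Token -> Maybe a) -> TokenParser a
tok = tokenPrim show (\pos _ _ -> incSourceColumn pos 1)

-- Pass 2: the grammar no longer mentions whitespace at all.
expr :: TokenParser Expr
expr = number `chainl1` (Add <$ plus)
  where
    number = tok (\t -> case t of TNum n -> Just (Num n); _ -> Nothing)
    plus   = tok (\t -> if t == TPlus then Just () else Nothing)
```

Something like parse expr "" (lexer "1 + 2 +3") then runs both passes. For the header problem in the question, the first pass could just as well be line joining rather than tokenizing.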

Here's a pretty straightforward way to join continued lines:

import Data.Char (isSpace)

-- | Append each line that starts with whitespace to the preceding line
-- (its leading whitespace is kept) and drop it from the list.
joinContinuedLines :: [String] -> [String]
joinContinuedLines [] = []
joinContinuedLines (x0:xs0) = go x0 xs0
  where
    go joinedLine (x : xs)
      | startsWithSpace x = go (joinedLine ++ x) xs
      | otherwise         = joinedLine : go x xs
    go joinedLine [] = [joinedLine]

    startsWithSpace (x:_) = isSpace x
    startsWithSpace ""    = False
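If the goal is exactly the list shown in the question, with a single space at each join, a small variant might look like the following. It is only a sketch: joinFolded and example are made-up names, and collapsing the continuation's leading whitespace to one space is an assumption based on the expected output in the question.

```haskell
import Data.Char (isSpace)

-- | Variant of the function above, written as a right fold.
-- Assumption: a continuation's leading whitespace is replaced by one space.
-- The name 'joinFolded' is hypothetical.
joinFolded :: [String] -> [String]
joinFolded = foldr step []
  where
    -- If the line after this one is a continuation, glue it on.
    step line (next : rest)
      | startsWithSpace next = (line ++ " " ++ dropWhile isSpace next) : rest
    step line acc = line : acc

    startsWithSpace (c : _) = isSpace c
    startsWithSpace []      = False

-- The header block from the question, split with 'lines' and then joined.
example :: [String]
example = joinFolded (lines input)
  where
    input = "From: John Doe <j.doe@gmail.com>\n\
            \To: dude@cooldomain.biz\n\
            \Content-Type: multipart/alternative;\n\
            \  boundary=047d7b2e4e3cdc627304eb094bfe\n"
```

Here example evaluates to the three strings listed in the question. As in the original function, a first line that itself starts with whitespace is left alone.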
Licensed under: CC-BY-SA with attribution