Question

I'm using the parsing library Parsec to parse some text. I simply need to parse lines, which are strings of arbitrary characters, each ending with a '\n' or with eof when it's the last line of the input. When calling parseHS' I get the following error: Exception: Text.ParserCombinators.Parsec.Prim.many: combinator 'many' is applied to a parser that accepts an empty string.

parseHS' :: String -> Either ParseError [String]
parseHS' input = parse hsFile' "(unknown)" input

hsFile' :: GenParser Char st [String]
hsFile' = do
    many1 line

line :: GenParser Char st String
line = do
    result <- many (noneOf "\n")
    optional newline
    return result

How could this be achieved correctly?

Solution

Of course, if you only need to split the input by lines, you could use lines.

sepEndBy in Parsec does what you want - splits input into a list of parsed entities separated by a given separator, optionally ending with it or eof.
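As a minimal sketch of the sepEndBy approach (assuming the same Text.ParserCombinators.Parsec module as in the question, and reusing the question's names for illustration), the parser could look roughly like this. Here line is allowed to be empty, which is safe because many is never applied to line itself:

```haskell
import Text.ParserCombinators.Parsec

-- A line is any run of non-newline characters (possibly empty).
line :: GenParser Char st String
line = many (noneOf "\n")

-- Lines separated by newlines; a final newline is allowed but not required.
hsFile' :: GenParser Char st [String]
hsFile' = (line `sepEndBy` newline) <* eof

parseHS' :: String -> Either ParseError [String]
parseHS' = parse hsFile' "(unknown)"

-- parseHS' "a\nb"    gives Right ["a","b"]
-- parseHS' "a\n\nb"  gives Right ["a","","b"]
-- parseHS' "a\nb\n"  gives Right ["a","b",""]
```

Because line can match the empty string, input that ends with a newline yields a final empty entry; filter it out afterwards if it is unwanted.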

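Your grammar for line permits the parser to produce a never-ending stream of lines for any input, because line can succeed without consuming anything: both many (noneOf "\n") and optional newline may match nothing. This can be resolved by making the decision about newline externally to line: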

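hsFile' :: GenParser Char st [String]
hsFile' = do
    x  <- line                    -- first line (may be empty)
    xs <- many (newline >> line)  -- each further line must follow a newline
    eof
    return (x:xs)

line :: GenParser Char st String
line = many (noneOf "\n")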

This version produces an empty line at the end if the file ends with a newline.

Other tips

If you apply many (or many1) to a parser that accepts (among others) the empty string, you have an ambiguous grammar. The empty string could be recognized arbitrarily often, resulting in different parse trees.

In this case, line accepts the empty string, and many1 is implemented in terms of many, so this triggers the exception. The solution in your situation is probably to make sure that line always consumes at least one character.
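One possible way to make line always consume input is sketched below. It is only an illustration, not the only option: a line is either a non-empty run of characters followed by an optional newline, or a blank line made up of the newline alone. It uses many instead of the question's many1, so empty input yields an empty list (an assumption about the desired behaviour).

```haskell
import Text.ParserCombinators.Parsec

-- Either a non-empty run of characters with an optional trailing newline,
-- or a blank line consisting of just the newline. Both branches consume
-- at least one character, so 'many line' is safe.
line :: GenParser Char st String
line = (many1 (noneOf "\n") <* optional newline)
   <|> ("" <$ newline)

hsFile' :: GenParser Char st [String]
hsFile' = many line <* eof

parseHS' :: String -> Either ParseError [String]
parseHS' = parse hsFile' "(unknown)"

-- parseHS' "a\nb\n"  gives Right ["a","b"]
-- parseHS' "a\n\nb"  gives Right ["a","","b"]
-- parseHS' ""        gives Right []
```

With this definition a trailing newline does not produce an extra empty entry, which matches what lines returns for the same inputs.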

It's overkill to use a parser for this, since you've no restriction on the content of the lines. There's a library function, lines, that achieves what you're asking with considerably less fuss.

Example:

lines "Hello there\neveryone,\nhere are some lines,"
> ["Hello there", "everyone,", "here are some lines,"]

If the lines have some structure, you should code that first instead of trying to chop the string up - bottom up is the best way to write recursive descent parsers.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow