Question

I have run into some unclear behavior with Parsec parsers. I want to parse strings like the following:

>  <CdId>
1

>  <Mol Weight>
270.2369

>  <Formula>
C15H10O5

>  <LOG_ER_RBA>
-0.36

>  <ACTIVITY>
1

I wrote this parser:

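import Text.Parsec
import Text.Parsec.String (Parser)

parseProperties :: Parser (String, String)
parseProperties = do
  skipMany1 newline                    -- the newline(s) before each property
  char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")       -- property name between '<' and '>'
  char '>'
  newline
  propValue <- many1 (noneOf "\n")     -- value on the following line
  return (propName, propValue)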

This parser handles a single item perfectly well, and it can also parse several:

parseTest (count 5 parseProperties) "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

which prints

 [("CdId","1"),("Mol Weight","270.2369"),("Formula","C15H10O5"),("LOG_ER_RBA","-0.36"),("ACTIVITY","1")]

However, I have found no way to parse an arbitrary number of properties. If I try

parseTest (many1 parseProperties) "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

or

parseTest (manyTill parseProperties (try eof)) "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

the parser fails:

parse error at (line 17, column 1):
unexpected end of input
expecting new-line or ">"

But if I use the anyChar parser instead, it does not fail:

parseTest (manyTill anyChar (try eof)) "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

"\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

Solution

In your example, parseProperties is run repeatedly until eof succeeds. The problem is that parseProperties does not consume the trailing newlines, so after the last tag has been parsed the remaining input is "\n\n". That is not the end of input, so your termination condition does not fire and parseProperties is attempted again. This time it consumes the newlines and then fails when it tries to read a '>'. Because that failure happens after input has been consumed, many1 and manyTill treat it as a real error rather than as a signal to stop, so the whole parse fails.
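The rule behind this is that Parsec's many, many1 and manyTill only stop cleanly when the repeated parser fails without consuming any input; a failure after input has been consumed aborts the whole parse. The following minimal sketch shows the rule on a toy parser, assuming the parsec package (the names pair, plain and withTry are hypothetical and only for illustration):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical toy parser for the two-character sequence "ab".
pair :: Parser String
pair = do
  a <- char 'a'
  b <- char 'b'
  return [a, b]

-- On "ababa" the third attempt consumes the final 'a' and then fails
-- expecting 'b', so 'many' reports an error instead of stopping.
plain :: Either ParseError [String]
plain = parse (many pair) "" "ababa"

-- 'try' turns that consumed failure into an empty one, so 'many' stops
-- after two items and leaves the final "a" unconsumed.
withTry :: Either ParseError [String]
withTry = parse (many (try pair)) "" "ababa"
```

Here pair plays the role of parseProperties: its last attempt consumes input and then fails, just as the sixth call of parseProperties consumes the trailing "\n\n" and then fails. Wrapping the parser in try, for example many1 (try parseProperties), would make that last failure backtrack and return the five pairs with the trailing newlines left unconsumed; the catch is that a malformed property would then silently end the list instead of producing an error.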

Try changing your parseTest call to the following:

test = "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

parseTest (manyTill parseProperties $ try (skipMany newline >> eof)) test

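This first tries to skip any remaining newlines and then checks for end of input. The try is needed because, between two properties, the newlines are followed by '>' rather than by the end of input; in that case try undoes the skipped newlines so that parseProperties can consume them itself.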
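For reference, a self-contained version of this fix, assuming the parsec package and the question's parser unchanged; parseAll and input are hypothetical names introduced for the sketch:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The question's parser, unchanged: it does not consume trailing newlines.
parseProperties :: Parser (String, String)
parseProperties = do
  skipMany1 newline
  char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  char '>'
  newline
  propValue <- many1 (noneOf "\n")
  return (propName, propValue)

-- Stop once only newlines remain before end of input. 'try' makes the
-- terminator backtrack when the newlines are followed by another property.
parseAll :: Parser [(String, String)]
parseAll = manyTill parseProperties (try (skipMany newline >> eof))

-- The input from the question.
input :: String
input = "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"
```

Loading this into GHCi and running parseTest parseAll input should print the same five pairs as the count 5 version in the question.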

Other answers

If the number of "\n" characters can vary, I would use this version instead (rather than adding an additional parser):

parseProperties :: Parser (String,String)
parseProperties = do
  skipMany newline -- optional newline(s)
  char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  char '>'
  newline
  propValue <- many1 (noneOf "\n")
  skipMany newline  -- optional newline(s)
  return (propName,propValue)

I tested this version with:

 parseTest (many1 parseProperties) "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n><Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n" 

And got:

[("CdId","1"),("Mol Weight","270.2369"),("Formula","C15H10O5"),("LOG_ER_RBA","-0.36"),("ACTIVITY","1")]
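A self-contained sketch of this version, assuming the parsec package; parseFile and input are hypothetical names, and the trailing eof is an addition not present in the answer that makes the parse fail if anything is left over. Because each call now eats the newlines that follow its value, the extra attempt that many1 makes at the end fails on '>' at end of input without consuming anything, so many1 stops cleanly.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The answer's parser: newlines before and after each property are optional.
parseProperties :: Parser (String, String)
parseProperties = do
  skipMany newline
  char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  char '>'
  newline
  propValue <- many1 (noneOf "\n")
  skipMany newline
  return (propName, propValue)

-- Hypothetical wrapper: 'eof' (not in the answer) rejects leftover input.
parseFile :: Parser [(String, String)]
parseFile = many1 parseProperties <* eof

-- The answer's test input (note "><Formula>" with no spaces).
input :: String
input = "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n><Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"
```

Since spaces accepts zero or more whitespace characters, the "><Formula>" entry without spaces parses like the others.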
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow