Disclaimer: I am not an expert in either Haskell or parsing. I have modified the above code a little bit:
import Text.Parsec
import qualified Text.Parsec.Token as T
import Text.Parsec.String ( Parser )
import Text.Parsec.Language (haskellDef)
lexer = T.makeTokenParser haskellDef
whiteSpace :: Parser ()
whiteSpace = T.whiteSpace lexer
lexeme = T.lexeme lexer
mainParser = do whiteSpace
                ds <- many digit
                eof
                return ds
Let's run the above code.
Mukeshs-MacBook-Pro:Compilers mukeshtiwari$ ghci stmp.hs
GHCi, version 7.6.1: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
[1 of 1] Compiling Main ( stmp.hs, interpreted )
Ok, modules loaded: Main.
*Main> parse mainParser "" "1"
Loading package array-0.4.0.1 ... linking ... done.
Loading package deepseq-1.3.0.1 ... linking ... done.
Loading package bytestring-0.10.0.0 ... linking ... done.
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package mtl-2.1.2 ... linking ... done.
Loading package text-0.11.2.3 ... linking ... done.
Loading package parsec-3.1.3 ... linking ... done.
Right "1"
*Main> parse mainParser "" "12"
Right "12"
*Main> parse mainParser "" "123"
Right "123"
*Main> parse mainParser "" " 123"
Right "123"
*Main> parse mainParser "" " 123"
Right "123"
*Main> parse mainParser "" " 123"
Right "123"
Everything looks good so far. Now let's try some more input.
*Main> parse mainParser "" "123 "
Left (line 1, column 4):
unexpected ' '
expecting digit or end of input
Oops! Something went wrong with our parser. Can you spot the difference in the input? This time there is a space at the end. But then how is the parser able to handle spaces before the digits? Remember the whiteSpace function: it eats all the spaces before the digits and hands the remaining input to the rest of the code (many digit), which keeps consuming digits until it encounters something that is not a digit. The rest of the input (in our case, the trailing space) is then passed to eof, which only succeeds at the end of the input, so our parser complains about the space. Can we ignore these spaces while reading the digits? We know that whiteSpace eats zero or more spaces, so let's add it to our code (ignore <* for a moment).
import Text.Parsec
import qualified Text.Parsec.Token as T
import Text.Parsec.String ( Parser )
import Text.Parsec.Language (haskellDef)
import Control.Applicative ( (<*) )
lexer = T.makeTokenParser haskellDef
whiteSpace :: Parser ()
whiteSpace = T.whiteSpace lexer
lexeme = T.lexeme lexer
mainParser = do whiteSpace
                ds <- many ( digit <* whiteSpace )
                eof
                return ds
After running this code:
*Main> parse mainParser "" " 31312 "
Right "31312"
*Main> parse mainParser "" " 3131 2 "
Right "31312"
*Main> parse mainParser "" " 313 1 2 "
Right "31312"
*Main> parse mainParser "" " 3 1 3 1 2 "
Right "31312"
*Main> parse mainParser "" " 31 3 1 2 "
Right "31312"
Now it looks fine. Let's see how this code handles the spaces. All initial spaces are consumed by whiteSpace, and the remaining input is passed to the next parser, many ( digit <* whiteSpace ). Here digit consumes one digit, whiteSpace then consumes zero or more spaces, and the result of the combined parser is the result of digit, because <* runs both parsers but keeps only the result of the left one. Note that this also accepts spaces between digits, which is why " 3 1 3 1 2 " parses to "31312". According to the documentation of lexeme, lexeme p first applies the parser p and then the whiteSpace parser, so lexeme digit also consumes a digit and then zero or more spaces.
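To make that last point concrete, here is a minimal sketch of the same parser written with lexeme digit instead of digit <* whiteSpace. The name lexemeParser and the explicit type signatures are my own additions for illustration; everything else is taken from the code above.

```haskell
import Text.Parsec
import qualified Text.Parsec.Token as T
import Text.Parsec.String (Parser)
import Text.Parsec.Language (haskellDef)

lexer :: T.TokenParser ()
lexer = T.makeTokenParser haskellDef

whiteSpace :: Parser ()
whiteSpace = T.whiteSpace lexer

-- lexeme p applies p first and then skips any trailing white space.
lexeme :: Parser a -> Parser a
lexeme = T.lexeme lexer

-- Hypothetical name: the same parser as mainParser above, with
-- lexeme digit standing in for digit <* whiteSpace.
lexemeParser :: Parser String
lexemeParser = do
  whiteSpace                 -- skip leading spaces
  ds <- many (lexeme digit)  -- each digit, followed by zero or more spaces
  eof                        -- nothing may remain after the last digit
  return ds
```

Loaded into GHCi, parse lexemeParser "" " 31 3 1 2 " should give Right "31312", the same result as the session above.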