Using lexeme in parsec

Question 1

Disclaimer: I haven't used Parsec yet. That being said, lexeme is a field of GenTokenParser s u m. If you inspect its type in GHCi, you'll get

lexeme :: GenTokenParser s u m -> ParsecT s u m a -> ParsecT s u m a

Therefore, you already need a generic token parser, which you can create with makeTokenParser. The latter has the type:

makeTokenParser
  :: Stream s m Char =>
     Text.Parsec.Token.GenLanguageDef s u m
     -> Text.Parsec.Token.GenTokenParser s u m

It takes a language definition and returns a token parser. Since you don't have any specific language in mind, you can use emptyDef from Text.Parsec.Language. Note that whiteSpace also takes a GenTokenParser as its first argument. Finally, in this setup you end up with ds :: [Char], so you need to apply digitToInt from Data.Char to each character before you can sum the digits:

import Text.Parsec
import Text.Parsec.Token (lexeme, makeTokenParser, whiteSpace)
import Text.Parsec.Language (emptyDef)
import Data.Char (digitToInt)

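-- Token parser built from the empty language definition
lexer = makeTokenParser emptyDef

-- Skip leading whitespace, then sum the digits;
-- lexeme skips the whitespace that follows each digit
mainParser = do
  whiteSpace lexer
  ds <- many (lexeme lexer digit)
  eof
  return (sum (map digitToInt ds))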

main = do
  putStrLn "Please give some digits (whitespaces are ignored)"
  line <- getLine
  case parse mainParser "" line of
    Right n -> putStrLn $ "Sum of digits is " ++ show n
    Left  _ -> putStrLn "Couldn't parse your line"

Example output:

*Main> :main
Please give some digits (whitespaces are ignored)
7 8 91 72 3945 01 92
Sum of digits is 67

*Main> :main
Please give some digits (whitespaces are ignored)
abc 1
Couldn't parse your line
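
To check inputs like these without typing them interactively, the same parser can be wrapped in a pure function. This is only a sketch: the names digitSum and sumDigits are made up for illustration and are not part of Parsec, and the explicit type signatures assume String input with no user state.

```haskell
import Text.Parsec (ParseError, digit, eof, many, parse)
import Text.Parsec.String (Parser)
import Text.Parsec.Token (TokenParser, lexeme, makeTokenParser, whiteSpace)
import Text.Parsec.Language (emptyDef)
import Data.Char (digitToInt)

-- Token parser built from the empty language definition.
lexer :: TokenParser ()
lexer = makeTokenParser emptyDef

-- Skip leading whitespace, then read digits; lexeme skips the
-- whitespace after each digit. The result is the sum of the digits.
digitSum :: Parser Int
digitSum = do
  whiteSpace lexer
  ds <- many (lexeme lexer digit)
  eof
  return (sum (map digitToInt ds))

-- Hypothetical helper (an assumption of this sketch): run the parser
-- on a string instead of reading a line from stdin.
sumDigits :: String -> Either ParseError Int
sumDigits = parse digitSum ""
```

In GHCi, sumDigits "7 8 91 72 3945 01 92" should give Right 67, and sumDigits "abc 1" should give a Left with the parse error, matching the two runs above.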

Question 2

Disclaimer: I am not an expert in either Haskell or parsing. I have modified the above code a little bit:

import Text.Parsec
import qualified Text.Parsec.Token as T
import Text.Parsec.String ( Parser )
import Text.Parsec.Language (haskellDef)

lexer = T.makeTokenParser haskellDef

whiteSpace :: Parser ()
whiteSpace = T.whiteSpace lexer


lexeme = T.lexeme lexer

mainParser  = do whiteSpace
                 ds <- many digit
                 eof
                 return ds

Let's run the above code.

Mukeshs-MacBook-Pro:Compilers mukeshtiwari$ ghci stmp.hs 
GHCi, version 7.6.1: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
[1 of 1] Compiling Main             ( stmp.hs, interpreted )
Ok, modules loaded: Main.
*Main> parse mainParser "" "1" 
Loading package array-0.4.0.1 ... linking ... done.
Loading package deepseq-1.3.0.1 ... linking ... done.
Loading package bytestring-0.10.0.0 ... linking ... done.
Loading package transformers-0.3.0.0 ... linking ... done.
Loading package mtl-2.1.2 ... linking ... done.
Loading package text-0.11.2.3 ... linking ... done.
Loading package parsec-3.1.3 ... linking ... done.
Right "1"
*Main> parse mainParser "" "12"
Right "12"
*Main> parse mainParser "" "123"
Right "123"
*Main> parse mainParser "" "      123"
Right "123"
*Main> parse mainParser "" "  123"
Right "123"
*Main> parse mainParser "" "     123"
Right "123"

Everything looks good so far. Now let's try some more input.

*Main> parse mainParser "" "123 "
Left (line 1, column 4):
unexpected ' '
expecting digit or end of input

Oops! Something went wrong with our parser. Can you spot the difference in the input? This time there is a space at the end. So how come the parser can handle spaces before the digits but not after them? Remember the whiteSpace function: it eats all the spaces before the digits and hands the remaining input to the rest of the code (many digit), which keeps consuming digits until it encounters something that is not a digit. The rest of the input (in our case the trailing space) is then passed to eof, which only succeeds at the end of the input, so our parser complains about the space. Can we skip these spaces while reading the digits? We know that whiteSpace eats zero or more spaces, so let's add it to our code after each digit (ignore <* for a moment).

import Text.Parsec
import qualified Text.Parsec.Token as T
import Text.Parsec.String ( Parser )
import Text.Parsec.Language (haskellDef)
import Control.Applicative ( (<*) )

lexer = T.makeTokenParser haskellDef

whiteSpace :: Parser ()
whiteSpace = T.whiteSpace lexer

lexeme = T.lexeme lexer

mainParser  = do whiteSpace
                 ds <- many  ( digit <* whiteSpace )
                 eof
                 return  ds

Running this code gives:

*Main> parse mainParser ""   "    31312 "
Right "31312"
*Main> parse mainParser ""   "    3131 2 "
Right "31312"
*Main> parse mainParser ""   "    313 1 2 "
Right "31312"
*Main> parse mainParser ""   "    3 1 3 1 2 "
Right "31312"
*Main> parse mainParser ""   "    31 3 1 2 "
Right "31312"

Now it looks fine. Let's see how this code handles the spaces. All leading spaces are consumed by whiteSpace, and the remaining input is passed to the next parser, many ( digit <* whiteSpace ). Here digit consumes a single digit and whiteSpace consumes zero or more spaces; p <* q runs both parsers in sequence but keeps only the result of p, so the result of this computation is the result of digit. According to the documentation of lexeme, lexeme p first applies parser p and then the whiteSpace parser, returning the value of p, so lexeme digit will likewise first consume a digit and then zero or more spaces.
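
As a sketch of that last point, the parser can be written with lexeme instead of the explicit digit <* whiteSpace, and both versions should accept the same inputs. The names digitsExplicit and digitsLexeme are made up for this comparison, and the type signatures assume String input with no user state.

```haskell
-- (<*) and (*>) are in the Prelude on recent GHC versions; the explicit
-- import is only needed on older ones such as the 7.6.1 used above.
import Control.Applicative ((<*), (*>))
import Text.Parsec (digit, eof, many, parse)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as T
import Text.Parsec.Language (haskellDef)

lexer :: T.TokenParser ()
lexer = T.makeTokenParser haskellDef

whiteSpace :: Parser ()
whiteSpace = T.whiteSpace lexer

-- lexeme p runs p, then skips trailing whitespace, and returns p's result.
lexeme :: Parser a -> Parser a
lexeme = T.lexeme lexer

-- The version from the answer: whitespace is skipped explicitly after each digit.
digitsExplicit :: Parser String
digitsExplicit = whiteSpace *> many (digit <* whiteSpace) <* eof

-- The same parser with lexeme doing the whitespace skipping.
digitsLexeme :: Parser String
digitsLexeme = whiteSpace *> many (lexeme digit) <* eof
```

In GHCi, parse digitsLexeme "" "    3 1 3 1 2 " and parse digitsExplicit "" "    3 1 3 1 2 " should both give Right "31312".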