Question

I'm writing a lexer in Haskell. Here's the code:

lexer :: String -> [Token]
lexer s
    | s =~ whitespace :: Bool =
        let token = s =~ whitespace :: String in
            lex (drop (length token) s)
    | s =~ number :: Bool =
        let token = s =~ number :: String in
            Val (read token) : lex (drop (length token) s)
    | s =~ operator :: Bool =
        let token = s =~ operator :: String in
            Oper token : lex (drop (length token) s)
    | otherwise = error "unrecognized character"
    where
        whitespace = "^[ \t\n]"
        number = "^[0-9]*(\.[0-9]+)?"
        operator = "^[+-*/()]"

data Token = Val Int | Oper String

I'm having two problems. First, the number regex "^[0-9]*(\.[0-9]+)?" produces this error:

lexical error in string/character literal at character '['

Second, when I comment out that line and the part of the function that uses it, I get this error:

Couldn't match expected type `Token'
           against inferred type `(String, String)'
      Expected type: [Token]
      Inferred type: [(String, String)]
    In the expression: lex (drop (length token) s)
    In the expression:
        let token = s =~ whitespace :: String
        in lex (drop (length token) s)

I have no idea why I'm getting either of these errors. Can someone help me?

Solution

Backslashes are escape characters in Haskell string literals, as in "\n" for a string containing a newline. To get a literal backslash you have to escape it as "\\". That is the problem with the regex "^[0-9]*(\.[0-9]+)?": the Haskell lexer tries to read "\." as a string escape and rejects it, because "\." is not a valid escape sequence. If you write the regex as "^[0-9]*(\\.[0-9]+)?", the error goes away.
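To make the escaping concrete, here is a minimal sketch using only the Prelude. The names numberPattern and escapedDotLength are made up for this illustration; they are not part of the question's code.

```haskell
-- The corrected literal: "\\" in the source is a single backslash at run time,
-- so the regex engine receives \. (an escaped dot).
numberPattern :: String
numberPattern = "^[0-9]*(\\.[0-9]+)?"

-- The escaped dot is two characters at run time: a backslash and a dot.
escapedDotLength :: Int
escapedDotLength = length "\\."
```

In GHCi, putStrLn numberPattern prints ^[0-9]*(\.[0-9]+)? and escapedDotLength evaluates to 2.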

The type error occurs because lex (drop (length token) s) calls lex from the standard Prelude, which has type String -> [(String, String)], while your function must return [Token]. You most likely meant to make a recursive call to your own function, lexer.
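A small sketch of what the Prelude's lex actually returns, which is where the [(String, String)] in the error message comes from. The name preludeResult is hypothetical and exists only for this illustration.

```haskell
-- Prelude's lex reads a single Haskell lexeme and pairs it with the
-- remaining input, so its result is a list of (lexeme, rest) pairs.
preludeResult :: [(String, String)]
preludeResult = lex "12 + 3"
```

Here preludeResult evaluates to [("12"," + 3")], which is a list of string pairs and not a list of Token values.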

OTHER TIPS

Also note that "^[0-9]*(\\.[0-9]+)?" matches the empty string, as well as numbers like .12 (without a leading 0), which you probably don't want. The empty match is a serious problem: the number guard then succeeds with a zero-length token, nothing is dropped from the input, and the function calls itself forever. To fix that, change * to +.
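Putting the fixes together, here is one possible sketch of a lexer that terminates. It is not the asker's regex-based code: to stay runnable with base alone it swaps the regexes for span and isDigit. It also makes two assumptions beyond the original: a base case for empty input is added, and Val holds a Double instead of an Int so that input like 3.5 can be read. The helper spanNumber is a hypothetical name.

```haskell
import Data.Char (isDigit)

-- Assumption: Val holds a Double (the original used Int) so "3.5" can be read.
data Token = Val Double | Oper String
    deriving (Show, Eq)

lexer :: String -> [Token]
lexer [] = []                                   -- base case: end of input
lexer s@(c:rest)
    | c `elem` " \t\n" = lexer rest             -- skip whitespace
    | isDigit c =
        let (token, remaining) = spanNumber s
        in Val (read token) : lexer remaining   -- recurse into lexer, not Prelude's lex
    | c `elem` "+-*/()" = Oper [c] : lexer rest
    | otherwise = error ("unrecognized character: " ++ [c])

-- Split off one or more digits, optionally followed by '.' and one or more
-- digits (the same shape as the regex with * changed to +).
spanNumber :: String -> (String, String)
spanNumber s =
    case span isDigit s of
        (whole, '.' : afterDot)
            | (frac@(_:_), remaining) <- span isDigit afterDot
            -> (whole ++ "." ++ frac, remaining)
        (whole, remaining) -> (whole, remaining)
```

With these definitions, lexer "12 + 3.5" evaluates to [Val 12.0,Oper "+",Val 3.5]. Because every branch consumes at least one character before recursing, the function cannot loop on a zero-length token.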

Licensed under: CC-BY-SA with attribution