Question

I am attempting to parse a boolean expression using the Happy library. The problem is that the result is not as good as I would want it when I introduce parentheses. I have made the following grammar.

Query       : Expr              { $1 }

Expr        : Expr "OR" Term            { ActOp Or $1 $3 }
            | Expr "AND" Term           { ActOp And $1 $3 }
            | Term                      { Term $1 }


Term        : '"' string '"'            { QuotedWord $2 }
            | string                    { Word $1 }
            | '(' Expr ')'              { Brack $2}

Below is the string to be parsed and the result.

"(computer AND science) OR cs" -> ActOp Or (Term (Brack (ActOp And (Term (Word "computer")) (Word "science")))) (Word "cs")

I would prefer if it was something like the following since it would be easier to interpret:

ActOp Or (ActOp And (Word "computer") (Word "science")) (Word "cs")

Edit - Full code

{
module BoolAst where
import Data.Char
import Data.List
}

%name translate
%tokentype { Token }

%token 
      string            { TokenString $$ }
      '"'               { TokenQuote}
      "AND"             { TokenAnd }
      "OR"              { TokenOr }
      '('               { TokenOb }
      ')'               { TokenCb }

%%

Query       : Expr                      { $1 }

Expr        : Expr "OR" Term            { ActOp Or $1 $3 }
            | Expr "AND" Term           { ActOp And $1 $3 }
            | Term                      { Term $1 }


Term        : '"' string '"'            { QuotedWord $2 }
            | string                    { Word $1 }
            | '(' Expr ')'              { Brack $2}


{
happyError :: [Token] -> a
happyError _ = error ("Parse error\n")

type Query 
        = Expr

data Expr 
        = ActOp Op Expr Term
        | Term Term
        deriving Show

data Op
        = Or
        | And
        deriving Show

data Term
        = QuotedWord String
        | Word String
        | Brack Expr
        deriving Show

data Token
      = TokenQuote
      | TokenAnd
      | TokenOr
      | TokenString String
      | TokenOb
      | TokenCb
 deriving Show


lexer :: String -> [Token]
lexer [] = []
lexer cs
      | isPrefixOf "AND" cs = TokenAnd : (lexer $ drop 3 cs)
      | isPrefixOf "OR" cs = TokenOr : (lexer $ drop 2 cs)
lexer (c:cs) 
      | isSpace c = lexer cs
      | isAlpha c = lexVar (c:cs)
lexer ('"':cs) = TokenQuote : lexer cs
lexer ('(':cs) = TokenOb : lexer cs
lexer (')':cs) = TokenCb : lexer cs

lexVar cs =
   case span isAlphaNum cs of
    (var,rest)   -> TokenString var : lexer rest

main = print $ translate . lexer $ "computer AND science OR cs"
Was it helpful?

Solution

Your data types do not have to correspond directly to the grammar rules. It is perfectly valid to have the Term nonterminal produce an Expr value. For example

data Expr 
    = ActOp Op Expr Expr
    | QuotedWord String
    | Word String
    deriving Show

...

Expr        : Expr "OR" Term            { ActOp Or $1 $3 }
            | Expr "AND" Term           { ActOp And $1 $3 }
            | Term                      { $1 } -- no need to wrap in a constructor

Term        : '"' string '"'            { QuotedWord $2 }
            | string                    { Word $1 }
            | '(' Expr ')'              { $2 } -- no need to wrap in a constructor
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top