Question

I'm writing a JavaScript parser with Happy and I need to match a regular expression. I don't want to fully parse the regex, just store it as a string.

The relevant part of my AST looks like this:

data PrimaryExpr
    -- | Literal integer
    = ExpLitInt     Integer
    -- | Literal strings
    | ExpLitStr     String
    -- | Identifier
    | ExpId         String
    -- | Bracketed expression
    | ExpBrackExp   Expression
    -- | This (current object)
    | ExpThis
    -- | Regular Expression
    | ExpRegex      String
    -- | Arrays
    | ExpArray      ArrayLit
    -- | Objects
    | ExpObject     [(PropName, Assignment)]
    deriving Show

This is the relevant Happy code:

primaryExpr :: { PrimaryExpr }
    : LITINT          { ExpLitInt $1 }
    | LITSTR          { ExpLitStr $1 }
    | ID              { ExpId $1 }
    | THIS            { ExpThis }
    | regex           { ExpRegex $1 }
    | arrayLit        { ExpArray $1 }
    | objectLit       { ExpObject $1 }
    | '(' expression ')' { ExpBrackExp $2 }

My question is, how should I define my regex non-terminal? Is this kind of structure right?

regex :: { String }
    : '/' whatHere? '/' { $2 }

Solution

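Define the regex as a terminal (a token such as LITREGEX) that the lexer recognizes, instead of as a non-terminal in the grammar. The lexer matches the whole literal and passes its text to the parser as a string: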

primaryExpr :: { PrimaryExpr }
    : LITINT          { ExpLitInt $1 }
    | LITSTR          { ExpLitStr $1 }
    | LITREGEX        { ExpRegex $1 }
    | ID              { ExpId $1 }
    | THIS            { ExpThis }
    | arrayLit        { ExpArray $1 }
    | objectLit       { ExpObject $1 }
    | '(' expression ')' { ExpBrackExp $2 }
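
For the grammar above to use LITREGEX, the token also has to be declared in the %token section of the Happy file. The following is only a sketch: the constructor names TokInt, TokStr and TokRegex are hypothetical and stand in for whatever token type your lexer actually produces.

```haskell
-- Sketch of a Happy %token section (constructor names are assumptions).
-- $$ tells Happy that the token's value is the constructor's argument,
-- so $1 in the LITREGEX rule is the regex text as a String.
%token
    LITINT      { TokInt $$ }
    LITSTR      { TokStr $$ }
    LITREGEX    { TokRegex $$ }
```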

OTHER TIPS

Answering the question in the comment needs a bit more room than a comment allows.

The lexer pattern for a regex literal could look something like this (spaced out and commented):

/                  forward slash
(  \\.             either: an escaped character
|  [^\[/\\]        or: anything which isn't / or [ or \
|  \[              or: a character class containing:
     [^\]]*            anything which isn't ], any number of times
   \]              end of the character class
)*                 any number of times
/                  forward slash

Condensed:

/(\\.|[^\[/\\]|\[[^\]]*\])*/
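
If the lexer is written by hand rather than generated, the same pattern can be followed with a small recursive scanner. This is a minimal sketch, not a complete JavaScript lexer: the name lexRegex and its return type are assumptions, it accepts any character after a backslash (including a newline), and, like the pattern, it does not handle flags or an escaped ] inside a character class.

```haskell
-- Hypothetical helper: read a regex literal from the front of the input.
-- On success, returns the text between the slashes and the unconsumed rest.
-- Follows the pattern /(\\.|[^\[/\\]|\[[^\]]*\])*/ clause by clause.
lexRegex :: String -> Maybe (String, String)
lexRegex ('/' : cs) = body [] cs
  where
    -- acc holds the regex body in reverse order
    body acc ('/' : rest)       = Just (reverse acc, rest)     -- closing slash
    body acc ('\\' : c : rest)  = body (c : '\\' : acc) rest   -- escaped character
    body acc ('[' : rest)       = cls ('[' : acc) rest         -- start of a character class
    body acc (c : rest)
      | c /= '\\'               = body (c : acc) rest          -- any other character
    body _ _                    = Nothing                      -- unterminated literal or trailing backslash

    -- inside [...]: everything up to the first ] belongs to the class
    cls acc (']' : rest) = body (']' : acc) rest
    cls acc (c : rest)   = cls (c : acc) rest
    cls _ []             = Nothing                             -- unterminated character class
lexRegex _ = Nothing
```

For example, lexRegex "/ab+c/ + 1" gives Just ("ab+c", " + 1"), and the returned body is the string that would be carried by the LITREGEX token. Note that, like the pattern, the sketch accepts an empty body, whereas JavaScript treats // as the start of a comment.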
Licensed under: CC-BY-SA with attribution