Overloading multiplication using menhir and OCaml

https://stackoverflow.com/questions/23183937

06-07-2023
|

Question

I have written a lexer and parser to analyze linear algebra statements. Each statement consists of one or more expressions followed by one or more declarations. I am using menhir and OCaml to write the lexer and parser.

For example: Ax = b, where A is invertible.

This should be read as A * x = b, (A, invertible)

In an expression all ids must be either an uppercase or lowercase symbol. I would like to overload the multiplication operator so that the user does not have to type in the '*' symbol.

However, since the lexer also needs to be able to read strings (such as "invertible" in this case), the "Ax" portion of the expression is sent over to the parser as a string. This causes a parser error since no strings should be encountered in the expression portion of the statement.

Here is the basic idea of the grammar

stmt :=
  | expr "."
  | decl "."
  | expr "," decl "."

expr :=
  | term
  | unop expr
  | expr binop expr

term :=
  | <int> num
  | <char> id
  | "(" expr ")"

decl :=
  | id "is" kinds

kinds :=
  | <string> kind
  | kind "and" kinds

Is there some way to separate the individual characters and tell the parser that they should be treated as multiplication? Is there a way to change the lexer so that it is smart enough to know that all character clusters before a comma are ids and all clusters after should be treated as strings?

Solution

It seems to me you have two problems:

You want your lexer to treat sequences of characters differently in different places.
You want multiplication to be indicated by adjacent expressions (no operator in between).

The first problem I would tackle in the lexer.

One question is why you say you need to use strings. This implies that there is a completely open-ended set of things you can say. It might be true, but if you can limit yourself to a smallish number, you can use keywords rather than strings. E.g., invertible would be a keyword.

If you really want to allow any string at all in such places, it's definitely still possible to hack a lexer so that it maintains a state describing what it has seen, and looks ahead to see what's coming. If you're not required to adhere to a pre-defined grammar, you could adjust your grammar to make this easier. (E.g., you could use commas for only one purpose.)

For the second problem, I'd say you need to add adjacency to your grammar. I.e., your grammar needs a rule that says something like term := term term. I suspect it's tricky to get this to work correctly, but it does work in OCaml (where adjacent expressions represent function application) and in awk (where adjacent expressions represent string concatenation).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow