OCaml + Menhir Compiling/Writing
Question
I'm a complete newbie when it comes to OCaml. I've only recently started using the language (about 2 weeks ago), but unfortunately, I've been tasked with making a syntax analyzer (parser + lexer, whose function is to either accept or reject a sentence) for a made-up language using Menhir. Now, I've found some materials on the internet regarding OCaml and Menhir:
The Menhir Manual.
This webpage for some French University course.
A short Menhir tutorial on Toss's homepage at Sourceforge.
A Menhir example on github by derdon.
A book on OCaml (with a few things about ocamllex+ocamlyacc).
A random ocamllex tutorial by SooHyoung Oh.
And the examples that come with Menhir's source code.
(I can't put more than two hyperlinks, so I can't link you directly to some of the websites I'm mentioning here. Sorry!)
So, as you can see, I've been desperately searching for more and more material to aid me in the making of this program. Unfortunately, I still cannot grasp many concepts, and as such, I'm having many, many difficulties.
For starters, I have no idea how to correctly compile my program. I've been using the following command:
ocamlbuild -use-menhir -menhir "menhir --external-tokens Tokens" main.native
My program is divided into four different files: main.ml, lexer.mll, parser.mly, and tokens.mly. main.ml is the part that gets input from a file in the file system given as an argument.
let filename = Sys.argv.(1)
let () =
  let inBuffer = open_in filename in
  let lineBuffer = Lexing.from_channel inBuffer in
  try
    let acceptance = Parser.main Lexer.main lineBuffer in
    match acceptance with
    | true -> print_string "Accepted!\n"
    | false -> print_string "Not accepted!\n"
  with
  | Lexer.Error msg -> Printf.fprintf stderr "%s%!\n" msg
  | Parser.Error -> Printf.fprintf stderr "At offset %d: syntax error.\n%!" (Lexing.lexeme_start lineBuffer)
The second file is lexer.mll.
{
  open Tokens
  exception Error of string
}
rule main = parse
  | [' ' '\t']+
      { main lexbuf }
  | ['0'-'9']+ as integer
      { INT (int_of_string integer) }
  | "True"
      { BOOL true }
  | "False"
      { BOOL false }
  | '+'
      { PLUS }
  | '-'
      { MINUS }
  | '*'
      { TIMES }
  | '/'
      { DIVIDE }
  | "def"
      { DEF }
  | "int"
      { INTTYPE }
  | ['A'-'Z' 'a'-'z' '_']['0'-'9' 'A'-'Z' 'a'-'z' '_']* as s
      { ID (s) }
  | '('
      { LPAREN }
  | ')'
      { RPAREN }
  | '>'
      { LARGER }
  | '<'
      { SMALLER }
  | ">="
      { EQLARGER }
  | "<="
      { EQSMALLER }
  | "="
      { EQUAL }
  | "!="
      { NOTEQUAL }
  | '~'
      { NOT }
  | "&&"
      { AND }
  | "||"
      { OR }
  | '('
      { LPAREN }
  | ')'
      { RPAREN }
  | "writeint"
      { WRITEINT }
  | '\n'
      { EOL }
  | eof
      { EOF }
  | _
      { raise (Error (Printf.sprintf "At offset %d: unexpected character.\n" (Lexing.lexeme_start lexbuf))) }
The third file is parser.mly.
%start <bool> main
%%
main:
| WRITEINT INT { true }
The fourth one is tokens.mly
%token <string> ID
%token <int> INT
%token <bool> BOOL
%token EOF EOL DEF INTTYPE LPAREN RPAREN WRITEINT
%token PLUS MINUS TIMES DIVIDE
%token LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%token NOT AND OR
%left OR
%left AND
%nonassoc NOT
%nonassoc LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc LPAREN
%nonassoc ATTRIB
%{
type token =
| ID of (string)
| INT
| BOOL
| DEF
| INTTYPE
| LPAREN
| RPAREN
| WRITEINT
| PLUS
| MINUS
| TIMES
| DIVIDE
| LARGER
| SMALLER
| EQLARGER
| EQSMALLER
| EQUAL
| NOTEQUAL
| NOT
| AND
| OR
| EOF
| EOL
%}
%%
Now, I know there are a lot of unused symbols here, but I intend to use them in my parser. No matter how many changes I make to the files, the compiler keeps blowing up in my face. I have tried everything I can think of, and nothing seems to work. What is making ocamlbuild explode in a plethora of errors about unbound constructors and undefined start symbols? What command should I be using to compile the program properly? Where can I find meaningful materials to learn about Menhir?
Solution
A simpler way to do this is to remove the Parser/Tokens separation. As Thomas noted, there is no need for a type token = ... declaration, because Menhir produces it automatically from the %token directives.
So you can define parser.mly as:
%start <bool> main
%token <string> ID
%token <int> INT
%token <bool> BOOL
%token EOF EOL DEF INTTYPE LPAREN RPAREN WRITEINT
%token PLUS MINUS TIMES DIVIDE
%token LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%token NOT AND OR
%left OR
%left AND
%nonassoc NOT
%nonassoc LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc LPAREN
%nonassoc ATTRIB
%%
main:
| WRITEINT INT { true }
and lexer.mll as:
{
open Parser
exception Error of string
}
[...] (* rest of the code not shown here *)
Then remove tokens.mly and compile with:
ocamlbuild -use-menhir main.native
and it all works well.
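For reference, here is roughly what the token type derived from the %token directives above looks like. This is a hand-written approximation, not the file Menhir actually generates, and string_of_token is a hypothetical debugging helper added only for illustration. Note that tokens declared with a payload type (such as %token <int> INT) carry an argument, whereas the hand-written type in the question declared INT and BOOL without one.

```ocaml
(* Hand-written approximation of the token type Menhir derives from the
   %token directives in parser.mly. Not the generated file itself. *)
type token =
  | ID of string
  | INT of int
  | BOOL of bool
  | EOF | EOL | DEF | INTTYPE | LPAREN | RPAREN | WRITEINT
  | PLUS | MINUS | TIMES | DIVIDE
  | LARGER | SMALLER | EQLARGER | EQSMALLER | EQUAL | NOTEQUAL
  | NOT | AND | OR

(* Hypothetical helper (not produced by Menhir): prints a few tokens,
   which can be handy when checking what a lexer returns. *)
let string_of_token = function
  | ID s -> Printf.sprintf "ID %S" s
  | INT n -> Printf.sprintf "INT %d" n
  | BOOL b -> Printf.sprintf "BOOL %b" b
  | WRITEINT -> "WRITEINT"
  | EOF -> "EOF"
  | _ -> "<other token>"

let () =
  (* Mirrors the only production in the sample grammar: WRITEINT INT. *)
  List.iter
    (fun t -> print_endline (string_of_token t))
    [WRITEINT; INT 42; EOF]
```

Because the lexer opens Parser, the constructors it returns (INT, WRITEINT, and so on) resolve to this one type, which is what the parser expects.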
Other suggestions
So first, you don't need to repeat the tokens in tokens.mly:
%token <string> ID
%token <int> INT
%token <bool> BOOL
%token EOF EOL DEF INTTYPE LPAREN RPAREN WRITEINT
%token PLUS MINUS TIMES DIVIDE
%token LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%token NOT AND OR
%left OR
%left AND
%nonassoc NOT
%nonassoc LARGER SMALLER EQLARGER EQSMALLER EQUAL NOTEQUAL
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc LPAREN
%nonassoc ATTRIB
%%
Then, I don't know the magic option to pass to ocamlbuild and I don't know menhir very well, but in my understanding you need to "pack" all the .mly files into one parser unit:
menhir tokens.mly parser.mly --base parser
Then, if you replace every occurrence of Tokens with Parser in lexer.mll, ocamlbuild -no-hygiene main.byte should work. Note, however, that there may be a cleverer way to do it.
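Why the lexer has to refer to Parser rather than Tokens once everything is packed into one unit: the parser's entry point takes a lexer function that must return the parser's own token type. The sketch below mimics that relationship with hand-written stand-ins. No Menhir or ocamllex is involved, only three tokens are modelled, and Lexer.of_list and accepts are hypothetical names introduced for the example.

```ocaml
(* Stand-in for the module generated from the merged grammar.
   Assumption: only the tokens used by the sample rule are modelled. *)
module Parser = struct
  type token = WRITEINT | INT of int | EOF
  exception Error

  (* Same shape as an entry point declared with %start <bool> main:
     it pulls tokens from the lexer function one at a time. *)
  let main (next : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : bool =
    match next lexbuf with
    | WRITEINT -> (match next lexbuf with INT _ -> true | _ -> raise Error)
    | _ -> raise Error
end

(* Stand-in for lexer.mll: it opens Parser, so the constructors it
   returns belong to Parser.token. *)
module Lexer = struct
  open Parser

  (* Hypothetical toy lexer: replays a fixed token list and ignores
     the buffer contents. *)
  let of_list tokens =
    let rest = ref tokens in
    fun (_ : Lexing.lexbuf) ->
      match !rest with
      | [] -> EOF
      | t :: ts -> rest := ts; t
end

(* Runs the stand-in parser and turns a syntax error into [false]. *)
let accepts tokens =
  let lexbuf = Lexing.from_string "" in
  try Parser.main (Lexer.of_list tokens) lexbuf with Parser.Error -> false

let () =
  Printf.printf "%b %b\n"
    (accepts Parser.[WRITEINT; INT 3])
    (accepts Parser.[INT 3])
```

If the lexer built its tokens from a different module than the one the parser consumes, the call to Parser.main would be rejected by the type checker, which is the kind of mismatch the rename avoids.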
I ran into the same problem, except that in addition the parser needed modules outside of the current directory. I couldn't figure out how to invoke ocamlbuild to specify that parser.{ml,mli} had to be built from 3 mly files, so I simply wrote a Makefile that:
- copies the modules' .cmi files from _build into the current directory (to satisfy menhir --infer)
- invokes menhir
- removes the copied modules to satisfy ocamlbuild
- then invokes ocamlbuild
I am not satisfied with it, so I am interested in any better alternative, but if you really have to finish your project with minimal effort, I guess that's the way to go.
Edit: Actually, there is no need to copy and remove the compiled modules; just pass the --ocamlc option to menhir in the second step: menhir --ocamlc "ocamlc -I \"../_build/modules/\"" --infer --base parser
Sadly, this still means that the parser is generated with respect to the previous compilation of the modules, hence an unnecessary (and failed) first compilation is to be expected.