Question

Background

I'm currently designing my own programming language as a research project. I have most of the grammar done and written down as context-free grammar, and it should be working as is. - Now I'm working on the actual compiler that should translate the language into x86 binary assembly code, more specifically, I am working on the parser (the front end).

The language's syntax is, for the most part, very similar to Java/C/C++. The parser, which constructs an intermediate representation out of source code, works as follows:

The grammar is built as a big tree in which the actual source code only determines the leaves; Each syntactic variable (or nonterminal) has it's own class, and each of these classes has a static get(byte[] source, int offset) method which returns a new leaf (node), or null if the source code syntax does not fit this nonterminal structure.

I am using a variation of predictive parsing, by the way.

For the nonterminal DataType, I have chosen the following grammatical structure:

DataType:
    PrimitiveDataType
 |  ArrayDataType
 |  ComplexDataType

ArrayDataType:
    DataType []

Did I mention this language is object-oriented?

So the problem here is that when DataType's get method is called, it first checks whether the following is a primitive data type, PrimitiveDataType's get method is called. Assuming we have an array, this would return null, so it continues on to check whether it's an ArrayDataType, by calling it's get method.

Problem

Arrays may be created of any data type, including arrays themselves (which would look like Type[][]). So what ArrayDataType's get method would do is again call DataType's get method to figure out what type the array is of. Unfortunately, this is where my parser implementation would fail because this behavior results in a loop!

Question

Would there be any good/better design alternatives to this?

No correct solution

Licensed under: CC-BY-SA with attribution
scroll top