Question

I'm currently writing a compiler front end for personal education on the topic and I've run into a problem concerning the way I handle BNF definition in C++ through operator overloading.

Currently my setup is as follows:

Rule.h:

class Rule
{
public:
    ChainRule operator>>(Rule& right);
    OrRule operator|(Rule& right);
    KleeneRule operator*();
    OptionalRule Rule::operator+();

    virtual bool parse(TokenList::iterator& begin, TokenList::iterator end) = 0;
};

Rule.cpp:

ChainRule Rule::operator>>(Rule& right) {
    return ChainRule(this, &right);
}

OrRule Rule::operator|(Rule& right) {
    return OrRule(this, &right);
}

KleeneRule Rule::operator*() {
    return KleeneRule(this);
}

OptionalRule Rule::operator+() {
    return OptionalRule(this);
}

ChainRule, OrRule, KleeneRule, OptionalRule and EmptyRule are defined trivially like so:

class ChainRule : public Rule
{
private:
    Rule* next;
    Rule* _this;

public:
    ChainRule();
    ChainRule(Rule* _this, Rule* right);

    bool parse(TokenList::iterator& begin, TokenList::iterator end) override;
};

Each subclass of Rule obviously defines a resonable implementation of parse(). Using these classes I can define my grammar as follows:

OrRule assignment_exp   = logical_or_exp
                        | unary_exp >> StringRule("=") >> assignment_exp
                        ;   

Now here's the problem: Each overloaded operator returns a new object by value. This means that whenever I use operator>> or operator| from the Rule class, these pointers will be garbage once I return from the call to operator>> or operator| since the stack has been cleaned up and the objects are gone.

Neither can I use pass by value in the constructors for my Rule subclasses since this would not allow me to define recursive grammars.

So I'm left with no option of passing objects by value, and no option of passing objects by pointers. Can anyone point me to a solution that would not force me to define my grammar like so?

StringRule s = StringRule("=");
OrRule assignment_exp;
ChainRule temp1 = s >> assignment_exp;
ChainRule temp2 = unary_exp >> temp1;
assignment_exp = logical_or_exp | temp2;

P.S. I am aware of the various parser generators and Boost.Spirit, but my goal is to write my own parser.

Was it helpful?

Solution

You can allocate the return objects on the heap (via a factory) and return them as references. The factory can keep track of them so you won't leak. As far as syntax is concerned it will work the same as when you return them by value.

OTHER TIPS

You can get around this problem by replacing your Rule* (which have the problem that you cannot overload operators for them) with wrapper objects. I.e. ChainRule would contain RuleRef next instead of Rule * next, etc., and all the operators would be defined for RuleRef. RuleRef would simply contain a Rule*, and be constructable from a Rule*. To make memory handling easier, you could inherit from a smart pointer class.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top