Question

I'm trying to implement a parser for an old forth-based grammar where most of the functions take the form of: "num" "num" "command" where command is a string of some kind.

For example:

    0 1 HSFF
    41 SENSOR ON
    1 12.0 BH 4 LNON

As you can see, the grammar is [mostly] reverse polish notation, with some string of arguments preceding the command. The grammar is pseudo white-space dependent, in that:

    0 1 HSFF 41 SENSOR ON

Is as valid as:

    0 1 HSFF
    41 SENSOR ON

(In other words '\n' is treated just as a space)

Extra whitespace is also skipped, so:

    0           1 HSFF              41 SENSOR      ON

Is 2 valid commands with a lot of unnecessary whitespace.

All of this seemed simple enough, so I started chugging away at implementing the grammar. Of course, things are never as simple as they seem, and I found that my parser fails on the very first character (in this case an int). So, boiling things down, I tried implementing a single rule:

    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;

    qi::rule<Iterator> Cmd_TARGETSENSPAIRCMD = 
        qi::int_ >> (lit("TARGET") | lit("SENSOR") | lit("PAIR") ) 
        >> (lit("ON") | lit("OFF") | lit("ERASE") );

    std::string in("0 TARGET ERASE\n");

    Iterator = in.begin();
    bool success = qi::parse(in.begin(), in.end(), Cmd_TARGETSENSPAIRCMD, ascii::space);

This code block always returns false, indicating that parsing has failed.

As you can see, the rule is that an int must be followed by two literals, in this case indicating whether the command is for a target, sensor, or pair, identified by the int, to be turned on, off, or erased.

If I look at the iterator to see where the parsing has stopped, it shows that it has failed immediately on the int. So I changed the rule to simply be +qi::int_, which succeeds in parsing the int, but fails on the literals. Shortening the rule to simply qi::int_ >> lit("TARGET") also fails.

I think the problem may be in the whitespace skipper I'm using, but I have been unable to determine what I'm doing wrong.

Is there a way to tell spirit that all tokens are separated by whitespace, with the exception of quoted strings (which turn into labels in my grammar)?

Was it helpful?

Solution

I have phantasized a little for you.

The first step I usually take is to come up with an AST model:

namespace Ast
{
    enum Command  { NO_CMD, TARGET, SENSOR, PAIR };
    enum Modifier { NO_MODIFIER, ON, OFF, ERASE };

    struct ModifiedCommand
    {
        Command cmd  = NO_CMD;
        Modifier mod = NO_MODIFIER;
    };

    struct OtherCommand
    {
        std::string token;
        OtherCommand(std::string token = "") : token(std::move(token))
        { }
    };

    typedef boost::variant<int, double> Operand;

    typedef boost::variant<Operand, ModifiedCommand, OtherCommand> RpnMachineInstruction;
    typedef std::vector<RpnMachineInstruction> RpnMachineProgram;
}

As you can see I intend to distinguish integers and double for operand values, and I treat any "other" commands (like "HSSF") that wasn't actively described in your grammar as free-form tokens (uppercase alphabetical).

Now, we map the rule definitions onto this:

RpnGrammar() : RpnGrammar::base_type(_start)
{
    _start         = *_instruction;
    _instruction   = _operand | _mod_command | _other_command;

    _operand       = _strict_double | qi::int_;

    _mod_command   = _command >> _modifier;
    _other_command = qi::as_string [ +qi::char_("A-Z") ];

    // helpers
    _command.add("TARGET", Ast::TARGET)("SENSOR", Ast::SENSOR)("PAIR", Ast::PAIR);
    _modifier.add("ON", Ast::ON)("OFF", Ast::OFF)("ERASE", Ast::ERASE);
}

The grammar parses the result into a list of instructions (Ast::RpnMachineProgram), where each instruction is either an operand or an operation (a command with modifier, or any other free-form command like "HSSF"). Here are the rule declarations:

qi::rule<It, Ast::RpnMachineProgram(),     Skipper> _start;
qi::rule<It, Ast::RpnMachineInstruction(), Skipper> _instruction;
qi::rule<It, Ast::ModifiedCommand(),       Skipper> _mod_command;
qi::rule<It, Ast::Operand(),               Skipper> _operand;

// note: omitting the Skipper has the same effect as wrapping with `qi::lexeme`
qi::rule<It, Ast::OtherCommand()> _other_command;

qi::real_parser<double, boost::spirit::qi::strict_real_policies<double> > _strict_double;
qi::symbols<char, Ast::Command>  _command;
qi::symbols<char, Ast::Modifier> _modifier;

You can see it parse the sample from the question:

Parse succeeded, 10 stack instructions
int:0 int:1 'HSFF'
int:41 SENSOR [ON] 
int:1 double:12 'BH'
int:4 'LNON'

The output is created with a sample visitor that you could use as inspiration for an interpreter/executor.

See it Live On Coliru

Full Listing

#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <fstream>

namespace qi = boost::spirit::qi;

namespace Ast
{
    enum Command  { NO_CMD, TARGET, SENSOR, PAIR };
    enum Modifier { NO_MODIFIER, ON, OFF, ERASE };

    struct ModifiedCommand
    {
        Command cmd  = NO_CMD;
        Modifier mod = NO_MODIFIER;
    };

    struct OtherCommand
    {
        std::string token;
        OtherCommand(std::string token = "") : token(std::move(token))
        { }
    };

    typedef boost::variant<int, double> Operand;

    typedef boost::variant<Operand, ModifiedCommand, OtherCommand> RpnMachineInstruction;
    typedef std::vector<RpnMachineInstruction> RpnMachineProgram;

    // for printing, you can adapt this to execute the stack instead
    struct Print : boost::static_visitor<std::ostream&>
    {
        Print(std::ostream& os) : os(os) {}
        std::ostream& os;

        std::ostream& operator()(Ast::Command cmd) const {
            switch(cmd) {
                case TARGET: return os << "TARGET" << " ";
                case SENSOR: return os << "SENSOR" << " ";
                case PAIR:   return os << "PAIR"   << " ";
                case NO_CMD: return os << "NO_CMD" << " ";
                default:     return os << "#INVALID_COMMAND#" << " ";
            }
        }

        std::ostream& operator()(Ast::Modifier mod) const {
            switch(mod) {
                case ON:          return os << "[ON]"          << " ";
                case OFF:         return os << "[OFF]"         << " ";
                case ERASE:       return os << "[ERASE]"       << " ";
                case NO_MODIFIER: return os << "[NO_MODIFIER]" << " ";
                default:    return os << "#INVALID_MODIFIER#" << " ";
            }
        }

        std::ostream& operator()(double d) const { return os << "double:" << d << " "; }
        std::ostream& operator()(int    i) const { return os << "int:"    << i << " "; }

        std::ostream& operator()(Ast::OtherCommand const& cmd) const {
            return os << "'" << cmd.token << "'\n";
        }

        std::ostream& operator()(Ast::ModifiedCommand const& cmd) const {
            (*this)(cmd.cmd);
            (*this)(cmd.mod);
            return os << "\n"; 
        }

        template <typename... TVariant>
        std::ostream& operator()(boost::variant<TVariant...> const& v) const { 
            return boost::apply_visitor(*this, v); 
        }
    };

}

BOOST_FUSION_ADAPT_STRUCT(Ast::ModifiedCommand, (Ast::Command, cmd)(Ast::Modifier, mod))

template <typename It, typename Skipper = qi::space_type>
struct RpnGrammar : qi::grammar<It, Ast::RpnMachineProgram(), Skipper>
{
    RpnGrammar() : RpnGrammar::base_type(_start)
    {
        _command.add("TARGET", Ast::TARGET)("SENSOR", Ast::SENSOR)("PAIR", Ast::PAIR);
        _modifier.add("ON", Ast::ON)("OFF", Ast::OFF)("ERASE", Ast::ERASE);

        _start         = *_instruction;
        _instruction   = _operand | _mod_command | _other_command;

        _operand       = _strict_double | qi::int_;

        _mod_command   = _command >> _modifier;
        _other_command = qi::as_string [ +qi::char_("A-Z") ];
    }

  private:
    qi::rule<It, Ast::RpnMachineProgram(),     Skipper> _start;
    qi::rule<It, Ast::RpnMachineInstruction(), Skipper> _instruction;
    qi::rule<It, Ast::ModifiedCommand(),       Skipper> _mod_command;
    qi::rule<It, Ast::Operand(),               Skipper> _operand;

    // note: omitting the Skipper has the same effect as wrapping with `qi::lexeme`
    qi::rule<It, Ast::OtherCommand()> _other_command;

    qi::real_parser<double, boost::spirit::qi::strict_real_policies<double> > _strict_double;
    qi::symbols<char, Ast::Command>  _command;
    qi::symbols<char, Ast::Modifier> _modifier;
};

int main()
{
    std::ifstream ifs("input.txt");
    typedef boost::spirit::istream_iterator It;
    ifs.unsetf(std::ios::skipws);

    RpnGrammar<It> grammar;

    It f(ifs), l;
    Ast::RpnMachineProgram program;
    bool ok = qi::phrase_parse(f, l, grammar, qi::space, program);

    if (ok)
    {
        std::cout << "Parse succeeded, " << program.size() << " stack instructions\n";
        std::for_each(
                program.begin(),
                program.end(),
                Ast::Print(std::cout));
    }
    else
    {
        std::cout << "Parse failed\n";
    }

    if (f != l)
    {
        std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top