Question

I'm trying to parse following text using boost-spirit but no success as of now, please see comments below

Output_A :=
    LE600ms.{
    (LE1s.{FE1s.{Signal1}} AND
    LE1s.{FE1s.{Signal2}})
    OR
    (LE3ms.{FE2ms.{NOT.Signal3}} AND
    LE3ms.{FE2ms.{Signal4}})};
  • LE[xy]s = Leading edge for x ysecs , where y = m,n (milli, nano)
  • FE[xy]s = Falling edge for x ysecs , where y = m,n (milli, nano)
  • Output_A, Signal1/2/3/4 - Identifier

I tried following rules (For declarations please refer the attached source)

start   =  (ident_ >> ":=" >  expr_)  
           [ _val = phx::construct<binop<op_equ> >(_1, _2) ] 
          ;
expr_   = tok_.alias();
tok_    = ( not_   >> binop_ >> tok_ ) 
          [ _val = phx::bind(make_binop, qi::_2, qi::_1, qi::_3) ] 
          | not_   [ _val = _1 ] ;

not_    = "NOT."  >> simple [ _val = phx::construct<unop <op_not> >(_1) ]        
          | simple [ _val = _1 ] ;


// Not sure how to use something like following rule for 
// final expression
// Here I need the edge type LE or FE and duration, 
// so something like binary operator
tdelay_ = lexeme[ ( lit("LE") | lit("FE") ) 
                 > double_ > *char_("mn") > "s." ] ; // ???

simple   =   ('('  > expr_ > ')') 
           | ('['  > expr_ > ']') 
           | ('{'  > expr_ > '}') 
           | ident_ 
           | lexeme[double_] ;

ident_   =  lexeme[ char_("a-zA-Z_") >> *char_("a-zA-Z_0-9")];

Also,

Is it possible to use a rule in lexeme, for instance how can I use ident_ to check its property say

Output_A.status := Signal_4;

so a rule like :

lexeme[ident_ >> '.' "status" ] > ":=" ident_ ; 

is possible ?

  • My source.

  • Using Boost-1.55.0

Was it helpful?

Solution

To the various "other problems":

  1. I think you had missed the balancing of the brackets in the not rule:

    not_    =  ("NOT."  >> simple [ _val = phx::construct<unop <op_not> >(_1) ]
               | simple [ _val = _1 ]  )
               | (tdelay_  >> simple [ _val = phx::construct<unop <op_not> >(_1) ]
               | simple [ _val = _1 ]  )
               ;
    

    Boils down to

    not_    = ("NOT."  >> simple | simple)
            | (tdelay_ >> simple | simple)
            ;
    

    Which seems to be a confused way to write

    not_    = "NOT."   >> simple
            | simple
            | tdelay_  >> simple
            ;
    

    Now you might spot the problem: simple could be and ident_ and ident_ matches stuff like tdelay_ (up till the closing .). So what we have is an _ident which parses successfully (e.g. LE600ms) but then... an unexpected .. Which is precisely what gets reported:

    Expectation Failure at '.{ ....'
    

    So, there's two ways to fix things:

    • either make ident_ ignore expressions that could be interpreted as a tdelay_:

       simple   = ('(' > expr_ > ')')
                | ('[' > expr_ > ']')
                | ('{' > expr_ > '}')
                | lexeme[double_]
                | (!tdelay_ >> ident_) // HERE
                ;
      

      This could lead to considerable backtracking though, so

    • alternatively you could 'fix' the preferred order of parser branches in not_:`

       not_    = ("NOT."  >> simple [ _val = phx::construct<unop <op_not> >(_1) ])
               | (tdelay_ >> simple [ _val = phx::construct<unop <op_not> >(_1) ])
               | simple [ _val = _1 ]
               ;
      

      which is what I'd suggest

  2. You're "integration" of prop_ident_ is ... clumsy. Instead of

    start   =  (ident_ >> ":=" >  expr_)  [ _val = phx::construct<binop<op_equ> >(_1, _2) ]
              | prop_ident_ // [ _val = _2 ]
              ;
    // ~20 lines of grammar skipped
    prop_ident_ =  lexeme[ char_("a-zA-Z_") >> *char_("a-zA-Z_0-9")
                                           >>  prop_
                                           > ":="
                                           >  char_("a-zA-Z_") >> *char_("a-zA-Z_0-9")
                             ]
                             //[ _val = _2 ]
                             ;
    

    You should consider expression your grammar as you'd describe it:

    program_    = *statement_;
    statement_  = (signal_def_ | prop_assgn_) >> ';';
    
    signal_def_ = ident_ >> ":=" >>  expr_;
    prop_assgn_ = ident_ >> lexeme ['.' >> raw [ prop_ ]] >> ":=" >>  ident_;
    

    Whoa!? Where've all the semantic actions gone? Well, I've invented some simplestest AST types that mirror the actual parsed structures:

    qi::rule<It, signal_definition(),  Skipper> signal_def_;
    qi::rule<It, property_assignment(),Skipper> prop_assgn_;
    qi::rule<It, statement(),          Skipper> statement_;
    qi::rule<It, program(),            Skipper> program_;
    

    And the actual types? Woop woop:

    struct signal_definition   { std::string name; expr value; };
    struct property_assignment { std::string signal, property, value_ident; };
    
    BOOST_FUSION_ADAPT_STRUCT(signal_definition, (std::string, name)(expr, value))
    BOOST_FUSION_ADAPT_STRUCT(property_assignment, (std::string, signal)(std::string, property)(std::string, value_ident))
    
    typedef boost::variant<signal_definition, property_assignment> statement;
    typedef std::vector<statement> program;
    

    See the result of this transformation Live On Coliru

Bonus: completing the AST

Now if we also "fix" the delay expressions to integrate with our AST, we can

  • have a more readable grammar

    program_    = *statement_;
    statement_  = (signal_def_ | prop_assgn_) >> ';';
    
    signal_def_ = ident_ >> ":=" >>  expr_;
    prop_assgn_ = ident_ >> lexeme ['.' >> raw [ prop_ ]] >> ":=" >>  ident_;
    
    expr_       = ( not_ >> binop_ >> expr_ )  [ _val = phx::bind(make_binop, qi::_2, qi::_1, qi::_3) ] 
                  | not_ [ _val = _1 ];
    not_        = neg_expr_
                | delay_expr_
                | simple
                ;
    
    neg_expr_   = "NOT."  >> simple [ _val = phx::construct<unop <op_not> >(_1) ];
    
    delay_expr_ = tdelay_ >> expr_;
    
    tdelay_     = raw[edge_] > double_ > raw[unit_];
    
    simple      =   ('(' > expr_ > ')') 
                  | ('[' > expr_ > ']') 
                  | ('{' > expr_ > '}') 
                  | lexeme[double_]
                  | ident_
                  ;
    
    ident_      = char_("a-zA-Z_") >> *char_("a-zA-Z_0-9");
    
  • print our AST without information loss: e.g input:

    Test_Q := LE600ms.Signal12;
    Test_A := Signal1;
    Test_Z := (Signal1);
    Test_B := (Signal1 OR Signal12) AND Signal3;
    Test_A.expire := Signal2;
    
    Output_B :=
        LE600ms.{
        (LE1s.{FE1s.{Signal1}} AND
        LE1s.{FE1s.{Signal2}})
        OR
        (LE3ms.{FE2ms.{NOT.Signal3}} AND
        LE3ms.{FE2ms.{Signal4}})};
    

    begets output:

    Test_Q := (with LE delay of 600ms. Signal12);
    Test_A := Signal1;
    Test_Z := Signal1;
    Test_B := ((Signal1 | Signal12) & Signal3);
    Test_A.expire := Signal2;
    Output_B := (with LE delay of 600ms. ((with LE delay of 1s. ((with FE delay of 1s. Signal1) & (with LE delay of 1s. (with FE delay of 1s. Signal2)))) | (with LE delay of 3ms. ((with FE delay of 2ms. (!Signal3)) & (with LE delay of 3ms. (with FE delay of 2ms. Signal4))))));
    

Note that now that the grammar properly deals with the... grammar (such as the notion of multiple statements), the main driver is also simpler:

static const Skip skip = qi::space | "--" >> *(qi::char_ - qi::eol) >> qi::eol;
static const parser<It, Skip> p;

try
{
    program result;
    bool ok = qi::phrase_parse(f, l, p, skip, result);

    if (!ok) std::cerr << "invalid input\n";
    else     std::cout << result << "\n";
    if (f!=l)
        std::cerr << "remaining unparsed input: '" << std::string(f,l) << "'\n";
}
catch (const qi::expectation_failure<It>& e)
{
    std::cerr << "Expectation Failure at '" << std::string(e.first, e.last) << "'" << std::endl;
}

In fact, the whole code listing is shorter now, even though it does quite a few things more. See it Live On Coliru as well!

Full Code Listing

The full code listing is on Coliru (above) and also in this post (for future availability)

#define BOOST_SPIRIT_USE_PHOENIX_V3

#include <fstream>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <boost/variant/recursive_wrapper.hpp>

namespace qi    = boost::spirit::qi;
namespace phx   = boost::phoenix;

typedef std::string var;
template <typename tag> struct binop;
template <typename tag> struct unop;
struct delayed_expr;

typedef boost::variant<var,double, 
        // Logical Operators
        boost::recursive_wrapper<binop<struct op_equ> >, 
        boost::recursive_wrapper<unop <struct op_not> >, 
        boost::recursive_wrapper<binop<struct op_and> >, 
        boost::recursive_wrapper<binop<struct op_xor> >, 
        boost::recursive_wrapper<binop<struct op_or>  >, 

        // /*Airthemetic Operators*/
        boost::recursive_wrapper<binop<struct op_plus>  >, 
        boost::recursive_wrapper<binop<struct op_minus> >, 
        boost::recursive_wrapper<binop<struct op_mul>   >, 
        boost::recursive_wrapper<binop<struct op_div>   >, 
        boost::recursive_wrapper<binop<struct op_mod>   >, 

        // /*Relational Operators*/
        boost::recursive_wrapper<binop<struct op_gt>  >, 
        boost::recursive_wrapper<binop<struct op_lt>  >, 
        boost::recursive_wrapper<binop<struct op_gte> >, 
        boost::recursive_wrapper<binop<struct op_lte> >, 
        boost::recursive_wrapper<binop<struct op_eq> >,
        boost::recursive_wrapper<binop<struct op_ne> >,

        // tentative stuff
        boost::recursive_wrapper<delayed_expr>
    > expr;

template <typename tag> struct binop 
{ 
    //explicit binop(const expr& l, const std::string& c, const expr& r) : oper1(l), oper2(r), op(c) { }
     explicit binop(const expr& l,  const expr& r) : oper1(l), oper2(r) { }
    expr oper1, oper2; 
    //std::string op;
};

template <typename tag> struct unop  
{ 
    explicit unop(const expr& o) : oper1(o) { }
    expr oper1; 
};

struct signal_definition   {
    std::string name; expr value; 
    friend std::ostream& operator<<(std::ostream& os, signal_definition const& sd) {
        return os << sd.name << " := " << sd.value;
    }
};
struct property_assignment {
    std::string signal, property, value_ident; 
    friend std::ostream& operator<<(std::ostream& os, property_assignment const& pa) {
        return os << pa.signal << '.' << pa.property << " := " << pa.value_ident;
    }
};

struct tdelay {
    std::string edge, unit;
    double amount;
    friend std::ostream& operator<<(std::ostream& os, tdelay const& td) {
        return os << "with " << td.edge << " delay of " << td.amount << td.unit << " ";
    }
};

struct delayed_expr {
    tdelay delay;
    expr e;
};

BOOST_FUSION_ADAPT_STRUCT(signal_definition,   (std::string, name)(expr, value))
BOOST_FUSION_ADAPT_STRUCT(property_assignment, (std::string, signal)(std::string, property)(std::string, value_ident))
BOOST_FUSION_ADAPT_STRUCT(tdelay,              (std::string, edge)(double, amount)(std::string, unit))
BOOST_FUSION_ADAPT_STRUCT(delayed_expr,        (tdelay, delay)(expr, e))

typedef boost::variant<signal_definition, property_assignment> statement;
typedef std::vector<statement> program;

std::ostream& operator<<(std::ostream& os, const expr& e);

struct printer : boost::static_visitor<void>
{
    printer(std::ostream& os) : _os(os) {}
    std::ostream& _os;

    void operator()(const var& v)             const { _os << v;  }
    void operator()(const double& val)        const { _os << val; }

    void operator()(const binop<op_and>& b)   const { print(" & ",  b.oper1, b.oper2); }
    void operator()(const binop<op_or >& b)   const { print(" | ",  b.oper1, b.oper2); }
    void operator()(const binop<op_xor>& b)   const { print(" ^ ",  b.oper1, b.oper2); }
    void operator()(const binop<op_equ>& b)   const { print(" = ",  b.oper1, b.oper2); }

    void operator()(const binop<op_plus>& b)  const { print(" + ",  b.oper1, b.oper2); }
    void operator()(const binop<op_minus>& b) const { print(" - ",  b.oper1, b.oper2); }
    void operator()(const binop<op_mul>& b)   const { print(" * ",  b.oper1, b.oper2); }
    void operator()(const binop<op_div>& b)   const { print(" / ",  b.oper1, b.oper2); }
    void operator()(const binop<op_mod>& b)   const { print(" % ",  b.oper1, b.oper2); }

    void operator()(const binop<op_gt>& b)    const { print(" > ",  b.oper1, b.oper2); }
    void operator()(const binop<op_lt>& b)    const { print(" < ",  b.oper1, b.oper2); }
    void operator()(const binop<op_gte>& b)   const { print(" >= ", b.oper1, b.oper2); }
    void operator()(const binop<op_lte>& b)   const { print(" <= ", b.oper1, b.oper2); }
    void operator()(const binop<op_eq>& b)    const { print(" == ", b.oper1, b.oper2); }
    void operator()(const binop<op_ne>& b)    const { print(" != ", b.oper1, b.oper2); }

    void print(const std::string& op, const expr& l, const expr& r) const
    {
        _os << "(";
            boost::apply_visitor(*this, l);
            _os << op;
            boost::apply_visitor(*this, r);
        _os << ")";
    }

    void operator()(const delayed_expr& u) const
    {
        _os << '(' << u.delay << u. e << ')';
    }

    void operator()(const unop<op_not>& u) const
    {
        _os << "(!";
            boost::apply_visitor(*this, u.oper1);
        _os << ")";
    }
};

std::ostream& operator<<(std::ostream& os, const expr& e)
{ 
    boost::apply_visitor(printer(os), e); 
    return os; 
}

std::ostream& operator<<(std::ostream& os, const program& p)
{ 
    for (auto& stmt : p) os << stmt << ";\n";
    return os; 
}

template <typename It, typename Skipper = qi::space_type>
    struct parser : qi::grammar<It, program(), Skipper>
{
    enum op_token { 
        TOK_PLUS, TOK_MINUS, TOK_DIV, TOK_MULT, TOK_MOD,
        TOK_LT, TOK_LTE, TOK_GT, TOK_GTE,
        TOK_EQ, TOK_NE,TOK_AND,TOK_OR,TOK_XOR
    };

    static expr make_binop(op_token discriminant, const expr& left, const expr& right)
    {
        switch(discriminant)
        {
            case TOK_PLUS:  return binop<op_plus>(left , right); // "+" ,
            case TOK_MINUS: return binop<op_minus>(left, right); // "-" ,
            case TOK_DIV:   return binop<op_div>(left  , right); // "/" ,
            case TOK_MULT:  return binop<op_mul>(left  , right); // "*" ,
            case TOK_MOD:   return binop<op_mod>(left  , right); // "%" ,
            case TOK_LT:    return binop<op_lt>(left   , right); // "<" ,
            case TOK_LTE:   return binop<op_lte>(left  , right); // "<=",
            case TOK_GT:    return binop<op_gt>(left   , right); // ">" ,
            case TOK_GTE:   return binop<op_gte>(left  , right); // ">" ,
            case TOK_EQ:    return binop<op_eq>(left   , right); // ">=",
            case TOK_NE:    return binop<op_ne>(left   , right); // "!" ,
            case TOK_AND:   return binop<op_and>(left  , right);
            case TOK_OR:    return binop<op_or>(left   , right);
            case TOK_XOR:   return binop<op_xor>(left  , right);
        }
        throw std::runtime_error("unreachable in make_binop");
    }

    parser() : parser::base_type(program_)
    {
        using namespace qi;

        program_    = *statement_;
        statement_  = (signal_def_ | prop_assgn_) >> ';';

        signal_def_ = ident_ >> ":=" >>  expr_;
        prop_assgn_ = ident_ >> lexeme ['.' >> raw [ prop_ ]] >> ":=" >>  ident_;

        expr_       = ( not_ >> binop_ >> expr_ )  [ _val = phx::bind(make_binop, qi::_2, qi::_1, qi::_3) ] 
                      | not_ [ _val = _1 ];
        not_        = neg_expr_
                    | delay_expr_
                    | simple
                    ;

        neg_expr_   = "NOT."  >> simple [ _val = phx::construct<unop <op_not> >(_1) ];

        delay_expr_ = tdelay_ >> expr_;

        tdelay_     = raw[edge_] > double_ > raw[unit_];

        simple      =   ('(' > expr_ > ')') 
                      | ('[' > expr_ > ']') 
                      | ('{' > expr_ > '}') 
                      | lexeme[double_]
                      | ident_
                      ;

        ident_      = char_("a-zA-Z_") >> *char_("a-zA-Z_0-9");

        BOOST_SPIRIT_DEBUG_NODES(
                (program_) (signal_def_) (prop_assgn_)
                (expr_) (not_) (neg_expr_) (delay_expr_)
                (simple) (ident_) (tdelay_)
             )

        binop_.add
            ("-",  TOK_MINUS)
            ("+",  TOK_PLUS)
            ("/",  TOK_DIV)
            ("*",  TOK_MULT)
            ("%",  TOK_MOD)
            ("<",  TOK_LT)
            ("<=", TOK_LTE)
            (">",  TOK_GT)
            (">=", TOK_GTE)
            ("==", TOK_EQ)
            ("!=", TOK_NE)
            ("AND", TOK_AND)
            ("OR",  TOK_OR)
            ("XOR", TOK_XOR)
            ;
        prop_.add("status")("expire")("collect")("inhibit");
        edge_.add("LE")("FE");
        unit_.add("ms.")("ns.")("s.");
    }

  private:
    qi::symbols<char, bool>     edge_, prop_, unit_;
    qi::symbols<char, op_token> binop_;

    qi::rule<It, var()> ident_;
    qi::rule<It, tdelay()> tdelay_;
    qi::rule<It, delayed_expr(), Skipper> delay_expr_;
    qi::rule<It, expr(),         Skipper> not_, simple, expr_, neg_expr_;

    qi::rule<It, signal_definition(),  Skipper> signal_def_;
    qi::rule<It, property_assignment(),Skipper> prop_assgn_;
    qi::rule<It, statement(),          Skipper> statement_;
    qi::rule<It, program(),            Skipper> program_;
};

int main()
{
    std::ifstream fin("input.txt");
    std::stringstream buffer;
    buffer << fin.rdbuf();
    std::string input = buffer.str();
    fin.close();

    typedef std::string::const_iterator It;
    typedef qi::rule<It> Skip;
    It f(input.begin()), l(input.end());

    static const Skip skip = qi::space | "--" >> *(qi::char_ - qi::eol) >> qi::eol;
    static const parser<It, Skip> p;

    try
    {
        program result;
        bool ok = qi::phrase_parse(f, l, p, skip, result);

        if (!ok) std::cerr << "invalid input\n";
        else     std::cout << result << "\n";
        if (f!=l)
            std::cerr << "remaining unparsed input: '" << std::string(f,l) << "'\n";
    } 
    catch (const qi::expectation_failure<It>& e)
    {
        std::cerr << "Expectation Failure at '" << std::string(e.first, e.last) << "'" << std::endl;
    }
}

OTHER TIPS

To the isolated question:

Is it possible to use a rule in lexeme, for instance how can I use ident_ to check its property say

    Output_A.status := Signal_4;

Yes. The simplest thing to do with rules, like ident that are 'wholly' lexemes, is to remove the skipper from the rul declaration. Now it becomes:

qi::rule<It, var()> ident_;
qi::rule<It, var(), Skipper> prop_ident_;

ident_   =  char_("a-zA-Z_") >> *char_("a-zA-Z_0-9");       
    prop_ident_ = lexeme[ident_ >> prop_] > ":=" >  ident_;

Without any need to specify lexeme. In other words, rules can implicitly drop a skipper (by not expecting one), but they can't magically instantiate one if one isn't passed (in fact, the Skipper type need not be default constructible)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top