How to achieve proper parsing when some of a structure's fields are omitted or are not in the same order as in the structure declaration?

StackOverflow https://stackoverflow.com/questions/19663539

Question

So I have a parser that parses strings like 7.5*[someAlphanumStr] or 7.5[someAlphanumStr] into this struct:

struct summand {
    float factor;
    std::string name;
    summand(const float & f):factor(f), name(""){}
    summand(const std::string & n):factor(1.0f), name(n){}
    summand(const float & f, const std::string & n):factor(f), name(n){}
    summand():factor(0.0f), name(""){}
};

But in addition I need to be able to parse strings like [someAlphanumStr]*7.4, [someAlphanumStr]5, 7.4 and [someAlphanumStr]. In the last two cases (7.4 and [someAlphanumStr]) I want the omitted fields to be set to default values, which is why I wrote the single-argument constructors for my summand struct.

Below is my code and the result it produces:

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

namespace client
{
    namespace spirit = boost::spirit;
    namespace qi     = boost::spirit::qi;
    namespace ascii  = boost::spirit::ascii;

    struct summand {
        float factor;
        std::string name;
        summand(const float & f):factor(f), name(""){}
        summand(const std::string & n):factor(1.0f), name(n){}
        summand(const float & f, const std::string & n):factor(f), name(n){}
        summand():factor(0.0f), name(""){}
    };
}

BOOST_FUSION_ADAPT_STRUCT(client::summand,
                      (float, factor)
                      (std::string, name)
                      )

namespace client {

    template <typename Iterator>
    struct summand_parser : qi::grammar<Iterator, summand(), ascii::space_type>
    {
        summand_parser() : summand_parser::base_type(summand_rule)
        {
            using namespace ascii;

            summand_rule %= (qi::float_ >> -qi::lit('*') >> '[' >> qi::lexeme[alpha >> *alnum] >> ']')|('[' >> qi::lexeme[alpha >> *alnum] >> ']' >> -qi::lit('*') >> qi::float_)|(qi::float_)|('[' >> qi::lexeme[alpha >> *alnum] >> ']');

        }

        qi::rule<Iterator, summand(), ascii::space_type> summand_rule;
    };
}

void parseSummandsInto(std::string const& str, client::summand& summands)
{
    typedef std::string::const_iterator It;
    static const client::summand_parser<It> g;

    It iter = str.begin(),
    end = str.end();

    bool r = phrase_parse(iter, end, g, boost::spirit::ascii::space, summands);

    if (r && iter == end)
        return;
    else
        throw "Parse failed";
}

int main()
{
    std::vector<std::string> inputStrings = {"7.5*[someAlphanumStr]", "7.5[someAlphanumStr]", "[someAlphanumStr]*7.4", "[someAlphanumStr]5", "7.4", "[someAlphanumStr]"};

    std::for_each(inputStrings.begin(), inputStrings.end(), [&inputStrings](std::string & inputStr) {
        client::summand parsed;
        parseSummandsInto(inputStr, parsed);
        std::cout << inputStr << " -> " << boost::fusion::as_vector(parsed) << std::endl;
    });
}

Results (Coliru):

+ clang++ -std=c++11 -O0 -Wall -pedantic main.cpp
+ ./a.out
+ c++filt -t
7.5*[someAlphanumStr] -> (7.5 someAlphanumStr)
7.5[someAlphanumStr] -> (7.5 someAlphanumStr)
[someAlphanumStr]*7.4 -> (115 )
[someAlphanumStr]5 -> (115 )
7.4 -> (7.4 )
[someAlphanumStr] -> (115 omeAlphanumStr)

Thanks to all for the clear answers and advice; I'm especially grateful to @sehe.


Solution

The way to get anything done with Spirit[1] is to take small steps and simplify rigorously along the way.

Don't live with "cruft" (like randomly repeated sub-expressions). Also, being explicit is good. In this case, I'd start by extracting the repeated sub-expressions and reformatting for legibility:

    name_rule   = '[' >> qi::lexeme[alpha >> *alnum] >> ']';
    factor_rule = qi::float_;

    summand_rule %= 
          (factor_rule >> -qi::lit('*') >> name_rule)
        | (name_rule   >> -qi::lit('*') >> factor_rule)
        | (factor_rule)
        | (name_rule)
        ;

There, much better already, and I haven't changed a thing. But wait! It no longer compiles once the sub-rules are declared with their natural attribute types:

    qi::rule<Iterator, std::string(), ascii::space_type> name_rule;
    qi::rule<Iterator, float(),       ascii::space_type> factor_rule;

It turns out that the grammar only "happened" to compile because Spirit's attribute compatibility rules are so lax/permissive that the characters matched for the name were just being assigned to the factor part (that's where 115 came from: 0x73, i.e. 115 in decimal, is the ASCII code for s from someAlphanumStr).
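
The 115 can be reproduced without Spirit at all. The sketch below only shows the ordinary C++ char-to-float conversion that such a permissive assignment boils down to; it is not what Spirit does internally line by line. The helper names `char_as_factor` and `first_char_as_factor` are made up for illustration, and an ASCII execution character set is assumed.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: a matched character assigned to a float "slot"
// goes through a plain integral-to-floating-point conversion.
float char_as_factor(char c)
{
    return static_cast<float>(c);
}

// Hypothetical helper: what the factor becomes when the first character
// of the matched name lands in it (name must not be empty).
float first_char_as_factor(const std::string& name)
{
    return char_as_factor(name.front());
}
```

For example, `first_char_as_factor("someAlphanumStr")` yields 115 on an ASCII platform, which matches the bogus factor in the question's output.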


OOPS/TL;DW: I once had quite a lengthy analysis write-up here, but I clobbered it by closing my browser, and SO had only an old draft cached server-side :( I'll boil it down to the bottom line now:

Guideline: Use either constructor overloads to assign to your exposed attribute type, or use Fusion Sequence adaptation, but don't mix the two: they will interfere in surprising/annoying ways.

Don't worry, I won't let you go empty-handed, of course. I'd just 'manually' direct the factor and name components into their respective 'slots' (members)[2].

Inherited attributes are a sweet way to keep this legible and convenient:

// assuming the above rules redefined to take ("inherit") a summand& attribute:
qi::rule<Iterator, void(summand&), ascii::space_type> name_rule, factor_rule;

Just add a simple assignment in the semantic action:

name_rule   = as_string [ '[' >> lexeme[alpha >> *alnum] >> ']' ] 
                        [ _name   = _1 ];
factor_rule = double_   [ _factor = _1 ];

Now, the 'magic dust' is of course in how the _name and _factor actors are defined. I prefer using binds for this, over phx::at_c<N> due to maintenance costs:

static const auto _factor = phx::bind(&summand::factor, qi::_r1);
static const auto _name   = phx::bind(&summand::name,   qi::_r1);

See? That's pretty succinct and clearly shows what is happening. Also, there's no actual need to have Fusion adaptation for summand here.
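
To see why binding to a member by name is easier to maintain than addressing it by index, here is a rough standard-library analogue. It swaps in `std::mem_fn` for `phx::bind`, so it is not the Phoenix code from the answer, and `factor_of`, `name_of` and `make_summand` are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>

struct summand {
    float factor;
    std::string name;

    summand() : factor(1.f), name("") {}
};

// Accessors tied to the members by name. Reordering the fields of
// summand does not change what these refer to, unlike an index such as 0 or 1.
static const auto factor_of = std::mem_fn(&summand::factor);
static const auto name_of   = std::mem_fn(&summand::name);

// Hypothetical helper: fill the "slots" through the named accessors,
// much like the semantic actions assign through _factor and _name.
summand make_summand(float f, const std::string& n)
{
    summand s;
    factor_of(s) = f;   // yields a reference to s.factor
    name_of(s)   = n;   // yields a reference to s.name
    return s;
}
```

Calling `make_summand(7.5f, "abc")` gives a summand with factor 7.5 and name "abc"; a default-constructed summand still reports a factor of 1 through `factor_of`.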

Now, finally, we can simplify the main rule as well:

    summand_rule = 
              factor_rule (_val) >> - ( -lit('*') >> name_rule   (_val) )
            | name_rule   (_val) >> - ( -lit('*') >> factor_rule (_val) )
        ;

What this does is simply combine the single-component branches into the dual-component branches by making the trailing part optional.
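
The control flow of that rule can be sketched by hand, without Spirit. This is only an illustration of the "leading part, then optional trailing part" idea under some assumptions: all names in namespace `sketch` are made up, and `std::strtof` stands in for the numeric parser even though it accepts more inputs (hex, inf, nan) than `qi::float_` or `qi::double_` would.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

struct summand {
    float factor;
    std::string name;
    summand() : factor(1.f), name("") {}
};

namespace sketch {

// Skip whitespace, roughly like the ascii::space skipper.
inline void skip_ws(const char*& p)
{
    while (std::isspace(static_cast<unsigned char>(*p))) ++p;
}

// Counterpart of factor_rule: on success store the number and advance p.
inline bool parse_factor(const char*& p, summand& out)
{
    skip_ws(p);
    char* end = nullptr;
    const float v = std::strtof(p, &end);
    if (end == p) return false;          // nothing numeric here
    out.factor = v;
    p = end;
    return true;
}

// Counterpart of name_rule: '[' alpha *alnum ']'.
inline bool parse_name(const char*& p, summand& out)
{
    const char* q = p;
    skip_ws(q);
    if (*q != '[') return false;
    ++q;
    skip_ws(q);
    if (!std::isalpha(static_cast<unsigned char>(*q))) return false;
    const char* first = q;
    while (std::isalnum(static_cast<unsigned char>(*q))) ++q;
    const std::string name(first, q);
    skip_ws(q);
    if (*q != ']') return false;
    out.name = name;
    p = q + 1;                           // commit only on full match
    return true;
}

// Counterpart of -( -lit('*') >> second ): try the tail, backtrack if absent.
template <typename Second>
void optional_tail(const char*& p, summand& out, Second second)
{
    const char* q = p;
    skip_ws(q);
    if (*q == '*') ++q;                  // the '*' itself is optional
    if (second(q, out)) p = q;           // otherwise leave p untouched
}

// Counterpart of summand_rule plus the "iter == end" check.
inline bool parse_summand(const std::string& input, summand& out)
{
    const char* p = input.c_str();
    if (parse_factor(p, out))    optional_tail(p, out, parse_name);
    else if (parse_name(p, out)) optional_tail(p, out, parse_factor);
    else return false;
    skip_ws(p);
    return *p == '\0';
}

} // namespace sketch
```

With a default-constructed summand passed in, `sketch::parse_summand("[someAlphanumStr]5", s)` fills both members, while `sketch::parse_summand("[someAlphanumStr]", s)` leaves the factor at its default of 1. An input such as "7.5*" is rejected because the dangling '*' is never consumed.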

Note how the summand default constructor takes care of the default values:

struct summand {
    float factor;
    std::string name;

    summand() : factor(1.f), name("") {}
};

Notice how this removed quite a bit of complexity there.

See the fully adapted sample running Live on Coliru which prints:

7.5*[someAlphanumStr] -> (7.5 someAlphanumStr)
7.5[someAlphanumStr] -> (7.5 someAlphanumStr)
[someAlphanumStr]*7.4 -> (7.4 someAlphanumStr)
[someAlphanumStr]5 -> (5 someAlphanumStr)
7.4 -> (7.4 )
[someAlphanumStr] -> (1 someAlphanumStr)

Full Code Listing

#define BOOST_SPIRIT_USE_PHOENIX_V3
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

namespace client {
    namespace qi     = boost::spirit::qi;
    namespace phx    = boost::phoenix;
    namespace ascii  = boost::spirit::ascii;

    struct summand {
        float factor;
        std::string name;

        summand() : factor(1.f), name("") {}
    };
}

namespace client {

    template <typename Iterator>
    struct summand_parser : qi::grammar<Iterator, summand(), ascii::space_type>
    {
        summand_parser() : summand_parser::base_type(summand_rule)
        {
            using namespace ascii;

            static const auto _factor = phx::bind(&summand::factor, qi::_r1);
            static const auto _name   = phx::bind(&summand::name,   qi::_r1);

            name_rule   = qi::as_string [ '[' >> qi::lexeme[alpha >> *alnum] >> ']' ] 
                                          [ _name   = qi::_1 ] ;
            factor_rule = qi::double_     [ _factor = qi::_1 ] ;

            summand_rule = 
                      factor_rule (qi::_val) >> - ( -qi::lit('*') >> name_rule   (qi::_val) )
                    | name_rule   (qi::_val) >> - ( -qi::lit('*') >> factor_rule (qi::_val) )
                ;

            BOOST_SPIRIT_DEBUG_NODES((summand_rule)(name_rule)(factor_rule))
        }

        qi::rule<Iterator, void(summand&), ascii::space_type> name_rule, factor_rule;
        qi::rule<Iterator, summand(),      ascii::space_type> summand_rule;
    };
}

bool parseSummandsInto(std::string const& str, client::summand& summand)
{
    typedef std::string::const_iterator It;
    static const client::summand_parser<It> g;

    It iter(str.begin()), end(str.end());
    bool r = phrase_parse(iter, end, g, boost::spirit::ascii::space, summand);

    return (r && iter == end);
}

int main()
{
    std::vector<std::string> inputStrings = {
        "7.5*[someAlphanumStr]",
        "7.5[someAlphanumStr]",
        "[someAlphanumStr]*7.4",
        "[someAlphanumStr]5",
        "7.4",
        "[someAlphanumStr]",
    };

    std::for_each(inputStrings.begin(), inputStrings.end(), [&inputStrings](std::string const& inputStr) {
        client::summand parsed;
        if (parseSummandsInto(inputStr, parsed))
            std::cout << inputStr << " -> (" << parsed.factor << " " << parsed.name << ")\n";
        else
            std::cout << inputStr << " -> FAILED\n";
    });
}

[1] And arguably, anything else in technology

[2] You can keep the BOOST_FUSION_ADAPT_STRUCT, but as you can see it's no longer required.

OTHER TIPS

I'm not sure whether this is the best solution, but I'd solve this by providing initial values for the fusion sequence and then modifying them later with Phoenix:

summand_rule %=
    (qi::float_ >> -(-qi::lit('*') >> '[' >> qi::lexeme[alpha >> *alnum] >> ']'))
  | (qi::attr(0.) >> '[' >> qi::lexeme[alpha >> *alnum] >> ']' >> -(-qi::lit('*') >> qi::float_[ph::at_c<0>(qi::_val) = qi::_1]));

That is, we're giving an initial value of 0. to the first item in the fusion sequence, which gets assigned to factor, and then going back and modifying it later.

If we omit the factor in the reversed case, the attribute type of the rule will exactly model summand and we can use = assignment instead of %=:

summand_rule =
    (qi::float_ >> -(-qi::lit('*') >> '[' >> qi::lexeme[alpha >> *alnum] >> ']'))
  | (qi::attr(0.) >> '[' >> qi::lexeme[alpha >> *alnum] >> ']' >> -(-qi::lit('*') >> qi::omit[qi::float_[ph::at_c<0>(qi::_val) = qi::_1]]));

Demo: http://coliru.stacked-crooked.com/a/46e3e8101a9c10ea

Licensed under: CC-BY-SA with attribution