Question

In several questions I've seen recommendations for the Spirit parser-generator framework from boost.org, but then in the comments there is grumbling from people using Spirit who are not happy. Will those people please stand forth and explain to the rest of us what are the drawbacks or downsides to using Spirit?

Was it helpful?

Solution

It is a quite cool idea, and I liked it; it was especially useful to really learn how to use C++ templates.

But their documentation recommends the usage of spirit for small to medium-size parsers. A parser for a full language would take ages to compile. I will list three reasons.

  • Scannerless parsing. While it's quite simpler, when backtracking is required it may slow down the parser. It's optional though - a lexer might be integrated, see the C preprocessor built with Spirit. A grammar of ~300 lines (including both .h and .cpp files) compiles (unoptimized) to a file of 6M with GCC. Inlining and maximum optimizations gets that down to ~1,7M.

  • Slow parsing - there is no static checking of the grammar, neither to hint about excessive lookahead required, nor to verify basic errors, such as for instance usage of left recursion (which leads to infinite recursion in recursive-descent parsers LL grammars). Left recursion is not a really hard bug to track down, though, but excessive lookahead might cause exponential parsing times.

  • Heavy template usage - while this has certain advantages, this impacts compilation times and code size. Additionally, the grammar definition must normally be visible to all other users, impacting even more compilation times. I've been able to move grammars to .cpp files by adding explicit template instantiations with the right parameters, but it was not easy.

UPDATE: my response is limited to my experience with Spirit classic, not Spirit V2. I would still expect Spirit to be heavily template-based, but now I'm just guessing.

OTHER TIPS

In boost 1.41 a new version of Spirit is being released, and it beats of pants off of spirit::classic:

After a long time in beta (more than 2 years with Spirit 2.0), Spirit 2.1 will finally be released with the upcoming Boost 1.41 release. The code is very stable now and is ready for production code. We are working hard on finishing the documentation in time for Boost 1.41. You can peek at the current state of the documentation here. Currently, you can find the code and documentation in the Boost SVN trunk. If you have a new project involving Spirit, we highly recommend starting with Spirit 2.1 now. Allow me to quote OvermindDL's post from the Spirit mailing list:

I may start to sound like a bot with how often I say this, but Spirit.Classic is ancient, you should switch to Spirit2.1, it can do everything you did above a GREAT deal easier, a lot less code, and it executes faster. For example, Spirit2.1 can build your entire AST inline, no weird overriding, no need to build things up afterwards, etc..., all as one nice and fast step. You really need to update. See the other posts from the past day for links to docs and such for Spirit2.1. Spirit2.1 is currently in Boost Trunk, but will be formally released with Boost 1.41, but is otherwise complete.

For me, the biggest problem is that expressions in Spirit, as seen by compiler or debugger, are rather long (I copied below a part of one expression in Spirit Classic). These expressions scare me. When I work on a program that uses Spirit, I'm afraid to use valgrind or to print backtrace in gdb.

boost::spirit::classic::parser_result<boost::spirit::classic::action<boost::spirit::classic::sequence<boost::spirit::classic::action<boost::spirit::classic::action<optional_suffix_parser<char const*>, boost::spirit::classic::ref_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::clear_action> >, boost::spirit::classic::ref_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::clear_action> >, boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::chlit<char>, boost::spirit::classic::chlit<char> >, boost::spirit::classic::positive<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alnum_parser, boost::spirit::classic::chlit<char> >, boost::spirit::classic::chlit<char> > > > >, boost::spirit::classic::ref_value_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::push_back_action> >, boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::spirit::classic::ref_const_ref_actor<std::vector<std::string, std::allocator<std::string> >, std::string, boost::spirit::classic::push_back_action> > >, boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::action<boost::spirit::classic::uint_parser<unsigned int, 10, 1u, -1>, boost::spirit::classic::ref_value_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::push_back_action> > > > >, boost::spirit::classic::kleene_star<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::action<boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::alternative<boost::spirit::classic::chlit<char>, boost::spirit::classic::chlit<char> >, boost::spirit::classic::positive<boost::spirit::classic::alternative<boost::spirit::classic::alternative<boost::spirit::classic::alnum_parser, boost::spirit::classic::chlit<char> >, boost::spirit::classic::chlit<char> > > > >, boost::spirit::classic::ref_value_actor<std::vector<std::string, std::allocator<std::string> >, boost::spirit::classic::push_back_action> >, boost::spirit::classic::action<boost::spirit::classic::rule<boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> >, boost::spirit::classic::nil_t, boost::spirit::classic::nil_t>, boost::spirit::classic::ref_const_ref_actor<std::vector<std::string, std::allocator<std::string> >, std::string, boost::spirit::classic::push_back_action> > >, boost::spirit::classic::contiguous<boost::spirit::classic::sequence<boost::spirit::classic::chlit<char>, boost::spirit::classic::action<boost::spirit::classic::uint_parser<unsigned int, 10, 1u, -1>, boost::spirit::classic::ref_value_actor<std::vector<int, std::allocator<int> >, boost::spirit::classic::push_back_action> > > > > > > > >, void ()(char const, char const*)>, boost::spirit::classic::scanner<char const*, boost::spirit::classic::scanner_policies<boost::spirit::classic::skipper_iteration_policy<boost::spirit::classic::iteration_policy>, boost::spirit::classic::match_policy, boost::spirit::classic::action_policy> > >::type boost::spirit::classic::action<boost::spirit::classic::sequence<boost::spirit::classic::action<boost::spirit::classic::action<

Here is what I don't like about it:

  • the documentation is limited. There is one big web page where "everything" is explained, but the current explanations lack in details.

  • poor AST generation. ASTs are poorly explained and, even after hitting your head against the wall to understand how the AST modifiers work, it's difficult to obtain an easy to manipulate AST (i.e. one that maps well to the problem domain)

  • It increases compilation times enormously, even for "medium"-sized grammars

  • Syntax is too heavyweight. It is a fact of life that in C/C++ you must duplicate code (i.e. between declaration and definition). However, it seems that in boost::spirit, when you declare a grammar<>, you must repeat some things 3 times :D (when you want ASTs, which is what I want :D)

Other than this, I think they did a pretty good job with the parser, given the limitations of C++. But I think they should improve it more. The history page describes that there was a "dynamic" spirit before the current "static" spirit; I'm wondering how much faster and how much better syntax it had.

I would say the biggest problem is the lack of any diagnosis or other help for grammar problems. If your grammar is ambiguous, the parser might not parse what you expect it to, and there's no good way of noticing that.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top