Question

I'm in the process of writing a compiler for a subset of Java, using boost::spirit for lexing and parsing. While compiling the lexer/parser phase, the compiler consumes 1.6GB of RAM (g++ (GCC) 4.8.1). That is not an issue, however, as there's plenty of memory on this machine.

What is an issue, however, is that when the compiler is done and the assembler starts running (GNU assembler (GNU Binutils) 2.23.52.20130604), it crashes with:

as: build/src/ast_generate.o: too many sections (33098)
/tmp/cc0ZyvKK.s: Assembler messages:
/tmp/cc0ZyvKK.s: Fatal error: can't write build/src/ast_generate.o: File too big
as: build/src/ast_generate.o: too many sections (33098)
/tmp/cc0ZyvKK.s: Fatal error: can't close build/src/ast_generate.o: File too big
scons: *** [build/src/ast_generate.o] Error 1

Adding '-Os' to my compiler flags allows the assembler to process the compiler output, but as I see it, it's only a matter of time until I hit the same issue, even with the size optimization flag.

Inspecting the size-optimized object file (ast_generate.o) using objdump tells me that I'm generating pe-x86-64, which is what I'd expect on Windows.

The 2358 generated sections are a shock to me, however, mostly because it seems that a section has been generated for each part of boost::spirit:

 CONTENTS, ALLOC, LOAD, READONLY, DATA, LINK_ONCE_DISCARD
 ...
 60 .pdata$_ZNK5boost5lexer6detail8end_node9unique_idEv 0000000c  0000000000000000  0000000000000000  00030750  2**2
 61 .text$_ZNK5boost5lexer6detail8end_node11lexer_stateEv 00000010  0000000000000000  0000000000000000  0003075c  2**4
 ...

So my questions are:

  1. Is the number in the error (too many sections (X)) the number of sections to be generated, or is it an error code?
  2. Why is a section generated for each data-type?
  3. What can I do to avoid having to pass '-Os' to my compiler? That is, what can I do to fix the issue rather than work around it?
  4. Would splitting the lexer and parser into two distinct phases (and compilation units), connected only through a lexer iterator, help me out?

Note: I'm compiling using cygwin64.


Solution

I've done some hacking here and refactored things a bit to show the non-runtime-polymorphic style.

I hope it doesn't increase compile times :) (I haven't actually gotten around to splitting the grammar up, but it got smaller).


Features:

  • no more heap allocated AST nodes (not even for trees like expression and/or statement); hence no more explicit cloning and/or spurious const members.
  • I have replaced Maybe.hpp with

    #pragma once
    #include <boost/optional.hpp>
    
    template <typename T> using Maybe = boost::optional<T>;
    

    It's quick-and-dirty, but it all compiles

  • I've replaced open type-switching with my own minor effort (I couldn't get it to work; also, with boost-variant it's all built in):

    namespace visitor_galore // this is my make-shift replacement for typeswitch (I couldn't find it/make it work)
    {
        template<typename T, class...Fs> struct visitor_t;
    
        template<typename T, class F1, class...Fs>
        struct visitor_t<T, F1, Fs...> : F1, visitor_t<T, Fs...>::type {
            typedef visitor_t type;
            visitor_t(F1 head, Fs...tail) : F1(head), visitor_t<T, Fs...>::type(tail...) {}
    
            using F1::operator();
            using visitor_t<T, Fs...>::type::operator();
        };
    
        template<typename T, class F> struct visitor_t<T, F> : F, boost::static_visitor<T> {
            typedef visitor_t type;
            visitor_t(F f) : F(f) {}
            using F::operator();
        };
    
        template<typename T=void, class...Fs>
        typename visitor_t<T, Fs...>::type make_visitor(Fs...x) { return {x...}; }
    }
    
    using visitor_galore::make_visitor;
    

    To see how this is used, have a look at e.g. ast_pp.cpp:

    void pretty_print(expression_incdec const& exp)
    {
         boost::apply_visitor(
                make_visitor(
                    [&exp](inc_dec_op_preinc const& op)  { std::cout << "++"; pretty_print(exp.variable); }, 
                    [&exp](inc_dec_op_predec const& op)  { std::cout << "--"; pretty_print(exp.variable); }, 
                    [&exp](inc_dec_op_postinc const& op) { pretty_print(exp.variable); std::cout << "++"; }, 
                    [&exp](inc_dec_op_postdec const& op) { pretty_print(exp.variable); std::cout << "--"; }
                    )
                , exp.operatur);
    }
    

    BONUS: If you don't care much for listing all types in the branches, e.g. because they all default to calling the same free function (or overloads), you can use a polymorphic visitor:

    static const struct pretty_print_visitor_ : boost::static_visitor<>
    {
        template<typename T>
        void operator ()(T const& v) const { pretty_print(v); }
    } pretty_print_visitor {}; // explicit initializer: some compilers reject a const object without one
    

    E.g. now you can replace the 24 branches for expression&:

    boost::apply_visitor(
            make_visitor(
                [](expression_binop const& exp)              { pretty_print(exp); }, 
                [](expression_unop const& exp)               { pretty_print(exp); }, 
                [](expression_integer_constant const& exp)   { pretty_print(exp); }, 
                [](expression_character_constant const& exp) { pretty_print(exp); }, 
                [](expression_string_constant const& exp)    { pretty_print(exp); }, 
                [](expression_boolean_constant const& exp)   { pretty_print(exp); }, 
                [](expression_null const& exp)               { pretty_print(exp); }, 
                [](expression_this const& exp)               { pretty_print(exp); }, 
                [](expression_static_invoke const& exp)      { pretty_print(exp); }, 
                [](expression_non_static_invoke const& exp)  { pretty_print(exp); }, 
                [](expression_simple_invoke const& exp)      { pretty_print(exp); }, 
                [](expression_ambiguous_invoke const& exp)   { pretty_print(exp); }, 
                [](expression_new const& exp)                { pretty_print(exp); }, 
                [](expression_new_array const& exp)          { pretty_print(exp); }, 
                [](expression_lvalue const& exp)             { pretty_print(exp); }, 
                [](expression_assignment const& exp)         { pretty_print(exp); }, 
                [](expression_incdec const& exp)             { pretty_print(exp); }, 
                [](expression_cast const& exp)               { pretty_print(exp); }, 
                [](expression_ambiguous_cast const& exp)     { pretty_print(exp); }, 
                [](expression_instance_of const& exp)        { pretty_print(exp); }, 
                [](expression_parentheses const& exp)        { pretty_print(exp); },
                [](lvalue_non_static_field const& exp)       { pretty_print(exp); },
                [](lvalue_array const& exp)                  { pretty_print(exp); },
                [](lvalue_ambiguous_name const& exp)         { pretty_print(exp); }
           )
            , exp);
    

    by a simple

    boost::apply_visitor(pretty_print_visitor, exp);
    
  • Note a few occasions where I've put // TODO or // FIXME comments (notably with concat, which didn't quite want to compile for me anymore).

  • Note that the Ast classes got noticeably simpler (especially more trivially correct regarding memory allocations).

  • Note that the Parser itself shrunk due to the reduced need for semantic actions and Phoenix adapted functions

  • Note that I opted to forget about LexerPosition information for now (that used to be 'hidden' in the base classes, now gone). There is a compiler tutorial example that shows how to use qi::on_error(qi::success, ...) to very elegantly attach source location information to selected Ast nodes (non-intrusively).

  • Instead of the various predicates in ast_helpers I'd anticipate that there could be a number of helpful traits-based predicates (e.g. is_lvalue or is_true_const). I've elected to "keep" the helpers more or less as-is (which may be totally wrong, I haven't tested anything).

  • I've pervasively tried to replace parameter passing by value to passing by const& (compare e.g. the ast_pp.hpp) but I'm aware I've left some spots behind because the task was big enough as it was.
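
A minimal, self-contained sketch of the lambda-based visiting style from the list above. It is not the code from the refactoring: it swaps boost::variant for C++17 std::variant so that it builds without Boost, and `overloaded` is a hypothetical stand-in for make_visitor. The tag types mirror the names used in ast_pp.cpp, while `variable` is simplified to a std::string and `to_string` is an illustrative helper that returns the text instead of printing it.

```cpp
#include <string>
#include <variant>

// Tag types named after the ones in the answer; empty structs are an assumption.
struct inc_dec_op_preinc {};
struct inc_dec_op_predec {};
struct inc_dec_op_postinc {};
struct inc_dec_op_postdec {};

using inc_dec_op = std::variant<inc_dec_op_preinc, inc_dec_op_predec,
                                inc_dec_op_postinc, inc_dec_op_postdec>;

// Value-type AST node: no heap allocation, no base class, no clone().
struct expression_incdec {
    inc_dec_op  operatur;  // spelled as in the answer; `operator` is a keyword
    std::string variable;  // simplified; the real node presumably holds an lvalue
};

// Hypothetical stand-in for make_visitor: merges lambdas into one overload set.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// Illustrative helper: one branch per alternative, selected at compile time.
std::string to_string(expression_incdec const& exp) {
    return std::visit(overloaded{
        [&](inc_dec_op_preinc const&)  { return "++" + exp.variable; },
        [&](inc_dec_op_predec const&)  { return "--" + exp.variable; },
        [&](inc_dec_op_postinc const&) { return exp.variable + "++"; },
        [&](inc_dec_op_postdec const&) { return exp.variable + "--"; }
    }, exp.operatur);
}
```

With boost::variant the same shape should apply, using boost::apply_visitor and make_visitor in place of std::visit and overloaded.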

GIANT DISCLAIMER: I've probably broken the parser in various ways. I haven't tried to parse anything with it. The edits are provided as-is and without any claim to usefulness. I've solved similar problems in dissimilar ways (once with a traits::transform_attribute<> specialization, once with a largish semantic action using at_c<>, and with some other approaches).

The goal was to show you what I had in mind when I mentioned maybe

  • reducing dynamic polymorphism significantly,
  • avoiding semantic actions
  • embracing boost constructs where possible to get more 'automagic' integration with spirit
  • showing various ideas you can pick your favorite(s) from :/

OTHER TIPS

Try

  • splitting it up in different translation units
  • disabling debug information (frequently, this is what trips up with large file sizes, because debug information gets emitted just like other object data)
  • disabling rtti (last resort)
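
For the first tip, one way to control where template code lands is an explicit instantiation declaration (extern template, C++11) in the header plus a single explicit instantiation definition in one source file. The sketch below assumes a hypothetical tokenizer template rather than the actual Spirit grammar, and shows both pieces in one listing, with comments marking where each would live. Whether this brings the section count below the assembler's limit depends on how much of the instantiated code it covers, so treat it as a starting point.

```cpp
#include <string>
#include <vector>

// ---- would live in a header (hypothetical "tokenizer.hpp") ----
template <typename Iterator>
struct tokenizer {
    // Splits [first, last) on spaces; stands in for an expensive template.
    std::vector<std::string> run(Iterator first, Iterator last) const;
};

template <typename Iterator>
std::vector<std::string> tokenizer<Iterator>::run(Iterator first, Iterator last) const {
    std::vector<std::string> out;
    std::string current;
    for (; first != last; ++first) {
        if (*first == ' ') {
            if (!current.empty()) { out.push_back(current); current.clear(); }
        } else {
            current += *first;
        }
    }
    if (!current.empty()) out.push_back(current);
    return out;
}

// Explicit instantiation declaration: translation units that include the
// header do not implicitly instantiate this specialization's member functions.
extern template struct tokenizer<std::string::const_iterator>;

// ---- would live in exactly one source file (hypothetical "tokenizer.cpp") ----
// Explicit instantiation definition: the code is emitted here only.
template struct tokenizer<std::string::const_iterator>;
```

Other source files then call tokenizer<std::string::const_iterator>::run as usual and resolve it at link time, so each object file carries fewer instantiations of its own.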

Turning optimizations on (-O1 flag) solved the problem for me.

Try adding -Wa,-mbig-obj to your CXX_FLAGS. The -Wa, prefix makes gcc pass -mbig-obj through to the assembler, so this only works with a new enough gcc and assembler.

Licensed under: CC-BY-SA with attribution