Question

I was wondering about a possible way to make memory layout of a class to be more effective in templated code. As far as I know, Standard mandates data members of a class to be laid out in memory on order of their declaration. There might be possible padding done by the compiler to align the data members adding unnecessarily to the size of the class. The idea is to re-arrange data members declarations at compile time to minimize such padding. I did some searching, but couldn't find any info (most of the time people discuss packing compiler directives, which is not quite the same as I see it).

First, please consider the following (trivial, but repetitive and ugly) code (same code on ideone.com) (questions are below the code, feel free to skip right down to them):

#include <iostream>
#include <cstdint>

namespace so
{
template <typename Ta, typename Tb, typename Tc, std::size_t =
    ((sizeof(Ta) >= sizeof(Tb)) && (sizeof(Tb) >= sizeof(Tc))) ? 10 :
    ((sizeof(Ta) >= sizeof(Tc)) && (sizeof(Tc) >= sizeof(Tb))) ? 11 :
    ((sizeof(Tb) >= sizeof(Ta)) && (sizeof(Ta) >= sizeof(Tc))) ? 20 :
    ((sizeof(Tb) >= sizeof(Tc)) && (sizeof(Tc) >= sizeof(Ta))) ? 21 :
    ((sizeof(Tc) >= sizeof(Ta)) && (sizeof(Ta) >= sizeof(Tb))) ? 30 :
    ((sizeof(Tc) >= sizeof(Tb)) && (sizeof(Tb) >= sizeof(Ta))) ? 31 : 0>
struct foo {};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 10>
{
  Ta a;
  Tb b;
  Tc c;
  foo(Ta _a, Tb _b, Tc _c) : a{_a}, b{_b}, c{_c} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 11>
{
  Ta a;
  Tc c;
  Tb b;
  foo(Ta _a, Tb _b, Tc _c) : a{_a}, c{_c}, b{_b} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 20>
{
  Tb b;
  Ta a;
  Tc c;
  foo(Ta _a, Tb _b, Tc _c) : b{_b}, a{_a}, c{_c} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 21>
{
  Tb b;
  Tc c;
  Ta a;
  foo(Ta _a, Tb _b, Tc _c) : b{_b}, c{_c}, a{_a} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 30>
{
  Tc c;
  Ta a;
  Tb b;
  foo(Ta _a, Tb _b, Tc _c) : c{_c}, a{_a}, b{_b} {}
};

template <typename Ta, typename Tb, typename Tc>
struct foo<Ta, Tb, Tc, 31>
{
  Tc c;
  Tb b;
  Ta a;
  foo(Ta _a, Tb _b, Tc _c) : c{_c}, b{_b}, a{_a} {}
};

template <typename Ta, typename Tb, typename Tc>
struct bar: public foo<Ta, Tb, Tc>
{
 private:
  using base = foo<Ta, Tb, Tc>;
 public:
  bar() = default;
  using base::base;
};

template <typename Ta, typename Tb, typename Tc>
struct foobar
{
  Ta a;
  Tb b;
  Tc c;
  foobar() = default;
  foobar(Ta _a, Tb _b, Tc _c) : a{_a}, b{_b}, c{_c} {}
};
} //namespace so

int main()
{
 so::bar<std::uint16_t, std::uint32_t, std::uint16_t> bar{1, 2, 3};
 so::foobar<std::uint16_t, std::uint32_t, std::uint16_t> foobar{1, 2, 3};

 std::cout << sizeof(bar) << "\t" << sizeof(foobar) << std::endl;

 std::cout << bar.a << " : " << bar.b << " : " << bar.c << std::endl;
 std::cout << foobar.a << " : " << foobar.b << " : " << foobar.c << std::endl;

 return (0);
}

Program output:

8   12
1 : 2 : 3
1 : 2 : 3

Questions:

  1. Is there some well-known, compiler-independent way of solving such thing (Boost, maybe)?
  2. If no, is there some compiler-specific directives that will do such thing automatically (without data misalignment as with GCC's __atribute__((packed)))?
  3. Can this be done in a more generic way (possibly using variadic templates)?

Thanks in advance!

Was it helpful?

Solution

I believe I have a relatively simple variadic template solution.

The implementation will require a couple helpers though, so I will present it backward so you can get the gist of it first.

template <typename... Args>
class OptimizedLayout {
public:
    template <size_t I>
    auto at() -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

    template <size_t I>
    auto at() const -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

private:
    using Indexed = /**/; // pairs of sorted Args (by decreasing size)
                          // and their original index in the argument list

    using Storage = /*std::tuple<Indexed.first ...>*/;

    template <size_t I>
    using Index = /*index of element of Indexed whose .second is I*/;

    Storage _storage;
}; // class OptimizedLayout

The main benefit here is that changing how to pack elements will only affect how Indexed is defined, so you can easily improve on the algorithm. For now, I will just provide an equivalent of your template.


Disclaimer: the following code is untested, it may not even compile, let alone produce the correct result.

I. Generating indices.

The explanation can be found on the lounge, we can reuse it to generate a pack of pairs (type, index). To sort it we are going to use a MPL algorithm, so it is simpler to produce the pack as a MPL vector.

template <std::size_t... Is>
struct indices {};

template <std::size_t N, std::size_t... Is>
struct build_indices
  : build_indices<N-1, N-1, Is...> {};

template <std::size_t... Is>
struct build_indices<0, Is...> { using type = indices<Is...>; };

template <typename Tuple, typename Indices>
struct IndexedImpl;

template <typename... T, size_t... I>
struct IndexedImpl< std::tuple<T...>, indices<I...> > {
    using type = boost::mpl::vector< std::pair<T, I>... >;
};

template <typename Tuple>
using Indexed =
    IndexedImpl<Tuple, typename build_indices<std::tuple_size<Tuple>::value>::type>;

II. Sorting

In order to sort we are going to use the MPL sort algorithm, which operates on types.

struct GreaterSize {
    template <typename T, typename U>
    struct apply {
         using type = boost::mpl::bool_<sizeof(T) > sizeof(U)>;
    };
};

template <typename T>
struct TupleInserter {
    using state = T;

    template <typename Seq, typename E>
    struct apply;
};

template <typename T>
template <typename... Args, typename E>
struct TupleInserter<T>::apply<std::tuple<Args...>, E> {
    using type = std::tuple<Args..., E>;
};

template <typename Tuple>
using SortedSequence = boost::mpl::sort<
    typename Indexed<Tuple>::type,
    GreaterSize,
    TupleInserter
>;

III. Computing the Storage class

Now, we only need to compute the storage class which is done by extracting the first element of each pair. Interestingly, pattern matching can really assist here.

template <typename T>
struct TupleFirstExtractor;

template <typename... T, size_t... I>
struct TupleFirstExtractor<std::tuple<std::pair<T, I>...>> {
    using type = std::tuple<T...>;
}; 

IV. Computing the Index solver

template <typename Tuple, size_t Needle, size_t Acc>
struct IndexFinderImpl;

template <typename H, size_t h, typename... Tail, size_t Needle, size_t Acc>
struct IndexFinderImpl<std::tuple<std::pair<H,h>, Tail...>, Needle, Acc>:
    IndexFinderImpl<std::tuple<Tail...>, Needle, Acc+1> {};

template <typename H, typename... Tail, size_t Needle, size_t Acc>
struct IndexFinderImpl<std::tuple<std::pair<H, Needle>, Tail...>, Needle, Acc>:
    std::integral_constant<size_t, Acc> {};

V. Putting it all together

And now we wire up everything:

template <typename... Args>
class OptimizedLayout {
public:
    template <size_t I>
    auto at() -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

    template <size_t I>
    auto at() const -> decltype(std::get<Index<I>::value>(_storage)) {
        return std::get<Index<I>::value>(_storage);
    }

private:
    using Indexed = typename SortedSequence<std::tuple<Args...>>::type;

    using Storage = typename TupleFirstExtractor<Indexed>::type;

    template <size_t I>
    using Index = IndexFinderImpl<Indexed, I, 0>;

    Storage _storage;
}; // class OptimizedLayout

Hint: I advise using a specialized namespace to hold all the helpers. Whilst they could be defined within the template, it is easier to define them outside since they do not depend on Args..., however you would need to isolate them to avoid clashes with other parts of your program.

OTHER TIPS

Take a look at this series of blog posts by R. Martinho Fernandes : Link.

It outlines optimal packing of tuples. You can use such a "packed" tuple as the data storage for you class, and provide accessors hiding the get<0>() style tuple element access.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top