Providing different implementations of a class depending on lvalue/rvalue when using expression templates

StackOverflow https://stackoverflow.com/questions/17796718

Question

The problem

Suppose we implement a string class which represents, uhm, strings. We then want to add an operator+ which concatenates two strings, and decide to implement that via expression templates to avoid multiple allocations when doing str1 + str2 + ... + strN.

The operator will look like this:

stringbuilder<string, string> operator+(const string &a, const string &b);

stringbuilder is a template class, which in turn overloads operator+ and has an implicit string conversion operator. Pretty much the standard textbook exercise:

template<class T, class U> class stringbuilder;

template<> class stringbuilder<string, string> {
public:
    stringbuilder(const string &a, const string &b) : a(a), b(b) {}
    const string &a;
    const string &b;
    operator string() const;
    // ...
};

// recursive case similar,
// building a stringbuilder<stringbuilder<...>, string>
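The following is a minimal compilable sketch of that textbook design, under a few assumptions: `demo::string` is a hypothetical stand-in for the question's own `string` class, and `size_of`, `append`, `append_to` and `concat_three` are illustrative helper names, not part of the question. Each node stores only references, computes the total length first and then fills one reserved buffer, which is where the single allocation comes from.

```cpp
#include <cstddef>
#include <string>
#include <utility>

namespace demo {

// Hypothetical stand-in for the question's `string` class.
struct string {
    std::string data;
};

// Non-owning expression node: stores references to both operands.
template <class L, class R>
struct stringbuilder {
    const L& a;
    const R& b;

    std::size_t size() const;
    void append_to(std::string& out) const;

    // Single allocation: compute the total length first, then append every piece.
    operator string() const {
        std::string out;
        out.reserve(size());
        append_to(out);
        return string{std::move(out)};
    }
};

// Leaf helpers for plain strings.
inline std::size_t size_of(const string& s) { return s.data.size(); }
inline void append(std::string& out, const string& s) { out += s.data; }

// Recursive helpers for nested builders.
template <class L, class R>
std::size_t size_of(const stringbuilder<L, R>& sb) { return sb.size(); }
template <class L, class R>
void append(std::string& out, const stringbuilder<L, R>& sb) { sb.append_to(out); }

template <class L, class R>
std::size_t stringbuilder<L, R>::size() const { return size_of(a) + size_of(b); }
template <class L, class R>
void stringbuilder<L, R>::append_to(std::string& out) const {
    append(out, a);
    append(out, b);
}

// Base case: string + string.
inline stringbuilder<string, string> operator+(const string& a, const string& b) {
    return {a, b};
}

// Recursive case: builder + string.
template <class L, class R>
stringbuilder<stringbuilder<L, R>, string> operator+(const stringbuilder<L, R>& a,
                                                     const string& b) {
    return {a, b};
}

} // namespace demo

// Converting in the same full-expression is safe: all temporaries are still alive.
inline std::string concat_three() {
    demo::string s1{"foo"}, s2{"bar"}, s3{"baz"};
    demo::string result = s1 + s2 + s3;
    return result.data;
}
```

Because every intermediate builder is a temporary that lives until the end of the full-expression, converting to `demo::string` in the same statement is safe; the problems start only when the builder itself outlives that statement.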

The above implementation works perfectly as long as someone does

string result = str1 + str2 + ... + strN;

However, it has a subtle bug. Assigning the result to a variable of the right type will make that variable hold references to all the strings that compose the expression. That means, for instance, that changing one of the strings will change the result:

void print(string);
string str1 = "foo";
string str2 = "bar";
right_type result = str1 + str2;
str1 = "fie";
print(result); 

This will print fiebar, because of the str1 reference stored inside the expression template. It gets worse:

string f();
right_type result = str1 + f();
print(result); // kaboom

Now the expression template contains a reference to a temporary that was destroyed at the end of the full-expression, so using it is undefined behavior and will most likely crash your program.

Now what's that right_type? It is of course stringbuilder<stringbuilder<...>, string>, i.e. the type the expression template magic is generating for us.

Now why would one use a hidden type like that? In fact, one doesn't use it explicitly, but C++11's auto does!

auto result = str1 + str2 + ... + strN; // guess what's going on here?
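To make the aliasing concrete, here is a reduced two-operand sketch (again with a hypothetical `demo::string` type and an illustrative function name). It checks at compile time that `auto` deduces the builder type and shows that a later write to `str1` leaks into the result. The dangling case with `f()` is deliberately not reproduced, since it is undefined behavior.

```cpp
#include <string>
#include <type_traits>

namespace demo {

// Hypothetical stand-in for the question's `string` class.
struct string {
    std::string data;
};

// Reference-storing expression node, as in the textbook design.
struct stringbuilder {
    const string& a;
    const string& b;
    operator string() const { return string{a.data + b.data}; }
};

inline stringbuilder operator+(const string& a, const string& b) { return {a, b}; }

} // namespace demo

// `auto` deduces the expression-template type, so `result` keeps referring to str1 and str2.
inline std::string auto_keeps_references() {
    demo::string str1{"foo"};
    demo::string str2{"bar"};
    auto result = str1 + str2;
    static_assert(std::is_same<decltype(result), demo::stringbuilder>::value,
                  "auto deduced the builder, not demo::string");
    str1.data = "fie";                // visible through the stored reference
    demo::string printed = result;    // the concatenation only happens here
    return printed.data;              // "fiebar", not "foobar"
}
```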

The question

The bottom line is: it seems that this way of implementing expression templates (by storing cheap references instead of copying values or using shared pointers) gets broken as soon as one tries to store the expression template itself.

Therefore, I'd pretty much like a way of detecting whether I'm building an rvalue or an lvalue, and to provide different implementations of the expression template depending on whether an rvalue is built (keep references) or an lvalue is built (make copies).

Is there an established design pattern to handle this situation?

The only things I was able to figure out during my research were that

  1. One can overload member functions depending on *this being an lvalue or an rvalue, i.e.

    class C {
        void f() &;  // called on lvalues
        void f() &&; // called on temporaries
    };

    however, it seems I can't do the same for constructors.

  2. In C++ one cannot really do "type overloads", i.e. offer multiple implementations of the same type depending on how the type is going to be used (instances created as lvalues or rvalues).
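Item 1 can be illustrated with a small sketch (the helper names are illustrative). The ref-qualifier selects the overload from the value category of the object the function is called on. Constructors cannot be ref-qualified, and even if they could, a constructor has no way of knowing whether the object it creates will later be bound to a name.

```cpp
#include <string>
#include <utility>

// Overloading a member function on the value category of *this (C++11 ref-qualifiers).
struct C {
    std::string f() &  { return "lvalue"; }   // selected when *this is an lvalue
    std::string f() && { return "rvalue"; }   // selected when *this is an rvalue, e.g. a temporary
};

// Illustrative helpers reporting which overload each kind of call selects.
inline std::string call_on_named()     { C c; return c.f(); }
inline std::string call_on_temporary() { return C{}.f(); }
inline std::string call_on_moved()     { C c; return std::move(c).f(); }
```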


Solution

I started this in a comment, but it got a bit too big for that, so let's make it an answer (even though it doesn't really answer your question).

This is a known issue with auto. For instance, it has been discussed by Herb Sutter here and in more detail by Motti Lanzkron here.
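The standard library has a well-known instance of the same problem: `std::vector<bool>::operator[]` returns a proxy object of type `std::vector<bool>::reference`, so `auto` deduces the proxy rather than `bool`. A short sketch (the function name is illustrative):

```cpp
#include <type_traits>
#include <vector>

inline bool proxy_sees_later_write() {
    std::vector<bool> v{true, false};
    auto b = v[0];   // deduces std::vector<bool>::reference, not bool
    static_assert(std::is_same<decltype(b), std::vector<bool>::reference>::value,
                  "auto deduced the proxy type");
    v[0] = false;    // write through the vector
    return b;        // converting the proxy reads the current bit, i.e. false
}
```

Writing `bool b = v[0];` instead would take a snapshot of the bit, which is exactly what spelling out `string result = ...` does in the question.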

As they say, there were discussions in the committee to add operator auto to C++ to tackle this problem. The idea would be instead of (or in addition to) providing

operator string() const;

as you mentioned, one would provide

string operator auto() const;

to be used in type deduction contexts. In this case,

auto result = str1 + str2 + ... + strN;

would not deduce the type of result to be the "right type" but rather the type string because that's what operator auto() returns.

AFAICT this is not going to happen in C++14. C++17 perhaps...

Other suggestions

Elaborating on a comment I made to the OP; example:

This only tackles the problem of assigning to an object or binding to a reference and afterwards converting to a destination type. It is not a comprehensive fix to the problem (also see Yakk's response to my comment), but it prevents the scenario presented in the OP and makes it generally harder to write this kind of error-prone code.

Edit: It might not be possible to extend this approach to class templates (more specifically, to the specialization of std::move, since function templates cannot be partially specialized). Macro'ing could work for this specific problem, but is obviously ugly. Overloading std::move would rely on UB.

#include <utility>
#include <cassert>
#include <cstdlib>

// your stringbuilder class
struct wup
{
    // only use member functions with rvalue-ref-qualifier
    // this way, no lvalues of this class can be used
    operator int() &&
    {
        return 42;
    }
};

// specialize `std::move` to "prevent" converting lvalues to rvalue refs
// (make it much harder and more explicit)
// NB: explicitly specializing a standard library function template is
// undefined behavior since C++20, so this only "works" on older standards
namespace std
{
    template<> wup&& move(wup&) noexcept
    {
        assert(false && "Do not use `auto` with this expression!");
        std::abort(); // never return: falling off the end would be UB when NDEBUG is set
    }
    // alternatively: no function body -> linker error
}

int main()
{
    auto obj = wup{};
    auto& lref = obj;
    auto const& clref = wup{};
    auto&& rref = wup{};

    // would fail to compile because the conversion operator is &&-qualified:
    //   int iObj = obj;
    //   int iLref = lref;
    //   int iClref = clref;
    //   int iRref = rref;
    //   int iClref_mv = std::move(clref);

    // would compile, but hit the assert because of the move specialization:
    //   int iObj_mv = std::move(obj);
    //   int iLref_mv = std::move(lref);
    //   int iRref_mv = std::move(rref);

    // works
    int i = wup{};
}

Just a wild idea (haven't tried it):

template<class T, class U>
class stringbuilder
{
public:
  stringbuilder(stringbuilder const &) = delete;
};

Wouldn't that force a compilation error?
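Whether this helps depends on the language version. In this sketch, `noncopyable_builder`, `concat` and `auto_with_deleted_copy` are hypothetical names. Since C++17, guaranteed copy elision means `auto r = concat(...)` initializes `r` directly from the prvalue and needs no copy or move constructor, so deleting the copy constructor no longer rejects the `auto` case. Under C++11/14 that line would be rejected, although `auto&& r = concat(...)` binds to the result in any version.

```cpp
#include <string>

// Hypothetical reference-storing builder with its copy constructor deleted.
// Declaring the copy constructor also suppresses the implicit move constructor.
struct noncopyable_builder {
    const std::string& a;
    const std::string& b;
    noncopyable_builder(const std::string& x, const std::string& y) : a(x), b(y) {}
    noncopyable_builder(const noncopyable_builder&) = delete;
    operator std::string() const { return a + b; }
};

// Copy-list-initialization constructs the return value in place, so no copy is needed.
inline noncopyable_builder concat(const std::string& x, const std::string& y) {
    return {x, y};
}

inline std::string auto_with_deleted_copy() {
    std::string s1 = "foo";
    std::string s2 = "bar";
    auto r = concat(s1, s2);   // accepted in C++17 (guaranteed elision), rejected in C++11/14
    s1 = "fie";                // r still aliases s1
    return r;                  // converts via operator std::string(): "fiebar"
}
```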

A possible approach would be using the null object pattern. While it might make your string builder bigger, it will still avoid the memory allocations.

template <>
class stringbuilder<std::string,std::string> {
   std::string        lhs_value;
   std::string        rhs_value;
   const std::string& lhs;
   const std::string& rhs;

public:
   stringbuilder(const std::string &lhs, const std::string &rhs) 
      : lhs(lhs), rhs(rhs) {}

   stringbuilder(std::string&& lhs, const std::string &rhs) 
      : lhs_value(std::move(lhs)), lhs(lhs_value), rhs(rhs) {}

   stringbuilder(const std::string& lhs, std::string&& rhs)
      : rhs_value(std::move(rhs)), lhs(lhs), rhs(rhs_value) {}

   stringbuilder(std::string&& lhs, std::string&& rhs)
      : lhs_value(std::move(lhs)), rhs_value(std::move(rhs)),
        lhs(lhs_value), rhs(rhs_value) {}
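   // NB: the implicitly generated copy constructor would make lhs/rhs refer
   // to the *source* object's lhs_value/rhs_value, so copies must be avoided
//...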

If the argument to the constructor is an lvalue, then you store a reference to the real object. If the argument to the constructor is an rvalue, you can move that into an internal variable with almost no cost (move operations are cheap) and store a reference to that internal object. The rest of the code can access the reference knowing (well, at least hoping) that the string will still be alive.

The "hoping" part is there because nothing prevents misuse: if an lvalue is passed but that object is destroyed before the stringbuilder completes its job, the stored reference still dangles.
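Here is a self-contained sketch of this idea. It uses a plain class named `concat_view` instead of the `stringbuilder` specialization, and `make_temp` and `owns_temporary` are hypothetical helpers. One addition that is not in the answer: the copy constructor is deleted, because a member-wise copy would leave `lhs`/`rhs` referring into the source object's `lhs_value`/`rhs_value`.

```cpp
#include <string>
#include <utility>

// Owns rvalue operands, references lvalue operands (hypothetical name).
class concat_view {
    std::string lhs_value;     // storage used only when lhs was an rvalue
    std::string rhs_value;     // storage used only when rhs was an rvalue
    const std::string& lhs;    // refers to the caller's string or to lhs_value
    const std::string& rhs;    // refers to the caller's string or to rhs_value

public:
    concat_view(const std::string& l, const std::string& r) : lhs(l), rhs(r) {}
    concat_view(std::string&& l, const std::string& r)
        : lhs_value(std::move(l)), lhs(lhs_value), rhs(r) {}
    concat_view(const std::string& l, std::string&& r)
        : rhs_value(std::move(r)), lhs(l), rhs(rhs_value) {}
    concat_view(std::string&& l, std::string&& r)
        : lhs_value(std::move(l)), rhs_value(std::move(r)), lhs(lhs_value), rhs(rhs_value) {}

    // A member-wise copy would point into the source object's storage.
    concat_view(const concat_view&) = delete;
    concat_view& operator=(const concat_view&) = delete;

    operator std::string() const {
        std::string out;
        out.reserve(lhs.size() + rhs.size());
        out += lhs;
        out += rhs;
        return out;
    }
};

inline std::string make_temp() { return "bar"; }   // hypothetical: returns a temporary

inline std::string owns_temporary() {
    std::string foo = "foo";
    concat_view v(foo, make_temp());   // the temporary is moved into rhs_value
    foo = "fie";                       // the lvalue operand is still only referenced
    return v;                          // "fiebar"
}
```

The temporary returned by `make_temp()` is gone after its statement, but `v` reads its own moved copy, so nothing dangles; the lvalue `foo` is still referenced, so the later assignment shows up in the result, matching the "hoping" caveat.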

Here is another attempt at solving the issue of dangling references. It doesn't solve the issue of references to things that are modified though.

The idea is to store temporaries by value, but to keep references to lvalues (which we can expect to stay alive after the ;).
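The storage rule can be checked in isolation. This sketch restates the answer's `URefUnlessTemporary_t` trait and records, as compile-time constants with illustrative names, what it yields for the template arguments that the answer's `operator+` produces (`X&&` for rvalue operands, `X&` for lvalue operands).

```cpp
#include <string>
#include <type_traits>

// Restated from the answer: store temporaries by value, everything else by reference.
template <typename T>
using URefUnlessTemporary_t =
    std::conditional_t<std::is_rvalue_reference<T&&>::value, std::decay_t<T>, T&&>;

// An rvalue operand arrives as std::string&& and is stored as a std::string value.
constexpr bool rvalue_stored_by_value =
    std::is_same<URefUnlessTemporary_t<std::string&&>, std::string>::value;
// An lvalue operand arrives as std::string& and stays a reference.
constexpr bool lvalue_stored_by_reference =
    std::is_same<URefUnlessTemporary_t<std::string&>, std::string&>::value;
// Constness is preserved for const lvalues.
constexpr bool const_lvalue_stored_by_const_reference =
    std::is_same<URefUnlessTemporary_t<const std::string&>, const std::string&>::value;
// String literals are lvalues, so "toto" is kept as a reference to the array.
constexpr bool literal_stored_by_reference =
    std::is_same<URefUnlessTemporary_t<const char (&)[5]>, const char (&)[5]>::value;
```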

#include <cstddef>
#include <iostream>
#include <string>
#include <type_traits>
#include <utility>

// Temporary => store a copy
// Otherwise, store a reference
template <typename T>
using URefUnlessTemporary_t
= std::conditional_t<std::is_rvalue_reference<T&&>::value
,                    std::decay_t<T>
,                    T&&>
;

template <typename LHS, typename RHS>
struct StringExpression
{
    StringExpression(StringExpression const&) = delete;
    StringExpression(StringExpression     &&) = default;

    constexpr StringExpression(LHS && lhs_, RHS && rhs_)
        : lhs(std::forward<LHS>(lhs_))
        , rhs(std::forward<RHS>(rhs_))
        { }

    explicit operator std::string() const
    {
        auto const len = size(*this);
        std::string res;
        res.reserve(len);
        append(res, *this);
        return res;
    }

    friend constexpr std::size_t size(StringExpression const& se)
    {
        return size(se.lhs) + size(se.rhs);
    }


    friend void append(std::string & s, StringExpression const& se)
    {
        append(s, se.lhs);
        append(s, se.rhs);
    }

    friend std::ostream & operator<<(std::ostream & os, const StringExpression & se)
    { return os << se.lhs << se.rhs; }

private:
    URefUnlessTemporary_t<LHS> lhs;
    URefUnlessTemporary_t<RHS> rhs;
};

template <typename LHS, typename RHS>
StringExpression<LHS&&,RHS&&> operator+(LHS && lhs, RHS && rhs)
{
    return StringExpression<LHS&&,RHS&&>{std::forward<LHS>(lhs), std::forward<RHS>(rhs) };
}

I've no doubt this could be simplified.

int main ()
{
    constexpr static auto c = exp::concatenator{};
    {
        std::cout << "RVREF\n";
        auto r = c + f() + "toto";
        std::cout << r << "\n";
        std::string s (r);
        std::cout << s << "\n";
    }

    {
        std::cout << "\n\nLVREF\n";
        std::string str="lvref";
        auto r = c + str + "toto";
        std::cout << r << "\n";
        std::string s (r);
        std::cout << s << "\n";
    }

    {
        std::cout << "\n\nCLVREF\n";
        std::string const str="clvref";
        auto r = c + str + "toto";
        std::cout << r << "\n";
        std::string s (r);
        std::cout << s << "\n";
    }
}

NB: I don't provide size(), append() or concatenator; they aren't where the difficulties lie.

PS: I've used C++14 only to simplify the type traits.

Licensed under: CC-BY-SA with attribution