Question

I have a function that takes a number and returns up to that many things (say, ints). What's the cleanest interface? Some thoughts:

  1. Return a vector<int>. The vector would be copied several times, which is inefficient.
  2. Return a vector<int>*. My getter now has to allocate the vector itself, as well as the elements. There are all the usual problems of who has to free the vector, the fact that you can't allocate once and use the same storage for many different calls to the getter, etc. This is why STL algorithms typically avoid allocating memory, instead wanting it passed in.
  3. Return a unique_ptr<vector<int>>. It's now clear who deletes it, but we still have the other problems.
  4. Take a vector<int> as a reference parameter. The getter can push_back() and the caller can decide whether to reserve() the space. However, what should the getter do if the passed-in vector is non-empty? Append? Overwrite by clearing it first? Assert that it's empty? It would be nice if the signature of the function allowed only a single interpretation.
  5. Pass a begin and end iterator. Now we need to return the number of items actually written (which might be smaller than desired), and the caller needs to be careful not to access items that were never written to.
  6. Have the getter take an iterator, and the caller can pass an insert_iterator.
  7. Give up and just pass a char *. :)
Was it helpful?

Solution

In C++11, where move semantics is supported for standard containers, you should go with option 1.

It makes the signature of your function clear, communicating that you just want a vector of integers to be returned, and it will be efficient, because no copy will be issued: the move constructor of std::vector will be invoked (or, most likely, Named Return Value Optimization will be applied, resulting in no move and no copy):

std::vector<int> foo()
{
    std::vector<int> v;
    // Fill in v...
    return v;
}

This way you won't have to deal with issues such as ownership, unnecessary dynamic allocations, and other stuff which are just polluting the simplicity of your problem: returning a bunch of integers.

In C++03, you may want to go with option 4 and take an lvalue reference to a non-const vector: standard containers in C++03 are not move-aware, and copying a vector may be expensive. Thus:

void foo(std::vector<int>& v)
{
    // Fill in v...
}

However, even in that case, you should consider whether this penalty is really significant for your use cases. If it is not, you may well opt for a clearer function signature at the expense of some CPU cycles.

Also, C++03 compilers are capable of performing Named Return Value Optimization, so even though in theory a temporary should be copy-constructed from the value you return, in practice no copying is likely to happen.

OTHER TIPS

You wrote it yourself:

... This is why STL algorithms typically avoid allocating memory, instead wanting it passed in

except that STL algorithms don't typically "want memory passed in", they operate on iterators instead. This is specifically to decouple the algorithm from the container, giving rise to:

option 8

decouple the value generation from both the use and storage of those values, by returning an input iterator.

The easiest way is using boost::function_input_iterator, but a sketch mechanism is below (mostly because I was typing faster than thinking).


Input iterator type

(uses C++11, but you can replace the std::function with a function pointer or just hard-code the generation logic):

#include <functional>
#include <iterator>
template <typename T>
class Generator: public std::iterator<std::input_iterator_tag, T> {
    int count_;
    std::function<T()> generate_;
public:
    Generator() : count_(0) {}
    Generator(int count, std::function<T()> func) : count_(count)
                                                  , generate_(func) {}
    Generator(Generator const &other) : count_(other.count_)
                                      , generate_(other.generate_) {}
    // move, assignment etc. etc. omitted for brevity
    T operator*() { return generate_(); }
    Generator<T>& operator++() {
        --count_;
        return *this;
    }
    Generator<T> operator++(int) {
        Generator<T> tmp(*this);
        ++*this;
        return tmp;
    }
    bool operator==(Generator<T> const &other) const {
        return count_ == other.count_;
    }
    bool operator!=(Generator<T> const &other) const {
        return !(*this == other);
    }
};

Example generator function

(again, it's trivial to replace the lambda with an out-of-line function for C++98, but this is less typing)

#include <random>
Generator<int> begin_random_integers(int n) {
    static std::minstd_rand prng;
    static std::uniform_int_distribution<int> pdf;
    Generator<int> rv(n,
                      []() { return pdf(prng); }
                     );
    return rv;
}
Generator<int> end_random_integers() {
    return Generator<int>();
}

Example use

#include <vector>
#include <algorithm>
#include <iostream>
int main()
{
    using namespace std;
    vector<int> out;

    cout << "copy 5 random ints into a vector\n";
    copy(begin_random_integers(5), end_random_integers(),
         back_inserter(out));
    copy(out.begin(), out.end(),
         ostream_iterator<int>(cout, ", "));

    cout << "\n" "print 2 random ints straight from generator\n";
    copy(begin_random_integers(2), end_random_integers(),
         ostream_iterator<int>(cout, ", "));

    cout << "\n" "reuse vector storage for 3 new ints\n";
    out.clear();
    copy(begin_random_integers(3), end_random_integers(),
         back_inserter(out));
    copy(out.begin(), out.end(),
         ostream_iterator<int>(cout, ", "));
}

return vector<int>, it will not be copied, it will be moved.

In C++11 the right answer is to return the std::vector<int> is to return it, ensuring that it will be either explicitly or implicitly moved. (Prefer implicit move, because explicit move can block some optimizations)

Amusingly, if you are concerned about reusing the buffer, the easiest way is to throw in an optional parameter that takes a std::vector<int> by value like this:

std::vector<int> get_stuff( int how_many, std::vector<int> retval = std::vector<int>() ) {
  // blah blah
  return retval;
}

and, if you have a preallocated buffer of the right size, just std::move it into the get_stuff function and it will be used. If you don't have a preallocated buffer of the right size, don't pass a std::vector in.

Live example: http://ideone.com/quqnMQ

I'm uncertain if this will block NRVO/RVO, but there isn't a fundamental reason why it should, and moving a std::vector is cheap enough that you probably won't care if it does block NRVO/RVO anyhow.

However, you might not actually want to return a std::vector<int> - possibly you just want to iterate over the elements in question.

In that case, there is an easy way and a hard way.

The easy way is to expose a for_each_element( Lambda ) method:

#include <iostream>
struct Foo {
  int get_element(int i) const { return i*2+1; }
  template<typename Lambda>
  void for_each_element( int up_to, Lambda&& f ) {
    for (int i = 0; i < up_to; ++i ) {
      f( get_element(i) );
    }
  }
};
int main() {
  Foo foo;
  foo.for_each_element( 7, [&](int e){
    std::cout << e << "\n";
  });
}

and possibly use a std::function if you must hide the implementation of the for_each.

The hard way would be to return a generator or a pair of iterators that generate the elements in question.

Both of these avoid the pointless allocation of the buffer when you only want to deal with the elements one at a time, and if generating the values in question is expensive (it might require traversing memory

In C++98 I would take a vector& and clear() it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top