Question

In C++, what alternatives do I have for exposing a collection, from the point of view of performance and data integrity?

My problem is that I want to return an internal list of data to the caller, but I don't want to generate a copy. Thant leaves me with either returning a reference to the list, or a pointer to the list. However, I'm not crazy about letting the caller change the data, I just want to let it read the data.

  • Do I have to choose between performance and data integrity?
  • If so, is in general better to go one way or is it particular to the case?
  • Are there other alternatives?
Was it helpful?

Solution

RichQ's answer is a reasonable technique, if you're using an array, vector, etc.

If you're using a collection that isn't indexed by ordinal values... or think you might need to at some point in the near future... then you might want to consider exposing your own iterator type(s), and associated begin()/end() methods:

class Blah
{
public:
   typedef std::vector<mydata> mydata_collection;
   typedef myDataCollection::const_iterator mydata_const_iterator;

   // ...

   mydata_const_iterator data_begin() const 
      { return myPreciousData.begin(); }
   mydata_const_iterator data_end() const 
      { return myPreciousData.end(); }

private:
   mydata_collection  myPreciousData;
};

...which you can then use in the normal fashion:

Blah blah;
for (Blah::mydata_const_iterator itr = blah.data_begin();
   itr != blah.data_end();
   ++itr)
{
   // ...
}

OTHER TIPS

Many times the caller wants access just to iterate over the collection. Take a page out of Ruby's book and make the iteration a private aspect of your class.

#include <algorithm>
#include <boost/function.hpp>

class Blah
{
  public:
     void for_each_data(const std::function<void(const mydata&)>& f) const
     {
         std::for_each(myPreciousData.begin(), myPreciousData.end(), f);
     }

  private:
     typedef std::vector<mydata> mydata_collection;
     mydata_collection  myPreciousData;
};

With this approach you're not exposing anything about your internals, i.e. that you even have a collection.

Maybe something like this?

const std::vector<mydata>& getData()
{
  return _myPrivateData;
}

The benefit here is that it's very, very simple, and as safe as you getin C++. You can cast this, like RobQ suggests, but there's nothing you can do that would prevent someone from that if you're not copying. Here, you would have to use const_cast, which is pretty easy to spot if you're looking for it.

Iterators, alternatively, might get you pretty much the same thing, but it's more complicated. The only added benefit of using iterators here (that I can think of) is that you can have better encapsulation.

Use of const reference or shared pointer will only help if the contents of underlying collection do not change over time.

Consider your design. Does the caller really need to see the internal array? Can you restructure the code so that the caller tells object what to do with the array? E.g., if the caller intends to search the array, could the owner object do it?

You could pass a reference to result vector to the function. On some compilers that may result in marginally faster code.

I would recommend trying to redesign first, going with a clean solution second, optimizing for performance third (if necessary).

One advantage of both @Shog9's and @RichQ's solutions is that they de-couple the client from the collection implementation.

If you decide th change your collection type to something else, your clients will still work.

What you want is read-only access without copying the entire blob of data. You have a couple options.

Firstly, you could just return a const refererence to whatever your data container is, like suggested above:

const std::vector<T>& getData() { return mData; }

This has the disadvantage of concreteness: you can't change how you store the data internally without changing the interface of your class.

Secondly, you can return const-ed pointers to the actual data:

const T* getDataAt(size_t index)
{
   return &mData[index];
}

This is a bit nicer, but also requires that you provide a getNumItems call, and protect against out-of-bounds indices. Also, the const-ness of your pointers is easily cast away, and your data is now read-write.

Another option is to provide a pair of iterators, which is a bit more complex. This has the same advantages of pointers, as well as not (necessarily) needing to provide a getNumItems call, and there's considerably more work involved to strip the iterators of their const-ness.

Probably the easiest way to manage this is by using a Boost Range:

typedef vector<T>::const_iterator range_iterator_type;
boost::iterator_range< range_iterator_type >& getDataRange()
{
    return boost::iterator_range(mData.begin(), mData.end());
}

This has the advantages of ranges being composable, filterable, etc, as you can see on the website.

Using const is a reasonable choice. You may also wish to check out the boost C++ library for their shared pointer implementation. It provides the advantages of pointers i.e. you may have the requirement to return a shared pointer to "null" which a reference would not allow.

http://www.boost.org/doc/libs/1_36_0/libs/smart_ptr/smart_ptr.htm

In your case you would make the shared pointer's type const to prohibit writes.

If you have a std::list of plain old data (what .NET would call 'value types'), then returning a const reference to that list will be fine (ignoring evil things like const_cast)

If you have a std::list of pointers (or boost::shared_ptr's) then that will only stop you modifying the collection, not the items in the collection. My C++ is too rusty to be able to tell you the answer to that at this point :-(

I suggest using callbacks along the lines of EnumChildWindows. You will have to find some means to prevent the user from changing your data. Maybe use a const pointer/reference.

On the other hand, you could pass a copy of each element to the callback function overwriting the copy each time. (You do not want to generate a copy of your entire collection. I am only suggesting making a copy one element at a time. That shouldn't take much time/memory).

MyClass tmp;
for(int i = 0; i < n; i++){
    tmp = elements[i];
    callback(tmp);
}

The following two articles elaborate on some of the issues involved in, and the need for, encapsulating container classes. Although they do not provide a complete worked solution, they essentially lead to the same approach as given by Shog9.

Part 1: Encapsulation and Vampires
Part 2 (free registration is now required to read this): Train Wreck Spotting
by Kevlin Henney

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top