When implementing operator[] how should I include bounds checking?

https://stackoverflow.com/questions/1026042

06-07-2019
|

Question

First of all I apologize for the long lead up to such a simplistic question.

I am implementing a class which serves as a very long 1 dimensional index on a space filling curve or the n-tuple representing the Cartesian coordinate that index corresponds to.

class curvePoint
{
public:
    friend class curveCalculate;

    //Construction and Destruction
    curvePoint(): point(NULL), dimensions(0) {}
    virtual ~curvePoint(){if(point!=NULL) delete[] point;}

    //Mutators
    void convertToIndex(){ if(isTuple()) calc(this); }
    void convertToTuple(){ if(isIndex()) calc(this); }
    void setTuple(quint16 *tuple, int size);
    void setIndex(quint16 *index, int size);
    void setAlgorithm(curveType alg){algorithm = alg;}

    //Inspectors
    bool isIndex(){return current==Index;}
    bool isTuple(){return current==Tuple;}
    size_t size(){return dimensions;}
    quint16 operator[](size_t index);

    enum curveType{HilbertCurve, ZCurve, GrayCodeCurve};
    enum status{Index, Tuple};

private:
    curveCalculate calc;
    curveType algorithm;
    quint16 *point;
    size_t dimensions;
    status current;
};

(The length of the array pointed to by point is dimensions)

Anyways in the implementation of operator[] I was wondering what the best method to achieve bounds checking is. I want to avoid throwing exceptions if at all possible, and the full range of values is usable for each number in the array so a special value to return in case of an out of bounds error is not possible either;

I was thinking of something like this though implemented in the class definition:

quint16 curvePoint::operator[](size_t index)
{
    return point[ index % dimensions ];
}

This makes it so that we never leave the bounds of the array and if well documented I think it would be fine; nevertheless, I am leary of this particular implementation.

Does this look acceptable to others? Is there any other way of doing bounds checking while still satisfying my constraints?

Edit: Calculation of things like Hilbert curves etc are highly messy, messy enough that I do not not want the additional interface for the stl libraries in the way.

Additionally because I will have to convert many thousands of these every time the multidimensional database is queried I do not want the additional cost of the stl function calls in the mix, if at all possible.

I rather like the idea of the assert; but, if I remember correctly that breaks in release builds does it not?

I suppose I can use exceptions, that seems to be what everyone is rooting for, but I am using the Qt libraries and those avoid exceptions for both performance and portability and I was hoping to do the same.

Solution

The easiest solution is to do as C++ itself does. This limits the amount of surprises that your users will experience.

C++ itself is fairly consistent. Both the built-in [] on pointers and std::vector::operator[] have undefined behavior if you use an out-of-bound array index. If you want bounds checking, be explicit and use std::vector::at

Hence, if you do the same for your class, you can document the out-of-bound behavior as "standard".

OTHER TIPS

Anyways in the implementation of operator[] I was wondering what the best method to achieve bounds checking is. I want to avoid throwing exceptions if at all possible, and the full range of values is usable for each number in the array so a special value to return in case of an out of bounds error is not possible either;

Then the remaining options are:

Flexible design. What you did. "Fix" the invalid input so that it tries to do something which makes sense. Advantage: Function won't crash. Disadvantage: Clueless callers who access an out of bounds element will get a lie as a result. Imagine a 10-floor building with floors 1 to 10:

You: "Who lives in the 3rd floor?"

Me: "Mary".

You: "Who lives in the 9th floor?"

Me: "Joe".

You: "Who lives in the 1,203rd floor?"

Me: (Wait... 1,203 % 10 = 3...) > "Mary".

You: "Wow, Mary must enjoy great views from up there. So she owns two apartments then?"

A bool output parameter indicates success or failure. This option usually ends up in not very usable code. Many users will ignore the return code. You are still left with what you return in the other return value.
Design by Contract. Assert that the caller is within bounds. (For a practical approach in C++, see An exception or a bug? by Miro Samek or Simple Support for Design by Contract in C++ by Pedro Guerreiro.)
Return a System.Nullable<quint16>. Oops, wait, this is not C#. Well, you could return a pointer to a quint16. This of course has lots of implications which I shall not discuss here and which probably make this option not usable.

My favorite choices are:

For the public interface of a publicly released library: Input will be checked and an exception will be thrown. You ruled out this option, so it is not an option for you. It is still my choice for the interface of a publicly released library.
For internal code: Design by contract.

For me, this solution is unacceptable because you can be hiding a very hard to find bug. Throwing an out of range exception is the way to go, or at least put an assertion in the function.

If what you need is some sort of "circular" array of points, then your solution is ok. However, to me, it looks just like hiding missusage of indexing operator behind some "safe" logic, so I would be against your proposed solution.

If don't want to allow index overflow, then you could check and throw an exception.

quint16 curvePoint::operator[](size_t index)
{
    if( index >= dimensions)
    {
       throw std::overflow_error();
    }
    return point[ index ];
}

If you want to have less overhead, you could avoid exception, by using debug time assertions (assume that the provided index is always valid):

quint16 curvePoint::operator[](size_t index)
{
    assert( index < dimensions);
    return point[ index ];
}

However, I suggest that, instead of using point and dimension members, use a std::vector< quint16> for point storage. It already has an index based access that you could use:

quint16 curvePoint::operator[](size_t index)
{
    // points is declared as std::vector< quint16> points;
    return points[ index ];
}

Having an operator[] that never fails sounds nice, but may hide bugs later on, if a calling function uses an illegal offset, finds a value from the beginning of the buffer, and proceeds as if that is a valid value.

The best method to achieve bounds checking would be to add an assert.

quint16 curvePoint::operator[](size_t index)
{
    assert(index < dimensions);
    return point[index];
}

If your code already depends on Boost libraries, you might want to use BOOST_ASSERT instead.

If I were you I would follow the example set by the stl.

In this case std::vector supplies two methods: at which is bounds checked and operator[] which is not. This allows the client to decide with version to use. I would definitely not use the % size(), as this just hides the bug. However bounds checking will add a lot of overhead for when iterating over a large collection, this is why it should be optional. Although I agree with other posters that the assert is a very good idea as this will only cause a performance hit in debug builds.

You should also consider returning references and supplying const and not const versions. Here are the function declarations for std::vector:

reference at(size_type _Pos);
const_reference at(size_type _Pos) const;

reference operator[](size_type _Pos);
const_reference operator[](size_type _Pos) const;

As a good rule of thumb if I am not sure how to specify an API I look for examples of how others specify similar APIs. Also when ever I use an API I try to judge or rate it, find the bits I like and dislike.

Thanks to the comment on the C# feature in the post of Daniel Daranas I have managed to figure out a possible solution. As I stated in my question I am using the Qt libraries. There for I can use QVariant. QVariant can be set to an invalid state that can be checked by the function receiving it. So the code would become something like:

QVariant curvePoint::operator[](size_t index){
    QVariant temp;
    if(index > dimensions){
        temp = QVariant(QVariant::Invalid);
    }
    else{
        temp = QVariant(point[index]);
    }

    return temp;
}

Of course this has the potential of inserting a bit of gnarly overhead into the function so another possibility is to use a pair template.

std::pair<quint16, bool> curvePoint::operator[](size_t index){
    std::pair<quint16, bool> temp;
    if(index > dimensions){
        temp.second = false;
    }
    else{
        temp.second = true;
        temp.first = point[index];
    }
    return temp;
}

Or I could use a QPair, which has exactly the same functionality and would make it so that the STL need not be linked in.

You could perhaps add an "out of bounds" exception to the [] operator (or at least an assert).

This should catch any problems, especially when debugging.

Unless I'm drastically misunderstanding something,

return point[ index % dimensions ];

is not bounds checking at all. It's returning a real value from a totally different part of the line, which will make it much harder to detect bugs.

I would either:

Throw an exception or an assertion (though you said you don't want to do so)
Simply dereference point past the array in a "natural" way (i.e. just skip any internal checking). The advantage over the % is that they're more likely (though undefined is undefined) to get "weird" values and/or an access violation

In the end, the caller is violating your pre-conditions, and you can do whatever you please. But I think these are the most reasonable options.

Also consider what Cătălin said about incorporating built-in STL collections if that's reasonable.

Your solution would be nice if you were providing access to the points of an elliptic shape. But it will lead to very nasty bugs if you use it for arbitrary geometric functions, because you knowingly provide a false value.

The modulo operator works surprisingly well for array indices -- it also implements negative indices (ie. point[-3] = point[dimensions - 3]). This is easy to work with, so I'd personally recommend the modulo operator as long as it's well-documented.

Another option is to let the caller choose the out-of-bounds policy. Consider:

template <class OutOfBoundsPolicy>
quint16 curvePoint::operator[](size_t index)
{
    index = OutOfBoundsPolicy(index, dimensions);
    return point[index];
}

Then you could define several policies that the caller may choose. For example:

struct NoBoundsCheck {
    size_t operator()(size_t index, size_t /* max */) {
        return index;
    }
};

struct WrapAroundIfOutOfBounds {
    size_t operator()(size_t index, size_t max) {
        return index % max;
    }
};

struct AssertIfOutOfBounds {
    size_t operator()(size_t index, size_t max) {
        assert(index < max);
        return index % max;
    }
};

struct ThrowIfOutOfBounds {
    size_t operator()(size_t index, size_t max) {
        if (index >= max) throw std::domain_error;
        return index;
    }
};

struct ClampIfOutOfBounds {
    size_t operator()(size_t index, size_t max) {
        if (index >= max) index = max - 1;
        return index;
    }
};

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow