Question

I want to use a range-based for loop to iterate over the Unicode code points in a UTF-8 encoded std::string. I have defined my own begin and end in the global namespace, but the begin and end in the std namespace (i.e. those found by ADL) are being preferred. Is there any way to prefer my own functions?

Example:

#include <iostream>
#include <string>

const char* begin(const std::string& s) {
    std::cout << "BEGIN";
    return s.data();
}

const char* end(const std::string& s) {
    std::cout << "END";
    return s.data() + s.length();
}

int main() {
    std::string s = "asdf";

    for (char c : s)
        std::cout << c;
}

I want it to print BEGINENDasdf (or ENDBEGINasdf) but it prints asdf.

Is there no way other than writing a manual for loop that calls my functions by qualified name?


Solution

Wrap std::string in your own type. By making it a template you can customise any existing container and add your own range logic to it. It's not even that different from your first attempt.

#include <string>
#include <iostream>

template <typename S>
struct custom_container {
    S &s_;

    custom_container (S &s) : s_(s) {}

    auto begin() -> decltype(s_.begin()) {
        std::cout << "BEGIN";
        return s_.begin();
    }

    auto end() -> decltype(s_.end()) {
        std::cout << "END";
        return s_.end();
    }
};

template <typename S>
custom_container<S> make_container (S &s) {
     return custom_container <S> (s);
}


int main () {
    std::string t = "asdf";
    auto s = make_container(t);

    for (char c : s) {
        std::cout << c;
    }
}

Outputs

BEGINENDasdf
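The same wrapper idea can be pointed at the original goal of walking code points. The following is only a sketch: `utf8_view`, its nested `iterator`, and `code_points` are hypothetical names, and the decoder assumes the string already holds well-formed UTF-8 (no validation, no handling of truncated sequences).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper: presents a UTF-8 std::string as a range of code points.
// Assumption: the input is well-formed UTF-8; nothing is validated here.
class utf8_view {
    const std::string& s_;
public:
    explicit utf8_view(const std::string& s) : s_(s) {}

    class iterator {
        const unsigned char* p_;
    public:
        explicit iterator(const char* p)
            : p_(reinterpret_cast<const unsigned char*>(p)) {}

        // Number of bytes in the sequence, read from the lead byte.
        static std::size_t length(unsigned char lead) {
            if (lead < 0x80) return 1;          // 0xxxxxxx
            if ((lead >> 5) == 0x06) return 2;  // 110xxxxx
            if ((lead >> 4) == 0x0E) return 3;  // 1110xxxx
            return 4;                           // 11110xxx
        }

        char32_t operator*() const {
            std::size_t n = length(*p_);
            if (n == 1) return *p_;
            // Keep the payload bits of the lead byte, then append
            // six bits from each continuation byte.
            char32_t cp = *p_ & (0xFF >> (n + 1));
            for (std::size_t i = 1; i < n; ++i)
                cp = (cp << 6) | (p_[i] & 0x3F);
            return cp;
        }

        iterator& operator++() { p_ += length(*p_); return *this; }
        bool operator!=(const iterator& o) const { return p_ != o.p_; }
    };

    // Member begin/end, so the range-based for loop picks them up directly.
    iterator begin() const { return iterator(s_.data()); }
    iterator end() const { return iterator(s_.data() + s_.size()); }
};

// Collects the code points of s by looping over the wrapper.
std::vector<char32_t> code_points(const std::string& s) {
    std::vector<char32_t> out;
    for (char32_t cp : utf8_view(s)) out.push_back(cp);
    return out;
}
```

Because `utf8_view` only stores a reference, the wrapped string has to outlive the loop. Production code would also need to reject malformed input before trusting the lead byte.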

Other tips

N3337 6.5.4/1:

(...) begin-expr and end-expr are determined as follows:

— if _RangeT is an array type, begin-expr and end-expr are __range and __range + __bound, respectively, (...);

— if _RangeT is a class type, the unqualified-ids begin and end are looked up in the scope of class _RangeT as if by class member access lookup (3.4.5), and if either (or both) finds at least one declaration, begin-expr and end-expr are __range.begin() and __range.end(), respectively;

— otherwise, begin-expr and end-expr are begin(__range) and end(__range), respectively, where begin and end are looked up with argument-dependent lookup (3.4.2). For the purposes of this name lookup, namespace std is an associated namespace.

In other words, because std::string is a class type with begin and end members, the second bullet applies: the loop calls std::string's member functions and never considers free functions. The correct solution is to provide a wrapper class, as anthony's answer suggests.
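A small sketch of how the second and third bullets play out in practice. The types `with_members` and `free_only` and the helper `which_functions_ran` are hypothetical, introduced only for this illustration; each function appends to a log string instead of printing.

```cpp
#include <string>

// Hypothetical type that has member begin/end AND free begin/end overloads.
struct with_members {
    std::string data;
    std::string log;  // records which functions the loop called
    const char* begin() { log += "member-begin "; return data.data(); }
    const char* end()   { log += "member-end ";   return data.data() + data.size(); }
};
const char* begin(with_members& w) { w.log += "free-begin "; return w.data.data(); }
const char* end(with_members& w)   { w.log += "free-end ";   return w.data.data() + w.data.size(); }

// Hypothetical type that has only free begin/end, found through ADL.
struct free_only {
    std::string data;
    std::string log;
};
const char* begin(free_only& f) { f.log += "free-begin "; return f.data.data(); }
const char* end(free_only& f)   { f.log += "free-end ";   return f.data.data() + f.data.size(); }

// Runs a range-based for loop over r and reports which functions it used.
template <typename Range>
std::string which_functions_ran(Range r) {
    for (char c : r) { (void)c; }
    return r.log;
}
```

For `with_members` the free overloads are never called, which is the same reason the free functions in the question are ignored for std::string.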

Note: If you compile as C++14 (-std=c++1y on older compilers), you can omit the trailing decltype return types and let auto deduce them.
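As a rough illustration of that note, the wrapper might look like this with deduced return types. This is a sketch that needs C++14 or later; the `std::ostream&` member and the `demo` function are assumptions added here so the result can be checked, and are not part of the original answer.

```cpp
#include <ostream>
#include <sstream>
#include <string>

template <typename S>
struct custom_container {
    S& s_;
    std::ostream& out_;  // assumption: stream passed in rather than std::cout

    custom_container(S& s, std::ostream& out) : s_(s), out_(out) {}

    // No trailing decltype: the return type is deduced from s_.begin() / s_.end().
    auto begin() { out_ << "BEGIN"; return s_.begin(); }
    auto end()   { out_ << "END";   return s_.end(); }
};

// The factory's return type is deduced as well.
template <typename S>
auto make_container(S& s, std::ostream& out) {
    return custom_container<S>(s, out);
}

// Hypothetical helper: runs the loop and returns everything that was written.
std::string demo(std::string t) {
    std::ostringstream out;
    for (char c : make_container(t, out)) out << c;
    return out.str();
}
```

Calling `demo("asdf")` yields the same text as the original program, BEGINENDasdf.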

You can also write a typedef to save some typing:

typedef custom_container<std::string> cs;

for (char c : cs(t)) {
    std::cout << c;
}

The cleanest way to do this, at least at the point of use, is to mark up your type for the purpose of special iteration.

First, some machinery:

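#include <iostream>  // std::cout
#include <iterator>  // std::begin, std::end
#include <string>
#include <utility>   // std::forward

template<class Mark, class T>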
struct marked_type {
  T raw;
  marked_type(T&& in):raw(std::forward<T>(in)) {}
};
template<typename Mark, typename T>
marked_type<Mark, T> mark_type( T&& t ) {
  return {std::forward<T>(t)};
}

Next, we invent a mark that says "iterate strangely", and overload begin/end:

struct strange_iteration {};
template<typename T>
auto begin( marked_type<strange_iteration, T> const& container )
  -> decltype( std::begin(std::forward<T>(container.raw)) )
{
  std::cout << "BEGIN";
  using std::begin;
  return begin(std::forward<T>(container.raw));
}
template<typename T>
auto end( marked_type<strange_iteration, T> const& container )
  -> decltype( std::end(std::forward<T>(container.raw)) )
{
  std::cout << "END";
  using std::end;
  return end(std::forward<T>(container.raw));
}

And then, at the point of use:

std::string s = "hello world";
for( char c : mark_type<strange_iteration>(s) ) {
  std::cout << c;
}
std::cout << "\n";

One note: I wrote mark_type to be more generic than this example needs.

mark_type<Foo> stores a reference when it is passed an lvalue, and moves an rvalue into a copy that it holds by value. In a range-based for loop, the lifetime of its return value is extended, because the loop binds that value to a reference.
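The lvalue/rvalue behaviour can be checked with a short sketch. The machinery is repeated from above so the block stands alone; `some_mark`, `lvalue_is_referenced`, and `rvalue_is_owned` are hypothetical names used only here.

```cpp
#include <string>
#include <type_traits>
#include <utility>

template <class Mark, class T>
struct marked_type {
    T raw;
    marked_type(T&& in) : raw(std::forward<T>(in)) {}
};
template <typename Mark, typename T>
marked_type<Mark, T> mark_type(T&& t) {
    return {std::forward<T>(t)};
}

struct some_mark {};  // hypothetical mark, only for this illustration

// An lvalue argument deduces T as a reference, so raw is a reference member.
using from_lvalue = decltype(mark_type<some_mark>(std::declval<std::string&>()));
static_assert(std::is_same<from_lvalue, marked_type<some_mark, std::string&>>::value,
              "lvalues are stored by reference");

// An rvalue argument deduces T as a plain type, so raw is an owned object.
using from_rvalue = decltype(mark_type<some_mark>(std::declval<std::string>()));
static_assert(std::is_same<from_rvalue, marked_type<some_mark, std::string>>::value,
              "rvalues are stored by value");

// True if the wrapper refers to the caller's object rather than to a copy.
bool lvalue_is_referenced() {
    std::string s = "abc";
    auto m = mark_type<some_mark>(s);
    return &m.raw == &s;
}

// The temporary is moved into the wrapper, which then owns the string.
std::string rvalue_is_owned() {
    auto m = mark_type<some_mark>(std::string("temp"));
    return m.raw;
}
```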

You can use this technique to do things like

for( char c : mark_type<reverse_iteration>(s) )

which iterates backwards regardless of the container passed in. The copy made for rvalues is needed for constructs like this:

for( char c: mark_type<reverse_iteration>(mark_type<strange_iteration>(s)) )

where we daisy-chain the marks. Lifetime extension only applies to the outermost return value, and our "create a copy and move" on rvalue is basically manual lifetime extension.
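The answer does not show a definition for reverse_iteration. One possible sketch follows, under the assumption that the wrapped container offers rbegin() and rend() members; `reversed` and `reversed_temporary` are hypothetical helpers. This simple version does not support daisy-chaining, since a marked_type has no rbegin() of its own.

```cpp
#include <string>
#include <utility>

template <class Mark, class T>
struct marked_type {
    T raw;
    marked_type(T&& in) : raw(std::forward<T>(in)) {}
};
template <typename Mark, typename T>
marked_type<Mark, T> mark_type(T&& t) {
    return {std::forward<T>(t)};
}

struct reverse_iteration {};  // hypothetical definition of the mark

// Assumption: the wrapped container has rbegin()/rend() members.
template <typename T>
auto begin(marked_type<reverse_iteration, T> const& c) -> decltype(c.raw.rbegin()) {
    return c.raw.rbegin();
}
template <typename T>
auto end(marked_type<reverse_iteration, T> const& c) -> decltype(c.raw.rend()) {
    return c.raw.rend();
}

// lvalue case: the wrapper holds a reference to s.
std::string reversed(const std::string& s) {
    std::string out;
    for (char c : mark_type<reverse_iteration>(s)) out += c;
    return out;
}

// rvalue case: the wrapper owns a moved-in copy, kept alive for the whole loop.
std::string reversed_temporary() {
    std::string out;
    for (char c : mark_type<reverse_iteration>(std::string("xyz"))) out += c;
    return out;
}
```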

Finally, the std::begin in the trailing return types above is better evaluated in a context that allows ADL. Create a helper namespace like this:

namespace adl_helper {
  using std::begin; using std::end;
  template<typename T>
  auto adl_begin(T&& t)->decltype( begin(std::forward<T>(t)) ); // no implementation
  template<typename T>
  auto adl_end(T&& t)->decltype( end(std::forward<T>(t)) ); // no implementation
  // add adl_cbegin, adl_rbegin etc in C++14
}

Then replace std::begin in the decltypes of the code above with adl_helper::adl_begin (and std::end with adl_helper::adl_end), which emulates how for( a:b ) loops find begin and end a touch better (not perfectly, but better).
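Put together, the replacement might look like the sketch below. Several parts are assumptions made for the illustration: the `lib::word` type (which exposes only free begin/end, so std::begin alone would not work on it), the global `call_log` string standing in for std::cout, the `iterate_word` helper, and the use of `c.raw` directly instead of std::forward.

```cpp
#include <iterator>
#include <string>
#include <utility>

namespace adl_helper {
  using std::begin; using std::end;
  template <typename T>
  auto adl_begin(T&& t) -> decltype(begin(std::forward<T>(t)));  // declaration only
  template <typename T>
  auto adl_end(T&& t) -> decltype(end(std::forward<T>(t)));      // declaration only
}

namespace lib {  // hypothetical library type with free begin/end only
  struct word { std::string text; };
  inline const char* begin(word const& w) { return w.text.data(); }
  inline const char* end(word const& w) { return w.text.data() + w.text.size(); }
}

template <class Mark, class T>
struct marked_type {
  T raw;
  marked_type(T&& in) : raw(std::forward<T>(in)) {}
};
template <typename Mark, typename T>
marked_type<Mark, T> mark_type(T&& t) { return {std::forward<T>(t)}; }

struct strange_iteration {};
std::string call_log;  // assumption: records calls in place of std::cout

template <typename T>
auto begin(marked_type<strange_iteration, T> const& c)
  -> decltype(adl_helper::adl_begin(c.raw))
{
  call_log += "BEGIN";
  using std::begin;     // fall back to std::begin, but still allow ADL
  return begin(c.raw);
}
template <typename T>
auto end(marked_type<strange_iteration, T> const& c)
  -> decltype(adl_helper::adl_end(c.raw))
{
  call_log += "END";
  using std::end;
  return end(c.raw);
}

// Iterates a lib::word through the mark and returns the log plus the characters.
std::string iterate_word(const std::string& text) {
  call_log.clear();
  lib::word w{text};
  std::string seen;
  for (char ch : mark_type<strange_iteration>(w)) seen += ch;
  return call_log + seen;
}
```

With std::begin in the return type, the overloads would be discarded by SFINAE for `lib::word`, because std::begin requires a member begin() or an array.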

C++1y may come with some machinery to remove the need for the above hack.

Sample code running: http://ideone.com/RYvzD0

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow