Are std::fill, std::copy specialized for std::vector<bool>?

https://stackoverflow.com/questions/12433154

02-07-2021

Question

While thinking about this question, I started to wonder whether std::copy() and/or std::fill() are specialized (I really mean optimized) for std::vector<bool>.

Is this required by the C++ standard or, perhaps, is it a common approach among C++ standard library vendors?

Simply put, I would like to know whether the following code:

std::vector<bool> v(10, false);
std::fill(v.begin(), v.end(), true);

is in any way better than, or different from, this:

std::vector<bool> v(10, false);
for (auto it = v.begin(); it != v.end(); ++it) *it = true;

To be very strict: can, say, std::fill<std::vector<bool>::iterator>() reach into the internal representation of std::vector<bool> and set entire bytes instead of single bits? I assume making std::fill a friend of std::vector<bool> is not a big problem for a library vendor?

[UPDATE]

A related follow-up question: can I (or anybody else :) specialize such algorithms for, say, std::vector<bool> if they are not already specialized? Is this allowed by the C++ standard? I know this would be non-portable, but what about for just one selected C++ standard library? Assume I (or anybody else) find a way to get at the private parts of std::vector<bool>.

Solution

The standard library ships with your compiler, and its containers and algorithms are templates defined in headers, so you can look into those headers yourself. In GCC's libstdc++, the vector<bool> implementation is in stl_bvector.h. Other standard libraries organize their headers differently, so the file name will vary. And yes, there is a specialized fill (look near __fill_bvector).
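To make concrete what a fill specialized for packed bits can do, here is a minimal sketch. It is not the libstdc++ code: the function name fill_bits, the choice of 64-bit words, and the bit ordering (bit 0 is the least significant bit of word 0) are all assumptions made for illustration. The idea is to mask the partial words at both ends of the range and store whole words in between.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a standard or libstdc++ function): set the bits in
// [first, last) of a packed buffer of 64-bit words to `value`.
void fill_bits(std::vector<std::uint64_t>& words, std::size_t first,
               std::size_t last, bool value)
{
    const std::size_t bits = 64;
    assert(first <= last && last <= words.size() * bits);
    if (first == last) return;

    const std::uint64_t ones = ~std::uint64_t{0};
    const std::size_t first_word = first / bits;
    const std::size_t last_word = (last - 1) / bits;   // word holding the final bit

    // Mask of the bits at or above `first` within the first word.
    const std::uint64_t head = ones << (first % bits);
    // Mask of the bits at or below `last - 1` within the last word.
    const std::uint64_t tail = ones >> (bits - 1 - (last - 1) % bits);

    auto apply = [&](std::size_t w, std::uint64_t mask) {
        if (value) words[w] |= mask;   // set only the masked bits
        else       words[w] &= ~mask;  // clear only the masked bits
    };

    if (first_word == last_word) {     // range lies inside a single word
        apply(first_word, head & tail);
        return;
    }
    apply(first_word, head);           // partial word at the front
    for (std::size_t w = first_word + 1; w < last_word; ++w)
        words[w] = value ? ones : 0;   // whole words in the middle
    apply(last_word, tail);            // partial word at the back
}
```

Only the two end words need a read-modify-write; every word in between is a plain store, which is where the speedup over a bit-by-bit loop would come from.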

OTHER TIPS

Optimizations are nowhere mandated in the standard. Whether an optimization could be applied is considered a "quality of implementation" issue. The asymptotic complexity of most algorithms is, however, restricted.

Optimizations are allowed as long as a correct program behaves according to what the standard mandates. The examples you ask about, i.e., optimizations involving standard algorithms using iterators on std::vector<bool>, can achieve their objective pretty much in any way the implementation sees fit because there is no way to monitor how they are implemented. This said, I doubt very much that there is any standard library implementation optimizing operations on std::vector<bool>. Most people seem to think that this specialization is an abomination in the first place and that it should go away.

A user is only allowed to create specializations of library types if the specialization involves at least one user-defined type. I don't think a user is allowed to provide any function in namespace std at all: there isn't any need, because all such functions would involve a user-defined type and would, thus, be found in the user's namespace. Formulated differently: I think you are out of luck with respect to getting algorithms optimized for std::vector<bool> for the time being. You might consider contributing optimized versions to the open source implementations (e.g., libstdc++ and libc++), however.
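As an illustration of staying out of namespace std, a helper can live in the user's own namespace and delegate to a member function of std::vector<bool>, which the library is free to implement on whole words. This is only a sketch: the namespace bitutil and the function fill_all are made-up names, and whether assign is actually faster than std::fill depends on the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace bitutil {  // hypothetical user namespace, nothing is added to std

// Set every element of v to `value` without changing its size.
// Delegates to the member function assign(), so any speedup depends
// entirely on how the library implements it.
inline void fill_all(std::vector<bool>& v, bool value)
{
    const std::size_t n = v.size();
    v.assign(n, value);
    assert(v.size() == n);  // assign(n, x) leaves exactly n elements
}

}  // namespace bitutil
```

A call then reads bitutil::fill_all(v, true); and involves no specialization or overload of anything in std.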

There is no specialization for it, but you can still use it. (even though it's slow)

But here is a trick I found that makes std::fill fast on std::vector<bool> by going through std::_Vbase, the internal element type that MSVC uses to store the bits.

(WARNING: I've tested it only for MSVC2013, so it may not work on other compilers.)

int num_bits = 100000;
std::vector<bool> bit_set(num_bits , true);

int bitsize_elem = sizeof(std::_Vbase) * 8; // 1byte = 8bits
    
int num_elems = static_cast<int>(std::ceil(num_bits / static_cast<double>(bitsize_elem)));

Because an element is needed in full as soon as any one of its bits is used, the number of elements must be rounded up.

Using this information, we build a vector of pointers to the original elements underlying the bits.
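The same rounding up can be done in integer arithmetic, which avoids the round trip through double and std::ceil. A small sketch, where words_needed is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of storage elements needed to hold num_bits
// bits when each element holds bits_per_elem bits (rounded up).
inline std::size_t words_needed(std::size_t num_bits, std::size_t bits_per_elem)
{
    assert(bits_per_elem > 0);
    return (num_bits + bits_per_elem - 1) / bits_per_elem;
}
```

For example, with 32-bit elements, 100000 bits fit exactly into 3125 elements, while 100001 bits need 3126.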

std::vector<std::_Vbase*> elem_ptrs(num_elems, nullptr);

std::vector<bool>::iterator bitset_iter = bit_set.begin();
for (int i = 0; i < num_elems; ++i)
{
    std::_Vbase* elem_ptr = const_cast<std::_Vbase*>((*bitset_iter)._Myptr);
    elem_ptrs[i] = elem_ptr;
    std::advance(bitset_iter, bitsize_elem);
}

(*bitset_iter)._Myptr : By dereferencing the iterator of std::vector<bool>, you can access the proxy class reference and its member _Myptr.

Since the type of _Myptr is const std::_Vbase*, remove its constness with const_cast.

Now we have std::_Vbase* elem_ptr, a pointer to the original element underlying those bits.

elem_ptrs[i] = elem_ptr : Record this pointer,...

std::advance(bitset_iter, bitsize_elem) : ...and continue our journey to the next element by jumping over the bits held by the previous element.

std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, 0); // fill every bits "false"
std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, -1); // fill every bits "true"

Now we can run std::fill over the underlying elements rather than over individual bits. Note that the calls above only go through elem_ptrs[0], so they assume the elements are stored contiguously.

Some may feel uncomfortable using the proxy class externally, and even more so casting away its constness.

But if you don't care about that and want something fast, this was the fastest way in my tests.

I ran some comparisons, shown below (new project, default configuration, Release, x64).

int it_max = 10; // repeat each variant 10 times
int num_bits = std::numeric_limits<int>::max(); // 2147483647

std::vector<bool> bit_set(num_bits, true);
for (int it_count = 0; it_count < it_max; ++it_count)
{
    std::fill(elem_ptrs[0], elem_ptrs[0] + num_elems, 0);
} // Elapsed time: 0.397 sec

for (int it_count = 0; it_count < it_max; ++it_count)
{
    std::fill(bit_set.begin(), bit_set.end(), false);
} // Elapsed time: 18.734 sec

for (int it_count = 0; it_count < it_max; ++it_count)
{
    for (int i = 0; i < num_bits; ++i)
    {
        bit_set[i] = false;
    }
} // Elapsed time: 21.498 sec

for (int it_count = 0; it_count < it_max; ++it_count)
{
    bit_set.assign(num_bits, false);
} // Elapsed time: 21.779 sec

for (int it_count = 0; it_count < it_max; ++it_count)
{
    // Swap from the temporary's side: standard C++ does not allow passing
    // a temporary to swap(). You can not use elem_ptrs anymore after this.
    std::vector<bool>(num_bits, false).swap(bit_set);
} // Elapsed time: 1.3 sec

There is one caveat. When you swap() the original vector with another one, then the vector of pointers becomes useless!
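The numbers above come from MSVC2013 and rely on its internals. To rerun the portable part of the comparison on another compiler, a harness along the following lines could be used. It is only a sketch: time_ms, FillTimings and compare_fills are made-up names, and a single unrepeated run like this is a rough indication, not a careful benchmark.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: run f once and return the elapsed time in milliseconds.
template <class F>
double time_ms(F f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct FillTimings { double fill_ms; double loop_ms; double assign_ms; };

// Time three portable ways of clearing a vector<bool> of num_bits bits.
FillTimings compare_fills(std::size_t num_bits)
{
    std::vector<bool> bits(num_bits, true);
    FillTimings t{};

    t.fill_ms = time_ms([&] { std::fill(bits.begin(), bits.end(), false); });
    assert(std::count(bits.begin(), bits.end(), true) == 0);

    bits.assign(num_bits, true);  // reset before the next variant
    t.loop_ms = time_ms([&] {
        for (std::size_t i = 0; i < num_bits; ++i) bits[i] = false;
    });
    assert(std::count(bits.begin(), bits.end(), true) == 0);

    bits.assign(num_bits, true);
    t.assign_ms = time_ms([&] { bits.assign(num_bits, false); });
    assert(std::count(bits.begin(), bits.end(), true) == 0);

    return t;
}
```

Calling compare_fills with a large bit count in an optimized build and printing the three fields shows whether a given standard library treats std::fill, an element loop, and assign differently.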

Section 23.2.5, Class vector<bool>, of the C++ International Standard goes only as far as to tell us

To optimize space allocation, a specialization of vector for bool elements is provided:

after which the definition of the specialization is given. That's as far as the standard goes regarding vector<bool>: vendors are expected to use a packed, bitset-like representation to optimize for space. Optimizing for space comes at a cost here, namely that speed is not what gets optimized.

It's easier to fetch a book from a library shelf than it is to find that book when all the library's books are packed tightly together in containers.

Take your example: you're trying to do a std::fill or std::copy from begin to end. But that's not always the case; sometimes the range doesn't simply map to whole bytes. So that's a bit of a problem in terms of speed optimization. It's easy when you have to change every bit to one, since that's just setting the bytes to 0xFF, but that's not the case here; it becomes much harder if you only change certain bits of a byte. Then you need to actually compute what the byte will be; that's not a trivial thing to do*, or at least not as an atomic operation on current hardware.
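For reference, the partial-byte case described above amounts to a read-modify-write through a mask: the old byte has to be read, combined with the mask, and written back, rather than simply overwritten. A minimal sketch, where set_bit_run is a hypothetical helper and bit 0 is taken to be the least significant bit:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: return `byte` with `count` consecutive bits, starting
// at bit position `first`, set to `value`; all other bits are left unchanged.
inline std::uint8_t set_bit_run(std::uint8_t byte, unsigned first,
                                unsigned count, bool value)
{
    assert(first + count <= 8);
    const unsigned mask = ((1u << count) - 1u) << first;  // `count` ones at `first`
    return static_cast<std::uint8_t>(value ? (byte | mask) : (byte & ~mask));
}
```

For instance, set_bit_run(0x00, 2, 3, true) yields 0x1C, and set_bit_run(0xFF, 2, 3, false) yields 0xE3.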

It's the premature optimization story: nice in terms of space, but horrible in terms of performance.

Is having an "is a multiple of 8 bits" check worth the overhead? I doubt it.

* We're talking about multiple bits here; for the case of just one bit you can of course do a bit operation.

Licensed under: CC-BY-SA with attribution
