How does the std::basic_string constructor know beforehand how much space to reserve?

StackOverflow https://stackoverflow.com/questions/19778226

  •  03-07-2022
  •  | 
  •  

std::basic_string has the following constructor which initializes the string with the contents of the null-terminated string pointed to by s:

std::basic_string(const CharT* s, const Allocator& alloc = Allocator());

But how does the constructor know beforehand how much space to reserve for the string in its internal buffer?

I could think of two methods:

1) It could first go through the whole null-terminated string until it finds the first NULL character, remember how many characters it traversed, and use that as the capacity for its internal buffer and start copying.

Disadvantage: It has to read the string twice, once for counting the characters, a second time for copying the string.

2) It could reserve a conservative amount in its internal buffer and just start copying. If it hits the NULL character before the buffer runs out, we're OK, otherwise we need to reserve more space (again by a conservative amount), and repeat the steps.

Disadvantage: If the string is fairly large, the overhead of constantly readjusting the capacity might become noticeable.

So, what does a sane std::basic_string implementation do (or is this even specified in the standard)?

有帮助吗?

解决方案 2

The first approach is the answer. Per standard §21.4.2:

basic_string(const charT* s, const Allocator& a = Allocator());

9 Effects: Constructs an object of class basic_string and determines its initial string value from the array of charT of length traits::length(s) whose first element is designated by s...

and

10 Remarks: Uses traits::length().

gcc's implementation is:

  template<typename _CharT, typename _Traits, typename _Alloc>
    basic_string<_CharT, _Traits, _Alloc>::
    basic_string(const _CharT* __s, const _Alloc& __a)
    : _M_dataplus(_S_construct(__s, __s ? __s + traits_type::length(__s) :
                   __s + npos, __a), __a)
    { }

It uses traits_type::length which is something like std::char_traits::length to discover length of c-style zero terminated strings.


If you have huge entry string to pass the function and you have it's length, you can use another overload which gets the size and doesn't calculate it again:

basic_string(const CharT* s, size_type count, ...)

The second approach that you've mentioned has another disadvantage, it has to shrink the allocate memory to stop wasting memory. This operation is expensive also.

其他提示

Common implementations will walk the original string to calculate the length and then allocate that much space. It requires walking the string twice, but that is a fast operation, in some cases with hardware support and even when there is no hardware support for the operation, it is probably cheap compare with a single memory allocation.

I can't think of a sane implementation that would use the second method. Some implementations (i.e. Visual C++) do perform default initialization, which may allocate some minimum length (such as 1 or 16), and then call assign, which will get the length of the string, reallocate if necessary, and then copy the string.

Many - if not all - modern compilers will use hand-tuned assembly language to get the length of a null-terminated string, which is typically extremely fast. Doing an allocate-copy-reallocate-copy-etc... would be madness, really, at least on all platforms that I know of.

许可以下: CC-BY-SA归因
不隶属于 StackOverflow
scroll top