Question

I'm doing some maintenance work and ran across something like the following:

std::string s;
s.resize( strLength );  
// strLength is a size_t with the length of a C string in it. 

memcpy( &s[0], str, strLength );

I know using &s[0] would be safe if it was a std::vector, but is this a safe use of std::string?

Was it helpful?

Solution

A std::string's allocation is not guaranteed to be contiguous under the C++98/03 standard, but C++11 forces it to be. In practice, neither I nor Herb Sutter know of an implementation that does not use contiguous storage.

Notice that the &s[0] thing is always guaranteed to work by the C++11 standard, even in the 0-length string case. It would not be guaranteed if you did str.begin() or &*str.begin(), but for &s[0] the standard defines operator[] as:

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified

Continuing on, data() is defined as:

Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].

(notice the square brackets at both ends of the range)


Notice: pre-standardization C++0x did not guarantee &s[0] to work with zero-length strings (actually, it was explicitly undefined behavior), and an older revision of this answer explained this; this has been fixed in later standard drafts, so the answer has been updated accordingly.

OTHER TIPS

Technically, no, since std::string is not required to store its contents contiguously in memory.

However, in almost all implementations (every implementation of which I am aware), the contents are stored contiguously and this would "work."

It is safe to use. I think most answers were correct once, but the standard changed. Quoting from C++11 standard, basic_string general requirements [string.require], 21.4.1.5, says:

The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

A bit before that, it says that all iterators are random access iterators. Both bits support the usage of your question. (Additionally, Stroustrup apparently uses it in his newest book ;) )

It's not unlikely that this change was done in C++11. I seem to remember that the same guarantee was added then for vector, which also got the very useful data() pointer with that release.

Hope that helps.

Readers should note that this question was asked in 2009, when the C++03 Standard was the current publication. This answer is based on that version of the Standard, in which std::strings are not guaranteed to utilize contiguous storage. Since this question was not asked in the context of a particular platform (like gcc), I make no assumptions about OP's platform -- in particular, weather or not it utilized contigious storage for the string.

Legal? Maybe, maybe not. Safe? Probably, but maybe not. Good code? Well, let's not go there...

Why not just do:

std::string s = str;

...or:

std::string s(str);

...or:

std::string s;
std::copy( &str[0], &str[strLen], std::back_inserter(s));

...or:

std::string s;
s.assign( str, strLen );

?

The code might work, but more by luck than judgement, it makes assumptions about the implementation that are not guaranteed. I suggest determining the validity of the code is irrelevant while it is a pointless over complication that is easily reduced to just:

std::string s( str ) ;

or if assigning to an existing std::string object, just:

s = str ;

and then let std::string itself determine how to achieve the result. If you are going to resort to this sort of nonsense, then you may as well not be using std::string and stick to since you are reintroducing all the dangers associated with C strings.

This is generally not safe, regardless of whether the internal string sequence is stored in memory continuously or not. There's might be many other implementation details related to how the controlled sequence is stored by std::string object, besides the continuity.

A real practical problem with that might be the following. The controlled sequence of std::string is not required to be stored as a zero-terminated string. However, in practice, many (most?) implementations choose to oversize the internal buffer by 1 and store the sequence as a zero-terminated string anyway because it simplifies the implementation of c_str() method: just return a pointer to the internal buffer and you are done.

The code you quoted in your question does not make any effort to zero-terminate the data is copied into the internal buffer. Quite possibly it simply doesn't know whether zero-termination is necessary for this implementation of std::string. Quite possibly it relies on the internal buffer being filled with zeros after the call to resize, so the extra character allocated for the zero-terminator by the implementation is conveniently pre-set to zero. All this is an implementation detail, meaning that this technique depends on some rather fragile assumptions.

In other words, in some implementations, you'd probably have to use strcpy, not memcpy to force the data into the controlled sequence like that. While in some other implementations you'd have to use memcpy and not strcpy.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top