Question

#include <iostream>
#include <string>

int main() {
    std::string str = "hello " "world" "!";
    std::cout << str;
}

The code above compiles, runs, and prints:

hello world!


It seems as though the string literals are being concatenated, but interestingly this cannot be done with operator+:

#include <iostream>
#include <string>

int main() {
    std::string str = "hello " + "world";
    std::cout << str;
}

This will fail to compile.


Why is this behavior in the language? My theory is that it allows strings to be constructed from multiple #include statements, because #include statements are required to be on their own line. Is this behavior simply a consequence of the grammar of the language, or is it an exception that was added to help solve a problem?

Solution

Adjacent string literals are concatenated. We can see this in the draft C++ standard, section 2.2 Phases of translation, paragraph 6, which says:

Adjacent string literal tokens are concatenated

In your other case, there is no operator+ that takes two const char* operands: each literal decays to a pointer, and two pointers cannot be added.

As to why, this comes from C and we can go to the Rationale for International Standard—Programming Languages—C and it says in section 6.4.5 String literals:

A string can be continued across multiple lines by using the backslash–newline line continuation, but this requires that the continuation of the string start in the first position of the next line. To permit more flexible layout, and to solve some preprocessing problems (see §6.10.3), the C89 Committee introduced string literal concatenation. Two string literals in a row are pasted together, with no null character in the middle, to make one combined string literal. This addition to the C language allows a programmer to extend a string literal beyond the end of a physical line without having to use the backslash–newline mechanism and thereby destroying the indentation scheme of the program. An explicit concatenation operator was not introduced because the concatenation is a lexical construct rather than a run-time operation.

Without this feature, you would have to do this to continue a string literal over multiple lines:

   std::string str = "hello \
world\
!";

which is pretty ugly.

OTHER TIPS

As @erenon said, the compiler merges adjacent string literals into one, which is especially helpful if you want to spread a literal over multiple lines like so:

cout << "This is a very long string-literal, "
        "which for readability in the code "
        "is divided over multiple lines.";

However, when you try to concatenate string literals using operator+, the compiler will complain because there is no operator+ defined for two char const* operands. The operator is defined for the string class (which is totally different from C strings), so it is legal to do this:

string str = string("Hello ") + "world";

The compiler concatenates the string literals automatically into a single one.

When the compiler sees "hello " + "world", it looks for a + operator that takes two const char* operands. Since there is none by default, compilation fails.

The "hello " "world" "!" sequence is resolved by the compiler as a single string literal. This allows you to write one string literal across multiple lines.

In the first example, the consecutive string literals are concatenated in an early translation phase, before compilation has properly started. The compiler sees a single literal, as if you'd written "hello world!".

In the second example, once compilation has begun, the literals become static arrays. You can't apply + to two arrays: each one decays to a pointer, and there is no + for two pointers.

Why is this behavior in the language?

This is a legacy of C, which comes from a time when memory was a precious resource. It allows you to do quite a lot of string manipulation without requiring dynamic memory allocation (as more modern idioms like std::string often do); the price for that is some rather quirky semantics.

Licensed under: CC-BY-SA with attribution