Question

I see the following pattern quite frequently:

 b->last = ngx_cpymem(b->last, "</pre><hr>", sizeof("</pre><hr>") - 1);

Notice that the literal string is used twice. The extract is from the nginx source-base.

The compiler should be able to merge these literals when they are encountered within the same compilation unit.

My questions are:

  1. Do commercial-grade compilers (VC++, GCC, LLVM/Clang) remove this redundancy when it occurs within a compilation unit?
  2. Does the (static) linker remove such redundancies when linking object files?
  3. If 2 applies, would this optimization also occur during dynamic linking?
  4. If 1 and 2 apply, do they apply to all literals?

These questions are important because the answers determine whether a programmer can be verbose without losing efficiency -- i.e., think about enormous static data models being hard-wired into a program (for example, the rules of a Decision Support System used in some low-level scenario).

Edit

2 points / clarifications

  1. The code above was written by a recognised "master" programmer. The guy single-handedly wrote nginx.

  2. I have not asked which of the possible mechanisms of literal hard-coding is better. Therefore don't go off-topic.

Edit 2

My original example was quite contrived and restrictive. The following snippet shows the usage of string literals being embedded into internal hard-coded knowledge. The first snippet is meant for the config parser telling it what enum values to set for which string, and the second to be used more generally as a string in the program. Personally I am happy with this as long as the compiler uses one copy of the string literal, and since the elements are static, they don't enter the global symbol tables.

static ngx_conf_bitmask_t  ngx_http_gzip_proxied_mask[] = {
   { ngx_string("off"), NGX_HTTP_GZIP_PROXIED_OFF },
   { ngx_string("expired"), NGX_HTTP_GZIP_PROXIED_EXPIRED },
   { ngx_string("no-cache"), NGX_HTTP_GZIP_PROXIED_NO_CACHE },
   { ngx_string("no-store"), NGX_HTTP_GZIP_PROXIED_NO_STORE },
   { ngx_string("private"), NGX_HTTP_GZIP_PROXIED_PRIVATE },
   { ngx_string("no_last_modified"), NGX_HTTP_GZIP_PROXIED_NO_LM },
   { ngx_string("no_etag"), NGX_HTTP_GZIP_PROXIED_NO_ETAG },
   { ngx_string("auth"), NGX_HTTP_GZIP_PROXIED_AUTH },
   { ngx_string("any"), NGX_HTTP_GZIP_PROXIED_ANY },
   { ngx_null_string, 0 }
};

followed closely by:

static ngx_str_t  ngx_http_gzip_no_cache = ngx_string("no-cache");
static ngx_str_t  ngx_http_gzip_no_store = ngx_string("no-store");
static ngx_str_t  ngx_http_gzip_private = ngx_string("private");

To those that stayed on topic, bravo !


Solution

Note that for the specific case of sizeof("</pre><hr>"), it is virtually certain that the string literal will never appear in the output file - the entire sizeof expression can be evaluated to the integer constant 11 at compile-time.

Notwithstanding, it is still a very common optimisation for compilers to merge identical string literals.
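The compile-time evaluation of sizeof can be checked directly. The following is a minimal sketch, assuming a C11 compiler (for static_assert); the helper name markup_len is hypothetical and not part of nginx.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* sizeof on a string literal counts the terminating '\0'.
   The check is performed by the compiler; nothing runs at run time. */
static_assert(sizeof("</pre><hr>") == 11, "10 characters plus the NUL");

/* Hypothetical helper: the byte count the nginx line passes to ngx_cpymem.
   The operand of sizeof is not evaluated, so this literal does not need
   to be stored in the object file. */
static size_t markup_len(void)
{
    return sizeof("</pre><hr>") - 1;
}
```

If the static_assert compiles, the size was known to the compiler, so only the literal passed as the copy source has to exist in the binary.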

OTHER TIPS

I can't answer your questions, but I always try to use a const string in such circumstances (even a #define would be better than repeating the literal). The problem comes when you refactor the code and change the value of one literal while forgetting the other. That is not so likely in your example, as the two are right next to each other, but I have seen it happen.

Whatever optimisations the compiler can do, humans can still bugger it up :)
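As a rough sketch of the named-constant approach, the literal can be written once and sized from its own array. The names PRE_HR, copy_bytes and append_pre_hr are hypothetical, and copy_bytes is only a stand-in assumed to behave like nginx's ngx_cpymem (copy n bytes, return the position just past them).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical named constant. Declaring it as an array (not a pointer)
   keeps sizeof usable for the length. */
static const char PRE_HR[] = "</pre><hr>";

/* Stand-in for ngx_cpymem: copy n bytes and return the position just
   past the copied data. */
static char *copy_bytes(char *dst, const void *src, size_t n)
{
    return (char *) memcpy(dst, src, n) + n;
}

/* Appends the markup at 'last' and returns the new end of the data.
   The caller must ensure the buffer has room for sizeof(PRE_HR) - 1 bytes. */
static char *append_pre_hr(char *last)
{
    assert(last != NULL);
    return copy_bytes(last, PRE_HR, sizeof(PRE_HR) - 1);
}
```

With this layout the text and its length cannot drift apart during refactoring, because both come from the same declaration.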

  1. Yes for GCC, should be also true for others
  2. Possibly, with the GNU toolchain (see GCC's -fmerge-constants and -fmerge-all-constants options, which rely on assembler and linker support)
  3. No
  4. Not sure

I would be very unhappy to see that pattern - what if someone changes one literal without changing the other? It should be pulled out; make a pretty little named constant.

If you can't do that for some reason, or just to actually answer the question (anecdotally, at least):

I made a similar program in C and compiled it with GCC 4.4.3, and the constant string appeared only once in the resulting executable.

Edit: Since it might be useful as an easy test, here is the code I tested it with...

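#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int main(void)
{
    /* sizeof includes the terminating NUL, so the copy is a complete string */
    char *n = malloc(sizeof("teststring"));
    if (n == NULL)
        return EXIT_FAILURE;
    memcpy(n, "teststring", sizeof("teststring"));
    printf("%s\n", n);
    free(n);
    return 0;
}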

And here is the command I used to check how many times the string appeared...

strings a.out|grep teststring

But please please consider using less error-prone coding practices where possible.

I wrote a small sample and compiled it:

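#include <stdio.h>
#include <string.h>

void func (void)
{
    char ps1[128];
    char ps2[128];

    strcpy(ps1, "string_is_the_same");
    strcpy(ps2, "string_is_the_same");

    /* print both buffers so that the copies have an observable effect */
    printf("%s %s\n", ps1, ps2);
}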

As a result, the assembler file contains only one instance of the literal "string_is_the_same", even without optimization. However, I am not sure whether the strings would still be merged if they were placed in different source files, and therefore in different object files.
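Besides reading the assembler output, merging can also be probed at run time by comparing the addresses of two identical literals. This is only a sketch with hypothetical function names. The C standard leaves it unspecified whether identical string literals share storage, so either result is valid.

```c
#include <assert.h>
#include <string.h>

/* Two hypothetical functions, each returning its own copy of the literal. */
static const char *first_copy(void)  { return "string_is_the_same"; }
static const char *second_copy(void) { return "string_is_the_same"; }

/* Returns 1 if both literals share one address (merged), 0 otherwise.
   Which value you get depends on the compiler and its options. */
static int literals_are_merged(void)
{
    const char *a = first_copy();
    const char *b = second_copy();

    assert(strcmp(a, b) == 0);   /* the contents always match */
    return a == b;
}
```

To probe the linker rather than the compiler, the two functions could be moved into separate source files (dropping static) and the same comparison made from a third one.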

Licensed under: CC-BY-SA with attribution