Question

I have a rewrite rule which redirects to / if no Accept-Language header is present and someone attempts to visit ?lang=en. It works fine, except for the headers returned. Vary: Accept-Language is missing from the response.

RewriteCond %{HTTP:Accept-Language} ^$  
RewriteCond %{QUERY_STRING}         ^lang=en  
RewriteRule ^$                      http://www.example.com/?     [R=301,L]

The Apache documentation specifies:

If a HTTP header is used in a condition this header is added to the Vary header of the response in case the condition evaluates to to true for the request. It is not added if the condition evaluates to false for the request.

The conditions are definitely matching and redirecting, so I don't understand why Apache isn't adding the language vary. One can see why this would be a real problem if a proxy were to cache that ?lang=en and always redirect to / regardless of the Accept-Language header sent.

Was it helpful?

Solution

After peeking into the seedy underbelly of Apache's request handling system, it turns out that the documentation is somewhat misleading...But before I get into the explanation, from what I can tell you're at the mercy of Apache on this one.

The Client Problem

First, the header name will not be added to the Vary response header if it is not sent by the client. This is due to how mod_rewrite constructs the value for that header internally.

It looks up the header by name using apr_table_get(), the request's header table, and the name that you provided:

const char *val = apr_table_get(ctx->r->headers_in, name);

If name is not a key in the table, this function will return NULL. This is a problem, because immediately after this is a check against val:

if (val) {
   // Set the structure member ctx->vary_this
}

ctx->vary_this is used on a per-RewriteCond basis to accumulate header names that should be assembled into the final Vary header*. Since no assignment or appending will occur if there is no value, a referenced (but not sent) header will never appear in Vary. The documentation doesn't explicitly state this, so it may or may not have been what you expected.

*As an aside, the NV (no vary) flag and ignore-on-failure functionality is implemented by setting ctx->vary_this to NULL, preventing its addition to the response header.

However, it's possible that you sent Accept-Language, but it was blank. In this case, the empty string will pass the above check, and the header name will be added to Vary by mod_rewrite from what's described above. Keeping this in mind, I used the following request to diagnose what was going on:

User-Agent: Fiddler
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: 
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Host: 129.168.0.123

This doesn't work either, but why? mod_rewrite definitely sets the headers when the rule and condition match (ctx->vary is an aggregate of ctx->vary_this across all checked conditions):

if (ctx->vary) {
    apr_table_merge(r->headers_out, "Vary", ctx->vary);
}

This can be verified with a log statement, and r->headers_out is the variable used when generating the response headers. Given something is definitely going wrong though, there must be trouble after the rules are executed.

The .htaccess Problem

Currently, you appear to be defining your rules in .htaccess, or a <Directory> section. This means that mod_rewrite is operating in Apache's fixup phase, and the mechanism it uses to actually perform rewrites here is very messy. Let's assume for a second there's no external redirection, since you had problem a even without it (and I'll get to the issue with the redirect later).

After you perform a rewrite, it's far too late in the request processing for the module to actually map to a file. What it does instead is assign itself as the request's "content" handler and when the request reaches that point, it performs a call to ap_internal_redirect(). This leads to the creation of a new request object, one that does not contain the headers_out table from the original.

Assuming that mod_rewrite causes no further redirects, the response is generated from the new request object, which will never have the appropriate (original) headers assigned to it. It is possible to get around this by working in a per-server context (in the main configuration or in a <VirtualHost>), but...

The Redirect Problem

Unfortunately, it turns out that it's largely irrelevant anyway, since even if we do use mod_rewrite in a server context, the path the response takes in the event of a redirect still causes the headers that the module set to be tossed out.

When the request is received by Apache, through a chain of function calls it makes its way to ap_process_request(). This in turn calls ap_process_request_internal(), where the bulk of the important request parsing steps occur (including the invocation of mod_rewrite). It returns an integer status code, which in the case of your redirect happens to be set to 301.

Most requests return OK (which has a value of 0), leading immediately to ap_finalize_request_protocol(). However, that's not the case here:

if (access_status == OK) {
    ap_finalize_request_protocol(r);
}
else {
    r->status = HTTP_OK;
    ap_die(access_status, r);
}

ap_die() does some additional manipulation (like returning the response code back to 301), and in this particular case ends with a call to ap_send_error_response().

Luckily, this is finally root of the problem. Though it might seem like it, things are not "assbackwards", and this causes the destruction of the original headers. There's even a comment about it in the source:

if (!r->assbackwards) {
    apr_table_t *tmp = r->headers_out;

    /* For all HTTP/1.x responses for which we generate the message,
     * we need to avoid inheriting the "normal status" header fields
     * that may have been set by the request handler before the
     * error or redirect, except for Location on external redirects.
     */
    r->headers_out = r->err_headers_out;
    r->err_headers_out = tmp;
    apr_table_clear(r->err_headers_out);

    if (ap_is_HTTP_REDIRECT(status) || (status == HTTP_CREATED)) {
        if ((location != NULL) && *location) {
            apr_table_setn(r->headers_out, "Location", location);
        }
        //...
    }
//...
}

Take note that r->headers_out is replaced, and the original table is cleared. That table had all of the information that was expected to show up in the response, so now it is lost.

Conclusion

If you don't redirect and you define the rules in a per-server context, everything does seem to work correctly. However, this is not what you want. I can see a potential workaround, but I'm not sure if it would be acceptable, not to mention the need to recompile the server.

As for the Vary: Accept-Encoding, I can only assume it comes from a different module that behaves in a way that allows the header to sneak through. I'm also not sure why Gumbo didn't have an issue when trying it.

For reference, I was looking at the 2.2.14 and 2.2 trunk source code, and I was modifying and running Apache 2.2.15. There doesn't appear to be any significant differences between the versions in the related code sections.

OTHER TIPS

You may want to try something like the following as a workaround:

<LocationMatch "^.*lang\=">
    Header onsuccess merge Vary "Accept-Language"
</LocationMatch>

To specifically set the Vary: Accept-Language HTTP response header on the redirect response only (which is what's expected here), you would need to set an environment variable (eg. VARY_ACCEPT_LANGUAGE) as part of the redirect rule and use this to set the header conditionally with the Header directive.

You also need to use the always condition (as opposed to the default onsuccess) with the Header directive in order to set this on the 3xx response (ie. non-200 reponses).

For example:

# Redirect requests that have an empty Accept-Language header and "lang=en" is present
RewriteCond %{HTTP:Accept-Language} ^$  
RewriteCond %{QUERY_STRING} ^lang=en  
RewriteRule ^$ /? [E=VARY_ACCEPT_LANGUAGE:1,R=301,L]

# Set/Merge "Vary" header on Accept-Language redirect
Header always merge Vary "Accept-Language" env=VARY_ACCEPT_LANGUAGE

HOWEVER, the Vary header shouldn't only be set on the redirect response (when the Accept-Language header is empty), it needs to be set on all responses to requests for /?lang=en, regardless of what the Accept-Language HTTP request header is actually set to. So, relying on Apache to set this header using only the redirect would not be sufficient anyway (even if it did set the header on the response as initially expected).

In order to set the appropriate Vary header on all responses to requests for /?lang=en, including the redirect then do it like this:

# Set env var if "/?lang=en" is requested
RewriteCond %{QUERY_STRING} ^lang=en  
RewriteRule ^$ - [E=VARY_ACCEPT_LANGUAGE:1]

# Redirect requests that have an empty Accept-Language header and "lang=en" is present
RewriteCond %{HTTP:Accept-Language} ^$  
RewriteCond %{QUERY_STRING} ^lang=en  
RewriteRule ^$ /? [R=301,L]

# Set/Merge "Vary" header on all responses from "/?lang=en"
Header always merge Vary "Accept-Language" env=VARY_ACCEPT_LANGUAGE

Note, however, that if you have additional internal rewrite directives that cause the rewrite engine to start over then the env var VARY_ACCEPT_LANGUAGE is renamed to REDIRECT_VARY_ACCEPT_LANGUAGE and the above Header directive will not be successful. You'll probably need an additional directive to handle this. For example:

Header always merge Vary "Accept-Language" env=REDIRECT_VARY_ACCEPT_LANGUAGE
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top