Question

Can I 'ignore' query string variables before pulling matching objects from the cache, without actually removing them from the URL the end user sees?

For example, the marketing utm_source, utm_campaign and other utm_* values don't change the content of the page. They just vary a lot from campaign to campaign and are used by all of our client-side tracking.

So this also means that the URL can't change on the client side, but it should somehow be 'normalized' in the cache.

Essentially I want all of these...

http://example.com/page/?utm_source=google

http://example.com/page/?utm_source=facebook&utm_content=123

http://example.com/page/?utm_campaign=usa

... to all HIT the cache for http://example.com/page/

However, this URL should cause a MISS for that entry, because variation is not a utm_* param:

http://example.com/page/?utm_source=google&variation=5

It should instead be served from the cache entry for:

http://example.com/page/?variation=5

Also, keep in mind that the URL the user sees must remain the same, so I can't redirect to a URL without the params or use any similar solution.


Solution

This did the trick. It's not perfect according to my own question, though, because it ignores ALL query params, not just the utm_ ones. When I actually need a non-utm value that changes the content, I will have to revisit this regex:

sub vcl_recv {
    # Strip the whole query string so every variant of a path shares one cache object.
    set req.url = regsub(req.url, "\?.*", "");
}
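As an intermediate step before handling mixed query strings, the strip could be limited to URLs whose query string contains nothing but utm_ parameters. This is only a sketch and not part of the original answer: the pattern is an assumption of mine, and a URL such as /page/?utm_source=google&variation=5 is left untouched, so it gets its own cache entry rather than sharing the one for /page/?variation=5.

```vcl
# Fragment for an existing VCL file.
sub vcl_recv {
    # Remove the query string only when every parameter in it is a utm_ one.
    # The trailing $ makes the match fail as soon as any other parameter
    # is present, in which case req.url is left unchanged.
    set req.url = regsub(req.url, "\?(utm_[^=&]*=[^&]*(&|$))+$", "");
}
```

With this pattern, /page/?utm_source=facebook&utm_content=123 becomes /page/, while /page/?variation=5 keeps its query string.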

Other answers

Disclaimer: these regexes probably aren't perfect, but they should work pretty well:

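sub vcl_recv {
  # Drop utm_ parameters at the start of the query string.
  set req.url = regsuball(req.url, "\?(utm_[^=&]*=[^&=]*&?)+", "?");
  # Drop utm_ parameters elsewhere, keeping one "&" if more parameters follow.
  set req.url = regsuball(req.url, "&(utm_[^=&]*=[^&=]*(&|$))+", "\2");
  # Remove a trailing "?" if no parameters are left.
  set req.url = regsub(req.url, "\?$", "");

  # No return (pass) here: pass would bypass the cache entirely.
  # Falling through lets the built-in vcl_recv decide on a cache lookup.
}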

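This should remove any query parameters starting with utm_. I used three regexes to make it clearer and easier to read. Rewriting req.url only affects the request inside Varnish (and what is sent to the backend); the URL in the visitor's browser is unchanged.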

The first regsuball removes any utm_ parameters at the beginning of the query string. It looks for one or more utm_ parameters immediately after the ?. The second regsuball removes any utm_ parameters that aren't at the beginning of the query string. Its replacement, \2, is whatever ended the match: an & if another parameter follows, or nothing at the end of the URL.

The third regex cleans up the URL by removing the ? if there are no query parameters left once the utm_ parameters are gone.

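Both regsuball patterns wrap the parameter in ()+ so that a run of consecutive utm_ parameters is matched as a whole. Without it, each match would consume the trailing &, and the utm_ parameter right after it would no longer be preceded by a ? or & and would be skipped.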

Example results:

Source URL: /?utm_track=1&utm_test2=hey&test=utm_blah&utm_source=google&variation=5&utm_query=abc&utm_test7=yes
Maps to:    /?test=utm_blah&variation=5

Source URL: /?variation=5&utm_test1=abc&utm_test2=def&blah=1
Maps to:    /?variation=5&blah=1
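
If the backend also needs to receive the original utm_ values (an assumption; the question only requires that the URL in the browser stays the same), one option is to leave req.url alone and normalize only the cache key in vcl_hash. The following is a minimal sketch, assuming Varnish 4 or later syntax and reusing the three regexes above. The backend address is a placeholder.

```vcl
vcl 4.0;

# Placeholder backend (assumption): replace with your own host and port.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_hash {
    # Hash a utm_-free copy of the URL. req.url itself is not modified,
    # so the backend still sees the original query string.
    hash_data(
        regsub(
            regsuball(
                regsuball(req.url, "\?(utm_[^=&]*=[^&=]*&?)+", "?"),
                "&(utm_[^=&]*=[^&=]*(&|$))+", "\2"),
            "\?$", ""));

    # Same host handling as the built-in vcl_hash.
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }

    # Returning here skips the built-in vcl_hash, which would otherwise
    # add the unmodified req.url to the hash as well.
    return (lookup);
}
```

The trade-off is that the object stored for a given key is whatever the backend returned for the first request, so this only makes sense if the backend really does ignore utm_ values when rendering the page. On Varnish 3 the return statement in vcl_hash is return (hash) instead.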
License: CC-BY-SA with attribution
Not affiliated with StackOverflow