Disclaimer: these regexes probably aren't perfect, but they should work pretty well:
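sub vcl_recv {
    # Strip utm_ parameters that come directly after the "?"
    set req.url = regsuball(req.url, "\?(utm_[^=&]*=[^&=]*&?)+", "?");
    # Strip utm_ parameters that appear later in the query string
    set req.url = regsuball(req.url, "&(utm_[^=&]*=[^&=]*(&|$))+", "\2");
    # Drop a trailing "?" if no parameters are left
    set req.url = regsub(req.url, "\?$", "");
    return (pass);
}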
This should remove any query parameters starting with utm_. I used three regexes to make it clearer and easier to read.
The first regsuball removes any utm_ parameters at the beginning of the query string. It looks for one or more utm_ parameters immediately after the ?. The second regsuball removes any utm_ parameters that aren't at the beginning of the query string.
The third regex cleans up the URL by removing the trailing ? if there are no query parameters left once the utm_ parameters have been removed.
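Both regsuball patterns need to be wrapped in ()+ so that a run of one or more consecutive utm_ parameters is matched as a single unit. Without it, a utm_ parameter directly following another one would be missed, because the previous match has already consumed the ? or & that the next match would need to start with.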
Example results:
Source URL: /?utm_track=1&utm_test2=hey&test=utm_blah&utm_source=google&variation=5&utm_query=abc&utm_test7=yes
Maps to: /?test=utm_blah&variation=5
Source URL: /?variation=5&utm_test1=abc&utm_test2=def&blah=1
Maps to: /?variation=5&blah=1
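If you want to sanity-check the patterns outside Varnish, here's a rough Python sketch that mirrors the three substitutions. The function name strip_utm is just something I made up for illustration, and Python's re module isn't identical to the regex engine Varnish uses, so treat this as an approximation and not a substitute for testing the VCL itself.

```python
import re


def strip_utm(url: str) -> str:
    """Approximate the three VCL substitutions (hypothetical helper)."""
    # Mirrors the first regsuball: utm_ parameters directly after the "?"
    url = re.sub(r"\?(utm_[^=&]*=[^&=]*&?)+", "?", url)
    # Mirrors the second regsuball: utm_ parameters later in the query string.
    # \2 puts back a single "&" when another parameter follows, or nothing
    # when the run of utm_ parameters reaches the end of the URL.
    url = re.sub(r"&(utm_[^=&]*=[^&=]*(&|$))+", r"\2", url)
    # Mirrors regsub (first match only): drop a trailing "?"
    return re.sub(r"\?$", "", url, count=1)


print(strip_utm("/?variation=5&utm_test1=abc&utm_test2=def&blah=1"))
# /?variation=5&blah=1
```

Feeding it the first source URL from the examples should likewise give /?test=utm_blah&variation=5, and a URL whose only parameters are utm_ ones should come back with no query string at all.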