SED or AWK to make url querystring readable
Question
I need to split a querystring to several unbounded amount of variables for debugging purposes:
The output comes from tshark and the purpose is to live debug google analytics events. The output from tshark looks like this:
82.387501 hampus -> domain.net 1261 GET /__utm.gif?utmwv=5.3.7&utms=22&utmn=1234&utmhn=domain.com&utmt=event&utme=5(x*y*z%2Fstart%2Fklipp%2F166_SS%20example)(10)&utmcs=UTF-8~ HTTP/1.1
What i want is a more human readable version:
utmhn: domain.com
utmt: event
utme: 5(x*y*z/start/klipp/166_SS/example)(10)
utmcs: UTF-8
or even better:
utmhn: domain.com
utmt: event
utme: 5(
x
y
z/start/klipp/166_SS/example
)(10)
utmcs: UTF-8
But can't get my head around sed (or awk) for this purpose...
Solution
Another way using Perl :
#!/usr/bin/perl -l
use strict; use warnings;
while (<>) {
my @arr;
my ($qs) = m/.*?GET.*?\?(\S+)\s/;
my @pairs = split(/[&~]/, $qs);
foreach my $pair (@pairs){
my ($name, $value) = split(/=/, $pair);
if ($name eq 'utme') {
$value =~ s!(%2F|%20)!/!g;
$value =~ s!\*!\n\t\t!g;
$value =~ s!\(!(\n\t\t!;
$value =~ s/\)\(/\n\t)(/;
}
# let's URI unescape stuff
$value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
if ($name eq 'utmhn') {
print "$name: $value";
}
else {
push @arr, "$name: $value";
}
}
print join "\n", @arr;
print "\n";
}
OUTPUT
utmhn: domain.com
utmwv: 5.3.7
utms: 22
utmn: 1234
utmt: event
utme: 5(
x
y
z/start/klipp/166_SS/example
)(10)
utmcs: UTF-8
USAGE
tshark ... | ./script.pl
ADVANTAGES
- I take care to display
utmhn: domain.com
at the first line - I run an URI unescape on values
- It's not limited to "utmhn", "utmt", "utme", "utmcs" only
OTHER TIPS
file
82.387501 hampus -> domain.net 1261 GET /__utm.gif?utmwv=5.3.7&utms=22&utmn=1234&utmhn=domain.com&utmt=event&utme=5(x*y*z%2Fstart%2Fklipp%2F166_SS%20example)(10)&utmcs=UTF-8~ HTTP/1.1
command
sed 's/.*utmhn=/uthmhn: /
s/&utmt=/\nutmt: /
s/&utme=/\nutme: /
s/utmcs=/\nutmcs: /
s:[%]2F:/:g
s:[%]20: :g
s:[\(]:(\n\t :
s:\*:\n\t :g
s:[\)]:\n\t ):
s/[~].*$//' samp1.txt
output
uthmhn: domain.com
utmt: event
utme: 5(
x
y
z/start/klipp/166_SS example
)(10)&
utmcs: UTF-8
I'm not sure what to say about your %20 VS the expected result of '/' char in your sample data. Did you manually type some of this in?
Here's one way using GNU awk
. Run like:
awk -f script.awk file.txt
Contents of script.awk
:
BEGIN {
FS="[ \t=&~]+"
OFS="\t"
}
{
for (i=1; i<=NF; i++) {
if ($i ~ /^utmhn$|^utmt$|^utme$|^utmcs$/) {
if ($i == "utme") {
sub(/\(/,"(\n\t ", $(i+1))
gsub(/*/,"\n\t ", $(i+1))
sub(/\)/,"\n\t )", $(i+1))
}
print $i":", $(i+1)
}
}
}
Results:
utmhn: domain.net
utmt: event
utme: 5(
x
y
z%2Fstart%2Fklipp%2F166_SS%20example
)(10)
utmcs: UTF-8
Alternatively, here's the one-liner:
awk 'BEGIN { FS="[ \t=&~]+"; OFS="\t" } { for (i=1; i<=NF; i++) { if ($i ~ /^utmhn$|^utmt$|^utme$|^utmcs$/) { if ($i == "utme") { sub(/\(/,"(\n\t ", $(i+1)); gsub(/*/,"\n\t ", $(i+1)); sub(/\)/,"\n\t )", $(i+1)) } print $i":", $(i+1) } } }' file.txt
assuming your data is in a file called "file":
awk -F "&" '{ for ( i=2;i<=NF;i++ ){sub(/=/,":\t",$i);sub(/[~].*$/,"",$i);gsub(/\%2F/,"/",$i);gsub(/\%20/," ",$i);print $i} }' tst
produces output:
utms: 22
utmn: 1234
utmhn: domain.com
utmt: event
utme: 5(x*y*z/start/klipp/166_SS example)(10)
utmcs: UTF-8
it's a bit dirty, but it works.
$ cat tst.awk
BEGIN { FS="[&=~]"; OFS=":\t" }
{
for (i=1;i<=NF;i++) {
map[$i]=$(i+1)
}
sub(/\(/,"&\n\t ", map["utme"])
gsub(/\*/,"\n\t ", map["utme"])
gsub(/%2./,"/", map["utme"])
sub(/\)/,"\n\t&", map["utme"])
print "utmhn", map["utmhn"]
print "utmt", map["utmt"]
print "utme", map["utme"]
print "utmcs", map["utmcs"]
}
$
$ awk -f tst.awk file
utmhn: domain.com
utmt: event
utme: 5(
x
y
z/start/klipp/166_SS/example
)(10)
utmcs: UTF-8
This might work for you (GNU sed):
sed 's/.*\(utmhn.*=\S*\).*/\1/;s/&/\n/g;s/=/:\t/g;s/(/&\n\t/;s/*/\n\t/g;s/%2F/\//g;s/%20/ /g;s/)/\n\t&/' file