How can I remove responses from LiveHTTPHeaders output using awk, perl or sed?
Question
Let's say I have something like this (this is only an example; the actual requests will be different. I loaded StackOverflow with LiveHTTPHeaders enabled to have some samples to work on):
http://stackoverflow.com/

GET / HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Expires: Sat, 28 Nov 2009 16:04:24 GMT
Vary: Accept-Encoding
Server: Microsoft-IIS/7.0
Date: Sat, 28 Nov 2009 16:04:23 GMT
Content-Length: 19015
----------------------------------------------------------
...
The full log of requests and responses is available on pastebin.
I want to remove all responses (for example, HTTP/1.x 200 OK and everything else in that response) and all the one-line entries showing the page address. Only the requests should be left in the text file holding the saved LiveHTTPHeaders output.
So, the output would be:
GET / HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

GET /so/all.css?v=5290 HTTP/1.1
Host: sstatic.net
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/css,*/*;q=0.1
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://stackoverflow.com/
...
Again, the full text of what I want to keep is available on pastebin.
If I save a captured LiveHTTPHeaders session to a text file, how do I get a result like the second code sample in this question? Maybe with awk, sed, or perl? Or something else? I'm on Linux.
Edit: I'm trying to run Sinan's script. The script is:
#!/usr/bin/perl
local $/ = "\n\n";
while (<>) {
    print if /^GET|POST/; # Add more request types as needed
}
I tried running it this way:
./cleanup-headers.pl livehttp.txt > filtered.txt
And this way:
perl cleanup-headers.pl < livehttp.txt > filtered.txt
In both cases the file filtered.txt was created, but it is completely empty.
Has anyone tried it on the FULL headers I pasted into pastebin? Did it work?
Solution
Looks like you're having trailing whitespace issues.
$ sed -e 's/^\s*$//' livehttp.txt | \
perl -e '$/ = ""; while (<>) { print if /^(GET|POST)/ }'
This works by putting Perl's readline operator into paragraph mode (via $/ = ""), which reads the input one record at a time, with records separated by two or more consecutive newlines.
It's nice when it works, but it's a bit brittle: lines that are blank but not empty (that is, containing only whitespace) are not treated as separators and will gum up the works. sed can clean those up.
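To see the brittleness concretely, here is a minimal sketch on a made-up two-block sample (the host name and header values are hypothetical, not taken from the pastebin log). The "blank" line between the request and the response holds two spaces. The sketch uses the POSIX class [[:space:]] in place of the GNU-only \s, and perl -00 as the documented paragraph-mode switch; both are stand-ins for the command above, not a claim about the asker's exact file.

```shell
#!/bin/sh
# Hypothetical sample: the separator line contains two spaces, not nothing.
sample=$(printf 'GET / HTTP/1.1\nHost: example.com\n  \nHTTP/1.x 200 OK\nServer: demo\n')

# Without cleanup, Perl sees a single paragraph starting with GET,
# so the response status line is printed along with the request.
raw_count=$(printf '%s\n' "$sample" \
    | perl -00 -ne 'print if /^(GET|POST)/' \
    | grep -c '^HTTP/')

# After emptying whitespace-only lines, the request and the response
# become separate paragraphs and only the request is kept.
clean_count=$(printf '%s\n' "$sample" \
    | sed -e 's/^[[:space:]]*$//' \
    | perl -00 -ne 'print if /^(GET|POST)/' \
    | grep -c '^HTTP/')

echo "response lines kept without cleanup: $raw_count"
echo "response lines kept with cleanup: $clean_count"
```

If the first count is nonzero on your own file, whitespace-only separator lines are a likely culprit.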
Equivalent and more concise command:
$ sed -e 's/^\s*$//' livehttp.txt | perl -000 -ne 'print if /^(GET|POST)/'
OTHER TIPS
In Perl:
local $/ = "\n\n";
while (<>) {
    print if /^(?:GET|POST)/; # Add more request types as needed
}
Notes: Looking at the output generated by LiveHTTPHeaders, entries are quite clearly separated by two newlines, so I think setting $/ = "\n\n" is more appropriate than setting $/ = ''. I believe your problems were due to the fact that the lines in your input file were actually indented.
I did originally download the file from pastebin and use the full file to test my script. I do not believe the file you were using to test on your computer was identical to the one you put on pastebin.
If you want to robustly deal with possibly indented lines while remaining consistent with the format of the output of LiveHTTPHeaders, you should use something like the following:
#!/usr/bin/perl
use strict; use warnings;
local $/ = "\n\n";
while (<>) {
    next unless /^\s*(?:GET|POST)/;
    s!^\s+!!gm;
    print;
}
I consider using sed and perl in the same pipeline to be a little bit of an abomination.
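A single gawk command is enough (an empty RS puts awk into paragraph mode, so each blank-line-separated block is one record):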
awk -vRS= '/^(GET|POST)/' ORS="\n\n" file
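Paragraph mode in awk depends on truly empty separator lines, just as in Perl. As an alternative sketch, assuming the input may contain indented lines and whitespace-only separators (as suspected earlier in this thread), a line-by-line awk filter avoids that dependency. The function name filter_requests and the sample data are hypothetical.

```shell
#!/bin/sh
# filter_requests: keep only GET/POST request blocks, one line at a time.
# Reads the files given as arguments, or standard input if there are none.
filter_requests() {
    awk '
        # Trim leading and trailing whitespace (including a stray CR).
        { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }
        # A request line opens a block we want to keep.
        /^(GET|POST) / { keep = 1 }
        # An empty line closes the current block; emit one separator
        # only if the block was a kept request.
        $0 == "" { if (keep) print ""; keep = 0; next }
        # Print every line of a kept block.
        keep { print }
    ' "$@"
}

# Demo on a small indented sample with whitespace-only separator lines.
printf '  http://example.com/\n  \n  GET / HTTP/1.1\n  Host: example.com\n  \n  HTTP/1.x 200 OK\n  Server: demo\n  ----------\n' \
    | filter_requests
```

On a saved log it would be run as filter_requests livehttp.txt > filtered.txt. Other request methods can be added to the alternation as needed.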
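You can also use the bash shell:
flag=0   # start outside any request block so the numeric test below never sees an empty value
while read -r line
do
    case "$line" in
        GET*|POST*) flag=1;;   # request line: start printing
        "") flag=0;;           # blank line: end of the current block
    esac
    [ "$flag" -eq 1 ] && echo "$line"
done < "file"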
Run Sinan's code as:
perl test.pl < infile.txt > outfile.txt