如何使用awk，perl或sed从LiveHTTPHeaders输出中删除响应？

https://stackoverflow.com/questions/1812940

06-07-2019

Question

Suppose I have something like this (it is only an example, real requests will differ; I loaded StackOverflow with LiveHTTPHeaders enabled to get a sample):

http://stackoverflow.com/

GET / HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Expires: Sat, 28 Nov 2009 16:04:24 GMT
Vary: Accept-Encoding
Server: Microsoft-IIS/7.0
Date: Sat, 28 Nov 2009 16:04:23 GMT
Content-Length: 19015
----------------------------------------------------------
...

The full log of requests and responses is available on pastebin.

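I want to remove all responses (for example HTTP/1.x 200 OK and everything belonging to that response), as well as every line that only shows the page address. From the LiveHTTPHeaders output saved to a text file, I want to keep only the requests.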

So the output would be:

GET / HTTP/1.1
Host: stackoverflow.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

GET /so/all.css?v=5290 HTTP/1.1
Host: sstatic.net
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070220 Firefox/2.0.0.2
Accept: text/css,*/*;q=0.1
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://stackoverflow.com/

...

Again, the full text I want to keep is available on pastebin.

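If I save a session captured by LiveHTTPHeaders to a text file, how can I get the result shown in the second "code" block of this question? Maybe with awk, sed or perl? Or something else? I'm on Linux.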

Edit: I'm trying to run Sinan's script. The script is:

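#!/usr/bin/perl
local $/ = "\n\n";    # read one blank-line-separated entry at a time
while (<>) {
    # Group the alternation: /^GET|POST/ means "^GET" OR "POST anywhere".
    print if /^(?:GET|POST)/; # Add more request types as needed
}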

I tried running it like this:

./cleanup-headers.pl livehttp.txt > filtered.txt

and like this:

perl cleanup-headers.pl < livehttp.txt > filtered.txt

... the file filtered.txt is created, but it is completely empty.

Has anyone tried it on the FULL headers I pasted to pastebin? Does it work?

Full headers

Solution

It looks like you're running into a whitespace problem.

$ sed -e 's/^\s*$//' livehttp.txt | \
  perl -e '$/ = ""; while (<>) { print if /^(GET|POST)/ }'

This works by putting Perl's readline operator into paragraph mode (via $/ = ""), which grabs one chunk at a time, where chunks are separated by two or more consecutive newlines.

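That's fine when it works, but it is a bit fragile: lines that look blank but are not empty (they contain spaces or tabs) will throw it off. sed can clean those up first.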

An equivalent, more concise command:

$ sed -e 's/^\s*$//' livehttp.txt | perl -000 -ne 'print if /^(GET|POST)/'

Other tips

Perl：

local $/ = "\n\n";
while (<>) {
    print if /^(?:GET|POST)/; # Add more request types as needed
}

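Note: looking at the output generated by LiveHTTPHeaders, entries are separated by two newlines, so I think setting $/ = "\n\n" is more appropriate than setting $/ = ''. I believe your problem is that the lines in your input file are actually indented.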

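I originally downloaded the file from pastebin and used the full file to test my script. I don't believe the file you are testing on your computer is the same as the one you put on pastebin.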

If you want to handle possibly indented lines robustly while staying consistent with the LiveHTTPHeaders output format, use something like this:

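#!/usr/bin/perl

use strict; use warnings;

local $/ = "\n\n";
while (<>) {
    next unless /^\s*(?:GET|POST)/;
    # Strip leading spaces/tabs on every line. [ \t] rather than \s, because
    # under /m, ^\s+ would also match (and delete) the second "\n" of the
    # trailing "\n\n", gluing consecutive requests together in the output.
    s/^[ \t]+//gm;
    print;
}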

I think using both sed and perl in the same pipeline is a bit distasteful.

Just one gawk command:

awk -vRS= '/^(GET|POST)/' ORS="\n\n" file
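The RS= form uses awk's paragraph mode, so, like the Perl version, it depends on how the separator lines look. If you are not sure they are truly empty, one swapped-in alternative is a line-by-line state machine (the same idea as the bash loop below): any empty or whitespace-only line ends a block, and the first line of a block decides whether it is kept. This is a sketch; filter_requests_awk is a hypothetical name, and it additionally strips leading indentation.

```shell
#!/bin/sh
# Hypothetical helper: line-by-line alternative to RS= paragraph mode.
filter_requests_awk() {
  awk '
    BEGIN { start = 1 }                   # the next line opens a new block
    /^[ \t]*$/ {                          # empty or whitespace-only line
      if (keep) print ""                  # one empty line after each kept request
      keep = 0; start = 1; next
    }
    start { keep = ($0 ~ /^[ \t]*(GET|POST) /); start = 0 }
    keep  { sub(/^[ \t]+/, ""); print }   # strip indentation, then print
  ' "$@"
}
```

Usage mirrors the one-liner: filter_requests_awk file > filtered.txt.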

You can also use the bash shell:

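flag=0
while read -r line      # default IFS also trims leading/trailing blanks
do
    case "$line" in
        GET*|POST*) flag=1 ;;
        "") [ "$flag" -eq 1 ] && echo    # keep one empty line after each request
            flag=0 ;;
    esac
    [ "$flag" -eq 1 ] && printf '%s\n' "$line"
done < "file"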

Run Sinan's code as:

perl test.pl < infile.txt > outfile.txt

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow