Aborting a GET request
Using the :content_cb option, you can provide a callback function to get() that will be executed for each chunk of response content received from the server. You can set* the chunk size (in bytes) using the :read_size_hint option. Both options are documented in LWP::UserAgent (get() in WWW::Mechanize is just an overloaded version of the same method in LWP::UserAgent).
The following request will be aborted after reading 1024 bytes of response content:
use strict;
use warnings;
use WWW::Mechanize;

sub callback {
    my ($data, $response, $protocol) = @_;
    die "Too much data";
}

my $mech = WWW::Mechanize->new;
my $url  = 'http://www.example.com';

$mech->get($url, ':content_cb' => \&callback, ':read_size_hint' => 1024);

print $mech->response()->header('X-Died');
Output:
Too much data at ./mechanize line 12.
Note that the die in the callback does not cause the program itself to die; it simply sets the X-Died header in the response object. You can add the appropriate logic to your callback to determine under what conditions a request should be aborted.
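For instance, a callback along these lines (limited_callback, $total, and $max_bytes are illustrative names, not part of WWW::Mechanize) lets small responses through and aborts only once a cumulative size limit is exceeded:

```perl
use strict;
use warnings;

my $total     = 0;
my $max_bytes = 1024 * 1024;    # 1 MiB cap; pick a limit that suits you

sub limited_callback {
    my ($data, $response, $protocol) = @_;
    $total += length $data;
    die "Exceeded $max_bytes bytes" if $total > $max_bytes;
    # With :content_cb the content is not stored in the response object,
    # so keep the chunk yourself here if you still need it.
}

# Usage: $mech->get($url, ':content_cb' => \&limited_callback);
```

Remember to reset $total before reusing the callback for another request.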
Don't even fetch URL if content is too large
Based on your comments, it sounds like what you really want is to never send a request in the first place if the content is too large. This is quite different from aborting a GET request midway through: you can fetch the Content-Length header with a HEAD request and decide what to do based on its value:
my @urls = qw(http://www.example.com http://www.google.com);
my $mech = WWW::Mechanize->new;

foreach my $url (@urls) {
    $mech->head($url);
    if ($mech->success) {
        my $length = $mech->response()->header('Content-Length') // 0;
        next if $length > 1024;
        $mech->get($url);
    }
}
Note that according to the HTTP spec, applications should set the Content-Length header. This does not mean that they will (hence the default value of 0 in my code example).
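If you would rather not skip servers that omit Content-Length, a hypothetical sketch (make_limit_cb is an illustrative helper, not a WWW::Mechanize method) could combine the two approaches: trust the header when it is present, and fall back to an aborting callback when it is missing.

```perl
use strict;
use warnings;

# Returns a :content_cb closure that dies once more than $limit bytes
# of content have been received.
sub make_limit_cb {
    my ($limit) = @_;
    my $total = 0;
    return sub {
        my ($data, $response, $protocol) = @_;
        $total += length $data;
        die "Too much data\n" if $total > $limit;
    };
}

# Hypothetical usage inside the HEAD/GET loop above:
#
#   my $length = $mech->response()->header('Content-Length');
#   if (defined $length) {
#       $mech->get($url) if $length <= 1024;
#   }
#   else {
#       # No Content-Length: fetch, but bail out mid-transfer.
#       $mech->get($url, ':content_cb' => make_limit_cb(1024));
#   }
```

Using a closure rather than a file-scoped counter means each request gets a fresh byte count, so nothing needs to be reset between URLs.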
* According to the documentation, the protocol module "will try to read data from the server in chunks of this size," but I don't think this is guaranteed.