Question

I'm pretty new to Perl. I just created a simple script that retrieves a file with getstore($url, $file);, but how do I know whether the task completed correctly, or whether the connection was interrupted in the middle, authentication failed, or whatever the response was? I searched all over the web and found some things, like a list of response codes, and some talk about user agent stuff, which I totally can't understand, especially the $ua-> operator.
What I'd like is an explanation of that operator stuff (I don't even know what -> is used for), what the RC codes mean, and finally, how to use them.
It's a lot of stuff, so I appreciate any answer, even a partial one. And thanks in advance to whoever is willing to help. =)

Solution

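The LWP::Simple module is just that: quite simplistic. Its documentation states that the getstore function returns the HTTP status code, which we can save in a variable. LWP::Simple also exports the is_success and is_error functions, which tell us whether a given code means success (2xx) or an error (4xx or 5xx). Failures on our end that never get a real answer from the server, such as a refused connection or a timeout, are reported as a 500 code, so is_error catches those as well.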

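use strict;
use warnings;
use LWP::Simple;   # exports getstore, is_success, is_error

my $url      = "http://www.example.com/";
my $filename = "some-file.html";
my $rc = getstore($url, $filename);   # returns the HTTP status code
if (is_error($rc)) {
  die "getstore of <$url> failed with $rc";
}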

Of course, this doesn't catch errors with the file system.
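To see how those codes break down, here is a small sketch using HTTP::Status, the module LWP::Simple takes is_success and is_error from. The verdict helper and the sample codes are made up for illustration; status_message turns a code into its standard name.

```perl
use strict;
use warnings;
use HTTP::Status qw(is_success is_error status_message);

# verdict is a made-up helper that applies the two checks LWP::Simple exports.
sub verdict {
    my ($rc) = @_;
    return 'success' if is_success($rc);   # 2xx
    return 'error'   if is_error($rc);     # 4xx and 5xx
    return 'neither';                      # e.g. 1xx or 3xx
}

# 401: authentication failed; 404: no such file; 500: server trouble, or an
# error LWP generated itself (refused connection, timeout, ...).
for my $rc (200, 301, 401, 404, 500) {
    printf "%d %-21s => %s\n", $rc, status_message($rc), verdict($rc);
}
```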

The die call throws a fatal exception: unless something catches it (for example an eval block), it terminates your script and prints its message to STDERR. If you only want to report the problem and keep going, use warn instead.
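A minimal core-Perl sketch of the difference: warn reports and continues, while die stops everything unless an eval block catches it. check_status is a hypothetical stand-in for the is_error check in the getstore snippet; it dies unless the code is 2xx.

```perl
use strict;
use warnings;

# check_status is a hypothetical helper: it dies unless the code is 2xx.
sub check_status {
    my ($url, $rc) = @_;
    die "fetch of <$url> failed with $rc\n" unless $rc >= 200 && $rc < 300;
    return 1;
}

# warn prints its message to STDERR and the script keeps running.
warn "just a warning, carrying on\n";

# eval { ... } catches a die; the error message is left in $@.
my $ok = eval { check_status("http://www.example.com/", 404) };
print "caught: $@" unless $ok;   # caught: fetch of <http://www.example.com/> failed with 404
print "still running\n";
```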

The LWP::Simple functions are convenient shortcuts for common tasks. If you need more control over the requests, you have to create an LWP::UserAgent object yourself. A user agent (abbreviated ua) is a browser-like object that makes requests to servers. It gives us very detailed control over these requests; we can even set individual header fields.

The -> operator is a general dereference operator, which you'll use a lot when you work with complex data structures built from references to arrays and hashes. It is also used for method calls in object-oriented programming:

$object->method(@args);

calls method on $object, passing @args as arguments. We can also call methods on class names. To create a new object, the new method is usually called on the class name:

my $object = The::Class->new();

Methods are just like functions, except that you leave it to the class of the object to decide which function is actually called. The object (or class name) to the left of -> is passed to that function as its first argument.
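Both uses of -> fit in one small, self-contained sketch. The $config hash and the Animal and Dog classes are invented for this example; the point is that Dog->new(...) and $pet->speak are ordinary function calls in which Perl passes the class or object as the first argument and picks the function from the class (or its parent).

```perl
use strict;
use warnings;

# 1. -> as a dereference operator: reach into data through a reference.
my $config = { timeout => 30, hosts => ['a.example', 'b.example'] };
print $config->{timeout}, "\n";      # hash element via a hash reference: 30
print $config->{hosts}->[1], "\n";   # array element via nested references: b.example
print $config->{hosts}[1], "\n";     # -> may be omitted between subscripts: b.example

# 2. -> as the method-call operator. Animal and Dog are made-up classes.
package Animal;
sub new {
    my ($class, $name) = @_;         # the class name arrives as the first argument
    return bless { name => $name }, $class;
}
sub speak {
    my ($self) = @_;                 # the object arrives as the first argument
    return "$self->{name} makes a sound";
}

package Dog;
use parent -norequire, 'Animal';     # Dog inherits new() from Animal
sub speak {
    my ($self) = @_;
    return "$self->{name} barks";
}

package main;
my $pet = Dog->new('Rex');           # runs Animal::new('Dog', 'Rex')
print $pet->speak, "\n";             # Dog's own speak wins: Rex barks
```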

The normal workflow with LWP::UserAgent looks like this:

use LWP::UserAgent; # load the class

my $ua = LWP::UserAgent->new();

We can also pass named arguments to the new method. Because these UA objects are robots, it is considered good manners to tell server operators who is responsible for the bot. We can do so with the from option, which sends a contact e-mail address in the From request header:

my $ua = LWP::UserAgent->new(
  from => 'ss-tangerine@example.com',
);

We could also change the timeout from its default of three minutes. Options like these can also be set after we have constructed $ua, by calling the method of the same name:

$ua->timeout(30);  # half a minute
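How named arguments and setters like timeout typically work under the hood can be sketched with a tiny class of our own. My::Agent is hypothetical and is not LWP's actual implementation; it just shows the common pattern of a constructor that turns key/value pairs into a hash, and a method that either reads or changes a stored value.

```perl
use strict;
use warnings;

# My::Agent is a made-up class mimicking the pattern; it is not LWP's code.
package My::Agent;

sub new {
    my ($class, %args) = @_;        # named arguments arrive as key/value pairs
    my $self = bless {
        timeout => 180,             # default, like LWP's three minutes
        %args,                      # caller's options override the defaults
    }, $class;
    return $self;
}

# Combined getter/setter: $agent->timeout reads, $agent->timeout(30) writes.
sub timeout {
    my $self = shift;
    $self->{timeout} = shift if @_;
    return $self->{timeout};
}

sub from {
    my $self = shift;
    $self->{from} = shift if @_;
    return $self->{from};
}

package main;

my $agent = My::Agent->new(from => 'ss-tangerine@example.com');
print $agent->timeout, "\n";   # 180
$agent->timeout(30);
print $agent->timeout, "\n";   # 30
```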

The $ua has methods for all the HTTP requests like get and post. To duplicate the behaviour of getstore, we first have to get the URL we are interested in:

my $url = "http://www.example.com/";

my $response = $ua->get($url);

The $response is an object too (an HTTP::Response), and we can ask it whether it is_success:

$response->is_success or die $response->status_line;

So if execution flows past this statement, everything went fine. We can now access the content of the response. NB: use the decoded_content method, as it undoes any Content-Encoding (such as gzip compression) and decodes text from its declared charset into Perl characters:

my $content = $response->decoded_content;

We can now print that to a file:

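use autodie; # open and close now die on failure
open my $fh, ">:encoding(UTF-8)", "some-file.html";  # decoded_content returns characters, so encode them on output
print {$fh} $content;
close $fh;   # write errors can surface here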

(For binary files such as images, use the ">:raw" open mode, or call binmode $fh after opening the file, so that no text processing, such as Windows line-ending translation, corrupts the bytes.)
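To see what decoded_content does compared with the raw content method, the following sketch builds a gzip-compressed UTF-8 response by hand instead of fetching one, so it runs offline. It assumes HTTP::Response (which is installed along with LWP) and the core Encode and IO::Compress::Gzip modules.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Build, by hand, the kind of response a server might send:
# UTF-8 text, gzip-compressed. Nothing is fetched from the network.
my $text  = "caf\x{e9}";                   # "café" as a Perl character string
my $bytes = encode('UTF-8', $text);        # its UTF-8 bytes (5 of them)
my $compressed;
gzip(\$bytes => \$compressed) or die "gzip failed: $GzipError";

my $response = HTTP::Response->new(200, 'OK');
$response->header('Content-Type'     => 'text/plain; charset=UTF-8');
$response->header('Content-Encoding' => 'gzip');
$response->content($compressed);

# content returns the body exactly as received: still gzip-compressed bytes.
printf "content:         %d bytes of gzip data\n", length $response->content;

# decoded_content removes the gzip layer and decodes UTF-8 into characters.
my $decoded = $response->decoded_content;
printf "decoded_content: %d characters\n", length $decoded;   # 4 characters
```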

Done!
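Putting the pieces together, here is a sketch of a getstore replacement that also tells you why a download failed. It uses get's ':content_file' option so that LWP writes the body straight to disk, byte for byte, like getstore does, instead of going through decoded_content. The fetch_to_file and download_error names are made up. The X-Died check assumes LWP's behaviour of recording errors that happen after the response headers have arrived (an unwritable file, a dropped connection) in that header while the status may still read 200; treat it as a best-effort check.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# fetch_to_file is a made-up helper. It does roughly what getstore does,
# but returns the whole HTTP::Response instead of just the status code.
sub fetch_to_file {
    my ($url, $file) = @_;
    my $ua = LWP::UserAgent->new(
        from    => 'ss-tangerine@example.com',   # placeholder contact address
        timeout => 30,                           # seconds
    );
    # ':content_file' makes LWP write the body into $file as it arrives.
    return $ua->get($url, ':content_file' => $file);
}

# download_error (also made up) returns an error message,
# or nothing if the download looks complete.
sub download_error {
    my ($response) = @_;
    return $response->status_line unless $response->is_success;
    # A 200 can still hide a failure after the headers arrived;
    # LWP notes such errors in the X-Died header.
    my $died = $response->header('X-Died');
    return $died if defined $died;
    return;
}

# Only touch the network when run as: perl fetch.pl URL FILE
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    my $response = fetch_to_file($url, $file);
    if (defined(my $err = download_error($response))) {
        die "download of <$url> failed: $err\n";
    }
    print "saved <$url> to $file\n";
}
```

Run it as perl fetch.pl http://www.example.com/ some-file.html; with no arguments it only defines the two subs.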

To learn about LWP::UserAgent, read the documentation. To learn about objects, read perlootut. You can also visit the perl tag on SO for some book suggestions.

Licensed under: CC-BY-SA with attribution