Question

I need to log in to a site, parse the resulting HTML page and extract the text between a specific pair of HTML tags.

I can already do this on a page that doesn't require a login. I am using the HTML::Parser class.

LWP::UserAgent's cookie_jar method can attach a cookie jar that loads its cookies from a file. However, I want to hard-code the cookie value in the script itself. Is that possible? I couldn't find a working example on the net.

Here's my code:

Please excuse the missing 'my' in variable declarations in a few places; I wrote this code in a hurry while trying to understand how cookie handling works in LWP::UserAgent.

#!/usr/bin/perl

use strict;
use warnings;
use HTTP::Request::Common;
use LWP::UserAgent;
use HTTP::Response;
use HTTP::Cookies;

package IdentityParse;
use base "HTML::Parser";

my $title_flag=0;
my $title="";

my $cookie_jar= HTTP::Cookies->new;
$cookie_jar->clear;
$cookie_jar->set_cookie(Name=Value); #Example, PHPSESSID=710c7aa60aa5cacdc40028ef79de24b2

sub text{
my($self,$text)=@_;
if($title_flag)
{
    $title.=$text;
}
}

sub start{
my($self,$tag,$attr,$attrseq,$origtext)=@_;
if($tag =~ /^title$/i)
{
    $title_flag=1;
}
}

sub end{
my($self,$tag,$origtext)=@_;
if($tag =~ /^title$/i)
{
    $title_flag=0;
}
}

my $url="http://sitename.com/users/index.php";

my $ua= LWP::UserAgent->new();
$ua->agent('NeonFlash');
$ua->timeout(30);
$ua->cookie_jar($cookie_jar);

my $req= HTTP::Request->new(GET => $url);
my $res= ($ua->request($req))->content;

my $p = new IdentityParse;
$p->parse($res);

$p->eof;

print "The title of the web page is: ".$title."\n";

Summary:

I am using the HTML::Parser class to parse the HTML page in the HTTP response. To read the text between the tags, I override HTML::Parser's start, text and end methods.
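For reference, here is a self-contained sketch of the same override-start/text/end technique. It keeps the title on the parser object instead of in file-scoped variables and parses a made-up HTML string, so it runs without a network connection; the class name TitleParser and the sample page are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package TitleParser;
use parent 'HTML::Parser';    # no constructor args => v2-style start/text/end callbacks

sub start {
    my ( $self, $tag ) = @_;
    $self->{in_title} = 1 if $tag eq 'title';    # tag names arrive lowercased
}

sub text {
    my ( $self, $text ) = @_;
    $self->{title} .= $text if $self->{in_title};    # text may arrive in several chunks
}

sub end {
    my ( $self, $tag ) = @_;
    $self->{in_title} = 0 if $tag eq 'title';
}

package main;

# Hypothetical page content standing in for the fetched HTML.
my $html = '<html><head><title>Welcome back</title></head><body>Hi</body></html>';

my $p = TitleParser->new;
$p->parse($html);
$p->eof;    # flush any text the parser is still buffering

my $title = $p->{title} // '';
print "The title of the web page is: $title\n";
```

Storing state in the parser's own hash (rather than in globals) means two parsers can run side by side without clobbering each other's title.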

I am passing the cookie as a Name=Value pair. I know it's possible to load cookies from a text file, though I haven't tried that myself, but I want to know whether it can be done this way as well.

Thanks.


Solution

For this kind of task, I prefer the WWW::Mechanize module.
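A rough sketch of what that could look like, assuming the site uses an ordinary HTML login form. The login URL and the field names username and password are hypothetical placeholders, not taken from the question. WWW::Mechanize is a subclass of LWP::UserAgent and keeps cookies in an in-memory HTTP::Cookies jar by default, so the session cookie set by the login response is sent on later requests without any manual cookie handling.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical URLs: adjust to the real site.
my $login_url = 'http://sitename.com/login.php';
my $page_url  = 'http://sitename.com/users/index.php';

# autocheck => 1 makes every failed request die with the HTTP status.
my $mech = WWW::Mechanize->new( agent => 'NeonFlash', timeout => 30, autocheck => 1 );

sub title_after_login {
    my ( $mech, $user, $pass ) = @_;
    $mech->get($login_url);
    # Selects the first form containing these (hypothetical) fields and submits it;
    # any Set-Cookie in the response is stored in $mech->cookie_jar.
    $mech->submit_form( with_fields => { username => $user, password => $pass } );
    $mech->get($page_url);    # the session cookie is sent automatically
    return $mech->title;      # contents of the page's <title> element
}

# Only touch the network when credentials are given: perl script.pl USER PASS
if ( @ARGV == 2 ) {
    print 'The title of the web page is: ', title_after_login( $mech, @ARGV ), "\n";
}
```

Compared with hard-coding a PHPSESSID, logging in through the form avoids having to copy a fresh session ID into the script every time the old one expires.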

OTHER TIPS

This is the beginning of the source code of HTTP::Cookies::set_cookie:

sub set_cookie
{
    my $self = shift;
    my($version,
       $key, $val, $path, $domain, $port,
       $path_spec, $secure, $maxage, $discard, $rest) = @_;

    # path and key can not be empty (key can't start with '$')
    return $self if !defined($path) || $path !~ m,^/, ||
                    !defined($key)  || $key  =~ m,^\$,;

    # ensure legal port
    if (defined $port) {
        return $self unless $port =~ /^_?\d+(?:,\d+)*$/;
    }

    # ... (remainder of the method, which stores the cookie, omitted)
}

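Judging by this code, set_cookie expects a list of positional arguments (version, key, value, path, domain, ...) rather than a single key=value string. If the path is missing or doesn't start with '/', the call silently returns without storing anything.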
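For example, a positional call along these lines should store the session cookie from the question. The domain sitename.com and the path / are assumptions about where the site expects the cookie; add_cookie_header is used here only to show the Cookie header the jar would send, without making a request.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;

# Cookie name/value from the question; domain and path are assumptions.
my $jar = HTTP::Cookies->new;
$jar->set_cookie(
    0,                                     # version (0 = Netscape-style cookie)
    'PHPSESSID',                           # key
    '710c7aa60aa5cacdc40028ef79de24b2',    # value
    '/',                                   # path: must start with '/' or the call is ignored
    'sitename.com',                        # domain the cookie is sent to
);

# Ask the jar which Cookie header it would attach; nothing is sent.
my $req = HTTP::Request->new( GET => 'http://sitename.com/users/index.php' );
$jar->add_cookie_header($req);
print $req->header('Cookie'), "\n";
```

In the question's script, handing this jar to the user agent with $ua->cookie_jar($jar) should then send the cookie automatically with every request to a matching host and path.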
Alternatively, you can use LWP::UserAgent's default_header method to set the Cookie header directly, because cookies are sent as part of the HTTP request headers.
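A minimal sketch of the default_header approach, reusing the cookie value from the question. prepare_request is called here only to show which headers would be sent, so nothing goes over the network. Note that a default header is attached to every request the user agent makes, whichever host it goes to.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

my $url = 'http://sitename.com/users/index.php';
my $ua  = LWP::UserAgent->new( agent => 'NeonFlash', timeout => 30 );

# Added to every request this user agent sends, whatever the host.
$ua->default_header( Cookie => 'PHPSESSID=710c7aa60aa5cacdc40028ef79de24b2' );

# prepare_request() merges the default headers into the request without
# sending it, so we can inspect what would go over the wire.
my $req = $ua->prepare_request( HTTP::Request->new( GET => $url ) );
print $req->as_string;

# A real fetch would simply be: my $res = $ua->get($url);
```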

Simply set the Cookie header on the HTTP::Request object.
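With HTTP::Request::Common, which the question already loads, the header can be passed straight to GET. A sketch reusing the cookie value from the question; the commented-out lines show how the request would then be sent.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET);

my $url = 'http://sitename.com/users/index.php';

# Extra key/value pairs after the URL become request headers.
my $req = GET $url, Cookie => 'PHPSESSID=710c7aa60aa5cacdc40028ef79de24b2';
print $req->as_string;

# Sending it (requires network access):
# use LWP::UserAgent;
# my $res = LWP::UserAgent->new( agent => 'NeonFlash' )->request($req);
# die $res->status_line unless $res->is_success;
```

Unlike default_header, this cookie travels only with this one request.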

Your parsing code is painful to look at. For a declarative approach, employ Web::Query or HTML::Query.
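As a sketch of the declarative style, extracting the title with Web::Query takes a single CSS selector. The sample HTML is made up; in practice you would pass the content of the response fetched with the cookie set.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Web::Query;

# Hypothetical page content; in practice use $response->decoded_content.
my $html = '<html><head><title>Welcome back</title></head><body><p>Hi</p></body></html>';

# Select the <title> element with a CSS selector and take its text.
my $title = Web::Query->new_from_html($html)->find('title')->text;
print "The title of the web page is: $title\n";
```

This replaces the three overridden callbacks and the flag variable with one expression.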

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow