Question

I have a set of HTML files which are used as an interface for accessing some files on a server. There's a main page which links to various report pages which then have links to files. I'm trying to get the linked files on each page and determine if they are up to date. However, I'm taking it a step at a time and I want to get this piece figured out first.

For reference, the folder structure is like this:

//server/
|---pages/
|---+---MainPage.htm
|---+---reports/
|---+---+---Report1.htm
|---+---+---Report2.htm

The find_all_links() method gets the links I want. However, when I try to pass the resulting WWW::Mechanize::Link objects' URLs to follow_link(), it says the file doesn't exist. The file DOES exist, just not where it thinks it exists. Oddly enough, following the link manually in a browser works just fine.

Below is my code thus far for tackling this.

use strict;
use warnings;
use WWW::Mechanize;

my $dir = '//server/pages';
chdir($dir);

my $mech = WWW::Mechanize->new();
$mech->get("file:$dir/MainPage.htm");

my @links = $mech->find_all_links(url_regex => qr/^\/reports\/.*/i);

foreach my $link (@links){
    print $link->url(), "\n";
    $mech->follow_link(url => $link->url());
    # Get all links on this page and check the modified dates
    ...
    $mech->back();
}

It produces the following output:

/reports/Report1.htm
Error GETing file://server/reports/Report1.htm: File `\\server\reports\Report1.htm` does not exist at script.pl line 15.

The file path it's using is incorrect and that's why it can't find the file. How do I get it to use the correct path? I've also tried url_abs() in place of url(). Any help, guidance, and/or insight would be greatly appreciated. Thanks so much!


Solution

The links on the page are root-relative (they start with /reports/...), so WWW::Mechanize resolves them against the root of the base URL — file://server/ — which silently drops the pages directory. follow_link() performs that same resolution, so it can't produce the path you want. Instead, build the absolute file: URL yourself by prepending $dir, and fetch it with another get():

$mech->get( "file:$dir" . $link->url() );

Since $link->url() already begins with a slash, no separator is needed between $dir and the URL.
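As a minimal sketch of the URL construction (using the $dir and link paths from the question, and a hard-coded link URL in place of a real WWW::Mechanize::Link object):

```perl
use strict;
use warnings;

# Base directory and a root-relative link URL, as in the question.
my $dir      = '//server/pages';
my $link_url = '/reports/Report1.htm';

# Because the link is root-relative (it starts with "/"), concatenating it
# directly onto $dir preserves the "pages" component that resolution
# against the file: base URL would otherwise drop.
my $file_url = "file:$dir" . $link_url;

print "$file_url\n";    # file://server/pages/reports/Report1.htm
```

In the real loop, $link_url would come from $link->url() inside the foreach, and the constructed $file_url would be passed to $mech->get().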
Licensed under: CC-BY-SA with attribution