Question

I have a list of URLs of PDF files, hosted on different sites, that I want to download.

In Firefox I have chosen the option to save PDF files directly to a particular folder.

My plan was to use WWW::Mechanize::Firefox in Perl to download each file in the list, one by one, through Firefox and to rename each file after it is downloaded.

I used the following code to do it:

    use WWW::Mechanize::Firefox;
    use File::Copy;

    # @list contains the list of links to pdf files
    foreach $x (@list) {
        my $mech = WWW::Mechanize::Firefox->new(autoclose => 1);
        $mech->get($x);  #This downloads the file using firefox in desired folder

        opendir(DIR, "output/download");
        @FILES= readdir(DIR);
        my $old = "output/download/$FILES[2]";
        move ($old, $new);  # $new is the URL of the new filename
    }

When I run the script, it opens the first link in Firefox, and Firefox downloads the file to the desired directory. After that, however, the new tab is not closed, the file is not renamed, the script keeps running (as if it had hit an endless loop), and no further files are downloaded.

What is going on here? Why isn't the code working? How do I close the tab and make the code process all the files in the list? Is there an alternative way to download them?

Solution

I solved the problem.

The method

    $mech->get()

waits for Firefox to fire the 'DOMContentLoaded' event once the page has loaded. Because I had set Firefox to download PDF files automatically, no page was ever loaded, so the event never fired and my code blocked indefinitely inside get().

I told get() not to wait for the page to load by passing the following option:

    $mech->get($x, synchronize => 0);

After this, I added a 60-second delay to give Firefox time to download the file before the code continues:

    sleep 60;

My final code looks like this:

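    use WWW::Mechanize::Firefox;
    use File::Copy;

    # @list contains the links to the PDF files
    foreach my $x (@list) {
        my $mech = WWW::Mechanize::Firefox->new(autoclose => 1);

        # Do not wait for a page-load event; a direct download never fires one.
        $mech->get($x, synchronize => 0);
        sleep 60;    # give Firefox time to finish the download

        # Pick up the downloaded file, skipping the "." and ".." entries.
        # This assumes the folder holds only the file that was just downloaded.
        opendir(my $dir, "output/download") or die "Cannot open output/download: $!";
        my @files = grep { !/^\.\.?$/ } readdir($dir);
        closedir($dir);

        my $old = "output/download/$files[0]";
        move($old, $new) or die "Cannot move $old: $!";  # $new is the path of the renamed file
    }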
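The fixed `sleep 60` wastes time on small files and may be too short for large ones. One possible alternative is to poll the download folder until a finished file shows up. The following is only a sketch under stated assumptions: `wait_for_download` is a hypothetical helper (not part of WWW::Mechanize::Firefox), it assumes Firefox keeps a `.part` file next to the target while a download is in progress, and it assumes the folder holds only one download at a time.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);    # core module; allows fractional poll intervals

# Hypothetical helper: wait until $dir contains a finished download.
# Returns the file's path, or nothing if $timeout seconds pass first.
# Assumption: the browser writes "<name>.part" while a download is running.
sub wait_for_download {
    my ($dir, $timeout, $interval) = @_;
    $timeout  //= 60;
    $interval //= 1;
    my $deadline = time + $timeout;
    my %seen_size;    # path => size observed on the previous poll

    while (1) {
        opendir(my $dh, $dir) or die "Cannot open $dir: $!";
        my @names = grep { !/^\.\.?$/ } readdir($dh);
        closedir($dh);

        my @partial  = grep { /\.part$/ } @names;
        my @finished = grep { -f $_ } map { "$dir/$_" } grep { !/\.part$/ } @names;

        if (!@partial) {
            # Accept a file once it is non-empty and its size has stopped changing.
            for my $path (@finished) {
                my $size = (-s $path) || 0;
                return $path
                    if $size > 0 && ($seen_size{$path} // -1) == $size;
            }
        }

        $seen_size{$_} = (-s $_) || 0 for @finished;
        return if time >= $deadline;
        sleep $interval;
    }
}
```

In the loop above, `sleep 60;` and the `readdir` lookup could then be replaced with something like `my $old = wait_for_download("output/download", 120) or die "Download timed out";`. The 120-second limit is an arbitrary choice.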

Other tips

If I understood you correctly, you have the links to the actual PDF files. In that case WWW::Mechanize is most likely easier than WWW::Mechanize::Firefox. In fact, I think that is almost always the case. Then again, watching the browser work is certainly cooler.

    use strict;
    use warnings;

    use WWW::Mechanize;

    # your code here (build @list, the links to the PDF files)

    # Create the agent once, outside the loop, and reuse it for every link.
    my $mech = WWW::Mechanize->new();
    $mech->agent_alias("Linux Mozilla"); # Optionally pretend to be whatever you want.

    foreach my $link (@list) {
        # your code here (set $new, the file name to save this link under)

        $mech->get($link);
        $mech->save_content($new);
    }
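For a slightly fuller picture, here is a sketch of how the whole loop might look with plain WWW::Mechanize. The names `filename_for` and `download_all` are hypothetical helpers invented for this illustration, and the naming scheme (last URL path segment, with a numbered fallback) is just one assumption about how `$new` could be chosen. It relies on `get()` accepting the `':content_file'` option and on autocheck being enabled (the default in recent versions when the class is not subclassed), so that a failed request dies and can be caught.

```perl
use strict;
use warnings;

# Hypothetical helper: derive a local file name from a URL.
# Uses the last path segment if it ends in ".pdf", otherwise a numbered fallback.
sub filename_for {
    my ($url, $index) = @_;
    (my $path = $url) =~ s/[?#].*//s;             # drop query string and fragment
    my ($name) = $path =~ m{/([^/]+\.pdf)\z}i;    # last segment, if it looks like a PDF
    $name = sprintf('file_%03d.pdf', $index) unless defined $name;
    $name =~ s/[^A-Za-z0-9._-]/_/g;               # keep the name filesystem-safe
    return $name;
}

# Hypothetical helper: download every URL into $dir, returning the saved paths.
sub download_all {
    my ($dir, @urls) = @_;
    require WWW::Mechanize;    # loaded here so filename_for() works without the module
    my $mech = WWW::Mechanize->new();    # one agent, reused for every request
    my @saved;

    for my $i (0 .. $#urls) {
        my $target = "$dir/" . filename_for($urls[$i], $i + 1);

        # ':content_file' writes the response body straight to $target.
        # With autocheck on, get() dies on an HTTP error; eval lets the loop continue.
        if (eval { $mech->get($urls[$i], ':content_file' => $target); 1 }) {
            push @saved, $target;
        }
        else {
            warn "Skipping $urls[$i]: $@";
        }
    }
    return @saved;
}
```

A call such as `my @saved = download_all("output/download", @list);` would then replace the browser-driven loop entirely, with no waiting or renaming step.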

If that is absolutely not what you wanted, my cover story will be that I did not want to break my 666 rep!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow