Question

I have a list of file URLs that I want to download:

http://somedomain.com/foo1.gz
http://somedomain.com/foo2.gz
http://somedomain.com/foo3.gz

What I want to do for each file is the following:

  1. Download foo1, foo2, ... in parallel with wget and nohup.
  2. Run myscript.sh on each file as soon as its download completes.

What I have is this:

#!/usr/bin/perl
use strict;
use warnings;

my @files = glob("foo*.gz");

foreach my $file (@files) {
    my $downurls = "http://somedomain.com/" . $file;
    system("nohup wget $downurls &");
    system("./myscript.sh $file >> output.txt");
}

The problem is that I can't tell when a given download has finished, so myscript.sh ends up running before its file is fully downloaded.

What's the right way to achieve this?

Was it helpful?

Solution

Try combining the commands using &&, so that the second one runs only after the first one completes successfully:

system("(nohup wget $downurls && ./myscript.sh $file >> output.txt) &");
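The same idea can be sketched in plain shell (a minimal sketch; the URLs and myscript.sh are placeholders from the question): each download-then-process pipeline is grouped in parentheses, backgrounded with &, and wait blocks until every pipeline has finished.

```shell
#!/bin/bash
# Sketch: download and process each file in its own background pipeline.
for file in foo1.gz foo2.gz foo3.gz
do
    # && ensures myscript.sh only runs after wget exits successfully;
    # the surrounding ( ... ) & backgrounds the whole pipeline.
    (wget "http://somedomain.com/$file" && ./myscript.sh "$file" >> output.txt) &
done
wait   # block until all background pipelines have finished
```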

OTHER TIPS

Why do this in Perl? Use bash instead. Below is just a sample:

#!/bin/bash

for file in foo1 foo2 foo3
do
    wget "http://somedomain.com/$file.gz"

    if [ -f "$file.gz" ]
    then
        ./myscript.sh "$file.gz" >> output.txt
    fi
done

If you want parallel processing, you can do the forking yourself or use a module to handle it for you. Try Parallel::ForkManager. You can see a bit more on its usage in How can I manage a fork pool in Perl?, but the CPAN page for the module has the really useful info. You probably want something like this:

use strict;
use warnings;
use Parallel::ForkManager;

my $MAX_PROCESSES = 8; # 8 parallel processes max
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

my @files = glob("foo*.gz");

foreach my $file (@files) {
  # Forks and returns the pid for the child:
  my $pid = $pm->start and next;

  my $downurls = "http://somedomain.com/" . $file;
  system("wget $downurls");
  system("./myscript.sh $file >> output.txt");

  $pm->finish; # Terminates the child process
}
$pm->wait_all_children; # Block until every child has finished

print "All done!\n";
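The same bounded-parallelism idea can also be expressed in shell with xargs -P, avoiding the module dependency. A sketch under the question's assumptions (placeholder URL and myscript.sh); -P 8 mirrors the 8-process cap above:

```shell
#!/bin/bash
# Sketch: run at most 8 download-and-process pipelines at a time.
# xargs -n 1 passes one filename per invocation; -P 8 caps concurrency.
printf '%s\n' foo1.gz foo2.gz foo3.gz |
  xargs -n 1 -P 8 sh -c 'wget "http://somedomain.com/$1" && ./myscript.sh "$1" >> output.txt' _
```

The trailing `_` fills `$0` of the inner shell so each filename arrives as `$1`.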
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow