How can I modify my Perl script to use multiple processors?

https://stackoverflow.com/questions/4429544

09-10-2019

Question

Hi, I have a simple script that takes a file and runs another Perl script on it. The script does this for every image file in the current folder. It runs on a machine with 2 quad-core Xeon processors and 16 GB of RAM, running RedHat Linux.

The first script, work.pl, basically calls magicplate.pl, passing it some parameters and the name of the file to process. Magic Plate takes about a minute to process each image. Because work.pl performs the same function over 100 times, and because the system has multiple processors and cores, I was thinking about splitting the task up so that it could run several times in parallel. I could split the images up into different folders if necessary. Any help would be great. Thank you.

Here is what I have so far:

use strict;
use warnings;


my @initialImages = <*>;

foreach my $file (@initialImages) {

    if($file =~ /.png/){
        print "processing $file...\n";
        my @tmp=split(/\./,$file);
        my $name="";
        for(my $i=0;$i<(@tmp-1);$i++) {
            if($name eq "") { $name = $tmp[$i]; } else { $name=$name.".".$tmp[$i];}
        }

        my $exten=$tmp[(@tmp-1)];
        my $orig=$name.".".$exten;

        system("perl magicPlate.pl -i ".$orig." -min 4 -max 160 -d 1");
     }
}

Solution

You could use Parallel::ForkManager (set $MAX_PROCESSES to the number of files processed at the same time):

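use strict;
use warnings;
use Parallel::ForkManager;

# Number of files processed at the same time. 8 is an assumption: one job
# per core on the two quad-core Xeons described in the question.
my $MAX_PROCESSES = 8;

# Create the manager once, outside the loop, so the limit covers all jobs.
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

my @initialImages = <*>;

foreach my $file (@initialImages) {

    next unless $file =~ /\.png$/;

    # In the parent, start() returns the child's PID, so "and next" moves on
    # to the next file; in the child it returns 0 and execution continues.
    $pm->start and next;

    print "processing $file...\n";
    # List form of system() avoids shell quoting problems in file names.
    system("perl", "magicPlate.pl", "-i", $file, "-min", 4, "-max", 160, "-d", 1);

    $pm->finish; # Terminates the child process
}

# Block until every child has finished.
$pm->wait_all_children;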

However, as Hugmeir points out, starting a new perl interpreter for each file is not a good idea.

Other tips

You should consider NOT creating a new process for each file that you want to process -- It's horribly inefficient, and probably what is taking most of your time here. Just loading up Perl and whatever modules you use over and over ought to be creating some overhead. I recall a poster on PerlMonks that did something similar, and ended up transforming his second script into a module, reducing the worktime from an hour to a couple of minutes. Not that you should expect such a dramatic improvement, but one can dream..
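As a rough illustration of that refactoring, here is a minimal sketch. The contents of magicPlate.pl are not shown in the question, so the package name MagicPlate, the function process_image, and its named arguments (mirroring the -i, -min, -max and -d flags) are all hypothetical placeholders rather than the real code.

```perl
use strict;
use warnings;

package MagicPlate {
    # Hypothetical: the body of magicPlate.pl would move into this sub,
    # taking its former command-line flags as named arguments.
    sub process_image {
        my (%opt) = @_;
        die "missing input file\n" unless defined $opt{i};
        # ... the existing image-processing logic would go here ...
        return "processed $opt{i} (min=$opt{min}, max=$opt{max}, d=$opt{d})";
    }
}

# work.pl can then call the function directly instead of system("perl ...").
for my $file (grep { /\.png$/ } glob('*')) {
    print MagicPlate::process_image(i => $file, min => 4, max => 160, d => 1), "\n";
}
```

In a real project the package would normally live in its own MagicPlate.pm file and be loaded with use, so the interpreter and any modules are loaded once for the whole batch.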

With the second script refactored as a module, here's an example of thread usage, in which BrowserUK creates a thread pool, feeding it jobs through a queue.
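The linked example is not reproduced here. The following is not BrowserUK's code, only a minimal sketch of the same idea (a fixed pool of threads pulling jobs from a queue) using the core threads and Thread::Queue modules. process_image is a hypothetical stand-in for the refactored magicplate code, run_pool is a helper name invented for this sketch, and the pool size of 8 assumes one worker per core.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Hypothetical stand-in for the work magicPlate.pl does once it has been
# refactored into a module; replace the body with the real processing call.
sub process_image {
    my ($file) = @_;
    return "processed $file";
}

# Run $worker over every item in @$jobs using $num_workers threads.
# Returns the collected results; their order is not guaranteed.
sub run_pool {
    my ($num_workers, $jobs, $worker) = @_;

    my $queue = Thread::Queue->new(@$jobs);
    # One undef per worker acts as a "no more work" marker.
    $queue->enqueue((undef) x $num_workers);

    my @threads = map {
        threads->create({ context => 'list' }, sub {
            my @done;
            while (defined(my $job = $queue->dequeue)) {
                push @done, $worker->($job);
            }
            return @done;
        });
    } 1 .. $num_workers;

    # join() returns each thread's list of results.
    return map { $_->join } @threads;
}

my @results = run_pool(8, [ glob('*.png') ], \&process_image);
print "$_\n" for @results;
```

This needs a perl built with thread support. Because the threads are created once and reused, the per-file startup cost discussed above is paid only once per worker.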

- Import "magicplate" and use threading.
- Start magicplate.pl in the background (you would need to add process throttling)
- Import "magicplate" and use fork (add process throttling and a kiddy reaper)
- Make "magicplate" a daemon with a pool of workers = # of CPUs
  - use an MQ implementation for communication
  - use sockets for communication
- Use a webserver (nginx, apache, ...) and wrap in REST for a webservice
- etc...

All these center around creating multiple workers that can each run on their own cpu. Certain implementations will use resources better (those that don't start a new process) and be easier to implement and maintain.
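To make the fork option concrete, here is a minimal sketch of process throttling plus a reaper using only core Perl. run_forked is a name invented for this sketch, and the limit of 8 assumes one worker per core. The worker shown still shells out to magicPlate.pl, as in the question; with the script imported as a module, the coderef would call its function instead.

```perl
use strict;
use warnings;

# Run $worker->($job) in a forked child for each job, keeping at most
# $max_procs children alive at once. Returns a hash of job => exit status.
sub run_forked {
    my ($max_procs, $jobs, $worker) = @_;
    my (%running, %status);

    # Reaper: wait for one child to finish and record its exit status.
    my $reap = sub {
        my $pid = waitpid(-1, 0);
        die "waitpid failed: $!" if $pid < 0;
        my $job = delete $running{$pid};
        $status{$job} = $? >> 8 if defined $job;
    };

    for my $job (@$jobs) {
        # Throttle: do not fork again until a slot is free.
        $reap->() while keys(%running) >= $max_procs;

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: do the work, then exit 0 on success and 1 on failure.
            exit($worker->($job) ? 0 : 1);
        }
        $running{$pid} = $job;    # Parent: remember which job this child has.
    }

    $reap->() while %running;     # Wait for the remaining children.
    return %status;
}

my $worker = sub {
    my ($file) = @_;
    return system('perl', 'magicPlate.pl', '-i', $file,
                  '-min', 4, '-max', 160, '-d', 1) == 0;
};

my %status = run_forked(8, [ glob('*.png') ], $worker);
print "$_: exit $status{$_}\n" for sort keys %status;
```

The parent only tracks PIDs and exit codes, so this form suits jobs like this one that write their output to files rather than returning data to the caller.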

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow