Question

I need some Perl help getting these two processes/pieces of code to work together. I was able to run each of them individually for testing, but I need help bringing them together, especially with the loop constructs. I'm not sure whether I should go with foreach or while. The code is below.

Also, any best-practice tips would be great, as I'm learning this language. Thanks for your help.

Here is the process flow I'm after:

  • Read a directory
  • Look for a particular file
  • Use the file name to pull out some key information for creating the newly processed file
  • Process the input file
  • Create a newly processed file for each input file read (if I read in 10, I create 10 new files)
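On the foreach-vs-while question: readdir works with either loop. A minimal, self-contained sketch of both styles (it builds a throwaway directory with File::Temp so it runs anywhere; the real code would use the /backups/test/ path):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway sandbox directory with two sample files (stand-in for /backups/test/)
my $target_dir = tempdir(CLEANUP => 1);
for my $name ('a.csv', 'b.csv') {
    open my $fh, '>', "$target_dir/$name" or die "cannot create $name: $!";
    close $fh;
}

# Option 1: while loop -- reads one entry at a time (low memory, good for huge dirs)
my @seen_while;
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    next if $file =~ /^\.\.?$/;    # skip . and ..
    push @seen_while, $file;
}
closedir $dh;

# Option 2: foreach loop -- slurps all entries into a list first, filters with grep
opendir $dh, $target_dir or die "can't opendir $target_dir: $!";
my @seen_foreach = grep { !/^\.\.?$/ } readdir($dh);
closedir $dh;

print scalar(@seen_while), " ", scalar(@seen_foreach), "\n";    # prints "2 2"
```

Either is fine here; while scales better for very large directories, while foreach reads more naturally when you want to filter up front.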

    Part 1:

    my $target_dir = "/backups/test/";
    opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
    while (defined(my $file = readdir($dh))) {
        next if ($file =~ /^\.+$/);
        #Get filename attributes
        if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
          print "$1\n";
          print "$2\n";
          print "$3\n";
        }
        print "$file\n";
    }
    

    Part 2:

    use strict;
    use Digest::MD5 qw(md5_hex);
    #Create new file
    open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
    my $data = '';
    my $line1 = <>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
    while (<>)
    {
        my $digest = md5_hex($data);
        chomp;
        my (@values) = split /,/;
        my $extra = "__mykey__$sep1$digest$sep2" ;
        $extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
        $data .= "$extra$eorec"; 
        print NEWFILE "$data";
    }
    #print $data;
    close (NEWFILE);
    


Solution

I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:

#!/usr/bin/env perl

# - Never forget these!
use strict;
use warnings;

use Digest::MD5 qw(md5_hex);

my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    # Parens on postfix "if" are optional; I prefer to omit them
    next if $file =~ /^\.+$/;
    # - Escaped the final dot before "csv" (unescaped, "." matches any char)
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/) {
        # - readdir returns bare file names, so prepend the directory
        #   before handing the file off to be opened
        process_file("$target_dir$file", $1, $2, $3);
    }
    print "$file\n";
}

sub process_file {
    my ($orig_name, $foo_x, $name_x, $p_x) = @_;

    my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";

    # - From your description of the task, it sounds like we actually want to
    #   read from the found file, not from <>, so opening it here to read
    # - Better to use lexical ("my") filehandle and three-arg form of open
    # - "or" has lower operator precedence than "||", so less chance of
    #   things being grouped in the wrong order (though either works here)
    # - Including $! in the error will tell why the file open failed
    open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
    open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";

    my $data  = '';
    my $line1 = <$in_fh>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
    while (<$in_fh>) {
        chomp;
        my $digest   = md5_hex($data);
        my (@values) = split /,/;
        my $extra    = "__mykey__$sep1$digest$sep2";
        $extra .= "$heading[$_]$sep1$values[$_]$sep2"
          for (0 .. $#values);
        # - $#values is the last valid index; the original's
        #   scalar(@values) would run one element past the end
        # - Useless use of double quotes removed on next two lines
        $data .= $extra . $eorec;
        #print $out_fh $data;
    }
    # - Moved print to output file to here (where it will print the complete
    #   output all at once) rather than within the loop (where it will print
    #   all previous lines each time a new line is read in) to prevent
    #   duplicate output records.  This could also be achieved by printing
    #   $extra inside the loop.  Printing $data at the end will be slightly
    #   faster, but requires more memory; printing $extra within the loop and
    #   getting rid of $data entirely would require less memory, so that may
    #   be the better option if you find yourself needing to read huge input
    #   files.
    print $out_fh $data;

    # - $in_fh and $out_fh will be closed automatically when it goes out of
    #   scope at the end of the block/sub, so there's no real point to
    #   explicitly closing it unless you're going to check whether the close
    #   succeeded or failed (which can happen in odd cases usually involving
    #   full or failing disks when writing; I'm not aware of any way that
    #   closing a file open for reading can fail, so that's just being left
    #   implicit)
    close $out_fh or die "Failed to close file: $!";
}

Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.
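The memory-light alternative mentioned in the comments (print each record as it is built, drop $data entirely) can be sketched as follows. This is a self-contained demo that reads from an in-memory string rather than a file; note one assumption on my part: the digest here is taken over the current input line, whereas the original code digested everything accumulated so far.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# In-memory input and output stand in for the real files
my $csv = "name,age\nalice,30\nbob,25\n";
open my $in_fh,  '<', \$csv or die "cannot open input: $!";
my $out = '';
open my $out_fh, '>', \$out or die "cannot open output: $!";

my $line1 = <$in_fh>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");

while (<$in_fh>) {
    chomp;
    my @values = split /,/;
    # Assumption: digest of the current line (the original digested $data)
    my $extra = "__mykey__$sep1" . md5_hex($_) . $sep2;
    $extra .= "$heading[$_]$sep1$values[$_]$sep2" for 0 .. $#values;
    print $out_fh "$extra$eorec";    # write each record immediately; no buffer
}
close $out_fh;
print $out;
```

Each record is flushed to the output handle as soon as it is assembled, so memory use stays constant regardless of input size.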

Other tips

You are using an old style of Perl programming. I recommend using functions and CPAN modules (http://search.cpan.org). Perl pseudocode:

use Modern::Perl;
# use... 

sub get_input_files {
  # return an array of files (@)
}

sub extract_file_info {
   # takes the file name and returns an array of values (filename attrs)
}

sub process_file {
   # reads the input file, takes the previous attribs and build the output file
}


my @ifiles = get_input_files;
foreach my $ifile (@ifiles) {
   my @attrs = extract_file_info($ifile);
   process_file($ifile, @attrs);
}
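One possible filling-in of that skeleton, reusing the filename pattern from the question. Assumptions are flagged inline: Modern::Perl (a CPAN module) is swapped for core strict/warnings so the sketch runs without installs, process_file is left as a stub, and the driver skips gracefully when the question's directory does not exist:

```perl
use strict;
use warnings;    # standing in for Modern::Perl, which is a CPAN module

sub get_input_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "can't opendir $dir: $!";
    # Return full paths of entries matching the question's filename pattern
    return grep { /foo\d{3}\.name\.\w{3}-foo_p\d{1,4}\.\d+\.csv$/ }
           map  { "$dir/$_" }
           grep { !/^\.\.?$/ } readdir($dh);
}

sub extract_file_info {
    my ($file) = @_;
    # Returns the three captured filename attributes
    $file =~ /foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/
        or die "unexpected file name: $file";
    return ($1, $2, $3);
}

sub process_file {
    my ($ifile, @attrs) = @_;
    # Stub: build the output file as in the accepted answer above
    print "would process $ifile with attrs @attrs\n";
}

my $target_dir = "/backups/test";    # directory from the question
if (-d $target_dir) {                # skip cleanly if it doesn't exist here
    foreach my $ifile (get_input_files($target_dir)) {
        my @attrs = extract_file_info($ifile);
        process_file($ifile, @attrs);
    }
}
```

Splitting the work this way keeps each sub independently testable: get_input_files can be pointed at any directory, and extract_file_info can be exercised on a bare name.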

Hope it helps

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow