Pregunta

I want to convert excel-files to csv-files with Perl. For convenience I like to use the module File::Slurp for read/write operations. I need it in a subfunction.

While printing out to the screen, the program generates the desired output, the generated csv-files unfortunately just contain one row with semicolons, field are empty.

Here is the code:

#!/usr/bin/perl

use File::Copy;
use v5.14;
use Cwd;
use File::Slurp;
use Spreadsheet::ParseExcel;


sub xls2csv {
    my $currentPath = getcwd();
    my @files       = <$currentPath/stage0/*.xls>;

    for my $sourcename (@files) {
        print "Now working on $sourcename\n";
        my $outFile = $sourcename;
        $outFile =~ s/xls/csv/g;
        print "Output CSV-File: ".$outFile."\n";
        my $source_excel = new Spreadsheet::ParseExcel;
        my $source_book  = $source_excel->Parse($sourcename)
          or die "Could not open source Excel file $sourcename: $!";

        foreach my $source_sheet_number ( 0 .. $source_book->{SheetCount} - 1 )
        {
            my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];

            next unless defined $source_sheet->{MaxRow};
            next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
            next unless defined $source_sheet->{MaxCol};
            next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};

            foreach my $row_index (
                $source_sheet->{MinRow} .. $source_sheet->{MaxRow} )
            {
                foreach my $col_index (
                    $source_sheet->{MinCol} .. $source_sheet->{MaxCol} )
                {
                    my $source_cell =
                      $source_sheet->{Cells}[$row_index][$col_index];
                    if ($source_cell) {

                        print $source_cell->Value, ";"; # correct output!

                        write_file( $outFile, { binmode => ':utf8' }, $source_cell->Value, ";" ); # only one row of semicolons with empty fields!
                    }
                }
                print "\n";
            }
        }

    }
}

xls2csv();

I know it has something to do with the parameter passing in the write_file function, but couldn't manage to fix it.

Has anybody an idea?

Thank you very much in advance.

¿Fue útil?

Solución

write_file will overwrite the file unless the append => 1 option is given. So this:

write_file( $outFile, { binmode => ':utf8' }, $source_cell->Value, ";" ); 

Will write a new file for each new cell value. It does however not match your description of "only one row of semi-colons of empty fields", as it should only be one semi-colon, and one value.

I am doubtful towards this sentiment from you: "For convenience I like to use the module File::Slurp". While the print statement works as it should, using File::Slurp does not. So how is that convenient?

What you should do, if you still want to use write_file is to gather all the lines to print, and then print them all at once at the end of the loop. E.g.:

$line .= $source_cell->Value . ";";   # use concatenation to build the line
...
push @out, "$line\n";                 # store in array
...
write_file(...., \@out);              # print the array

Another simple option would be to use join, or to use the Text::CSV module.

Otros consejos

Well, in this particular case, File::Slurp was indeed complicating this for me. I just wanted to avoid to repeat myself, which I did in the following clumsy working solution:

#!/usr/bin/perl

use warnings;
use strict;
use File::Copy;
use v5.14;
use Cwd;
use File::Basename;
use File::Slurp;
use Tie::File;
use Spreadsheet::ParseExcel;
use open qw/:std :utf8/;

# ... other functions

sub xls2csv {
    my $currentPath = getcwd();
    my @files       = <$currentPath/stage0/*.xls>;
    my $fh;

    for my $sourcename (@files) {
        say "Now working on $sourcename";
        my $outFile = $sourcename;
        $outFile =~ s/xls/csv/gi;
        if ( -e $outFile ) {
            unlink($outFile) or die "Error: $!";
            print "Old $outFile deleted.";
        }
        my $source_excel = new Spreadsheet::ParseExcel;
        my $source_book  = $source_excel->Parse($sourcename)
          or die "Could not open source Excel file $sourcename: $!";

        foreach my $source_sheet_number ( 0 .. $source_book->{SheetCount} - 1 )
        {
            my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];

            next unless defined $source_sheet->{MaxRow};
            next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
            next unless defined $source_sheet->{MaxCol};
            next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};

            foreach my $row_index (
                $source_sheet->{MinRow} .. $source_sheet->{MaxRow} )
            {
                foreach my $col_index (
                    $source_sheet->{MinCol} .. $source_sheet->{MaxCol} )
                {
                    my $source_cell =
                      $source_sheet->{Cells}[$row_index][$col_index];
                    if ($source_cell) {
                        print $source_cell->Value, ";";
                        open( $fh, '>>', $outFile ) or die "Error: $!";
                        print $fh $source_cell->Value, ";";
                        close $fh;
                    }
                }
                print "\n";
                open( $fh, '>>', $outFile ) or die "Error: $!";
                print $fh "\n";
                close $fh;
            }
        }

    }
}

xls2csv();

I'm actually NOT happy with it, since I'm opening and closing the files so often (I have many files with many lines). That's not very clever in terms of performance.

Currently I still don't know how to use the split or Text:CSV in this case, in order to put everything into an array and to open, write and close each file only once.

Thank you for your answer TLP.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top