Question

I want to traverse a .gz file and read the contents of the files inside it.

My folder structure:

1) ABC.gz
  1.1) ABC
    1.1.1) Sample1.txt
    1.1.2) Sample2.txt
    1.1.3) Test1.txt

I want to traverse the .gz, then read and print the contents of the Sample*.txt files. Test*.txt should be ignored. Importantly, I do not want to copy or extract the .gz to a different location.

The Perl script I have for reading a single file:

use strict;
use warnings;

my $filename = 'Sample1.txt';
open(my $fh, '<:encoding(UTF-8)', $filename)
  or die "Could not open file '$filename' $!";

while (my $row = <$fh>) {
  chomp $row;
  print "$row\n";
}

Solution

First of all, a gzip file is a compressed version of a single file. From your description, you most likely have a tar archive that was then compressed (typically named .tar.gz or .tgz).

The second point is that you will have to decompress it, either in memory or to a temporary file.

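You will not be able to read it row by row with a plain open, because the bytes on disk are compressed; the data has to pass through a decompression layer first.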

Take a look at Tie::Gzip for the handling of compressed files and at Archive::Tar for tar archives.
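
As a rough illustration of the Archive::Tar route, here is a minimal sketch that reads the matching members entirely in memory, so nothing is extracted to disk. It assumes the file really is a gzip-compressed tar archive and that IO::Zlib is available for the decompression. The sub name read_samples and the command-line usage are made up for this example; they are not part of any module.

```perl
use strict;
use warnings;
use Archive::Tar;

# Hypothetical helper: return [path, content] pairs for every Sample*.txt
# member of a compressed tar archive, without writing anything to disk.
sub read_samples {
    my ($archive) = @_;

    # new() with a file name reads the whole archive into memory;
    # the compression is detected from the file itself.
    my $tar = Archive::Tar->new($archive)
        or die "Could not read '$archive': " . Archive::Tar->error;

    my @samples;
    for my $name (sort $tar->list_files) {
        # Match on the last path component only, so Test*.txt is skipped.
        next unless $name =~ m{(?:^|/)Sample[^/]*\.txt\z};
        push @samples, [ $name, $tar->get_content($name) ];
    }
    return @samples;
}

# Assumed usage: perl read_samples.pl ABC.tar.gz
if (@ARGV) {
    for my $sample (read_samples($ARGV[0])) {
        my ($name, $content) = @$sample;
        print "== $name ==\n";
        # The content is one string; split it to process it row by row.
        for my $row (split /\n/, $content) {
            print "$row\n";
        }
    }
}
```

Because the whole archive is held in memory, this is only a reasonable fit for archives of modest size.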

OTHER TIPS

Maybe something like this, which shells out to tar and prints each matching member straight from the archive:

#!/usr/bin/perl

use strict;
use warnings;

use IPC::System::Simple "capture";

my $tar = "/path/to/archive.tar.gz";

# List the archive members; the list form of capture() bypasses the shell.
my @list = capture("tar", "-tzf", $tar);
chomp @list;

foreach my $file (@list) {
  # Only Sample*.txt members; Test*.txt and everything else are skipped.
  next unless $file =~ m{(?:^|/)Sample[^/]*\.txt$};

  # -O writes the member to stdout, so nothing is extracted to disk.
  my $out = capture("tar", "-xzf", $tar, "-O", $file);
  print $out;
}
Licensed under: CC-BY-SA with attribution