Question

I have a text file that looks like the following:

Line 1
Line 2
Line 3
Line 4
Line 5
filename2.tif;Smpl/Pix & Bits/Smpl are missing.

There are 5 lines that are always the same, and on the 6th line is where I want to start reading data. Upon reading data, each line (starting from line 6) is delimited by semicolons. I need to just get the first entry of each line (starting on line 6).

For example:

Line 1
Line 2
Line 3
Line 4
Line 5
filename2.tif;Smpl/Pix & Bits/Smpl are missing.
filename4.tif;Smpl/Pix & Bits/Smpl are missing.
filename6.tif;Smpl/Pix & Bits/Smpl are missing.
filename8.tif;Smpl/Pix & Bits/Smpl are missing.  

Output desired would be:

filename2.tif
filename4.tif
filename6.tif
filename8.tif

Is this possible, and if so, where do I begin?

Was it helpful?

Solution

This uses the Perl 'autosplit' (or 'awk') mode:

perl -n -F'/;/' -a -e 'next if $. <= 5; print "$F[0]\n";' < data.file

See 'perlrun' and 'perlvar'.


If you need to do this in a function which is given a file handle and a number of lines to skip, then you won't be using the Perl 'autosplit' mode.

sub skip_N_lines_read_column_1
{
    my($fh, $N) = @_;
    my $i = 0;
    my @files = ();
    while (my $line = <$fh>)
    {
        next if $i++ < $N;
        my($file) = split /;/, $line;
        push @files, $file;
    }
    return @files;
}

This initializes a loop, reads lines, skipping the first N of them, then splitting the line and capturing the first result only. That line with my($file) = split... is subtle; the parentheses mean that the split has a list context, so it generates a list of values (rather than a count of values) and assigns the first to the variable. If the parentheses were omitted, you would be providing a scalar context to a list operator, so you'd get the number of fields in the split output assigned to $file - not what you needed. The file name is appended to the end of the array, and the array is returned. Since the code did not open the file handle, it does not close it. An alternative interface would pass the file name (instead of an open file handle) into the function. You'd then open and close the file in the function, worrying about error handling.

And if you need the help with opening the file, etc, then:

use Carp;

sub open_skip_read
{
    my($name) = @_;
    open my $fh, '<', $name or croak "Failed to open file $name ($!)";
    my @list = skip_N_lines_read_column_1($fh, 5);
    close $fh or croak "Failed to close file $name ($!)";
    return @list;
}

OTHER TIPS

#!/usr/bin/env perl
#
# name_of_program - what the program does as brief one-liner
#
# Your Name <your_email@your_host.TLA>
# Date program written/released
#################################################################

use 5.10.0;

use utf8;
use strict;
use autodie;
use warnings FATAL => "all";

#  ⚠ change to agree with your input: ↓
use open ":std" => IN    => ":encoding(ISO-8859-1)",
                   OUT   => ":utf8";
#  ⚠ change for your output: ↑ — *maybe*, but leaving as UTF-8 is sometimes better

END {close STDOUT}

our $VERSION = 1.0;

$| = 1;

if (@ARGV == 0 && -t STDIN) {
   warn "reading stdin from keyboard for want of file args or pipe";
}

while (<>) {
    next if 1 .. 5;
    my $initial_field = /^([^;]+)/ ? $1 : next;
    #    ╔═══════════════════════════╗
    #   ☞ your processing goes here ☜
    #    ╚═══════════════════════════╝
} continue {
    close ARGV if eof;
}

__END__

Kinda ugly but, read out the dummy lines and then split on ; for the rest of them.

my $logfile = '/path/to/logfile.txt';

open(FILE, $logfile) || die "Couldn't open $logfile: $!\n";

for (my $i = 0 ; $i < 5 ; $i++) {
   my $dummy = <FILE>;
}

while (<FILE>) {
   my (@fields) = split /;/;
   print $fields[0], "\n";
}

close(FILE);
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top