How can I extract the columns of data with Perl?
Question
I have strings of this kind
NAME1 NAME2 DEPTNAME POSITION
JONH MILLER ROBERT JIM CS ASST GENERAL MANAGER
I want the output to be NAME1, NAME2 and POSITION. How can I do this using split, a regex, trim, etc., without using CPAN modules?
Solution
If your input data comes in as an array of strings (@strings), this
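for my $s (@strings) {
    # unpack takes 19 chars, 15 chars, skips 19 (DEPTNAME), then takes the rest;
    # the map strips leading whitespace and drops empty fields.
    my $output = join ' ',
        map /^\s*(.+)\s*$/ ? $1 : (),
        unpack('A19 A15 x19 A*', $s);
    print "$output\n";
}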
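would extract and trim the information needed. For the two sample lines, the fields come out as
NAME1 | NAME2 | POSITION
and
JONH MILLER | ROBERT JIM | ASST GENERAL MANAGER
(I added the '|' separators to make the result easier to read; the code itself joins the fields with a single space.)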
Regards
rbo
Other tips
It's going to depend on whether those are fixed length fields, or if they are tab separated. The easiest (using split) is if they are tab separated.
my ($name1, $name2, $deptName, $position) = split("\t", $string);
If they're fixed length, and assuming they are all, say, 10 characters long, you can parse it like
my ($name1, $name2, $deptName, $position) = unpack("A10 A10 A10 A10", $string);
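As a minimal sketch of the fixed-width case, the snippet below builds two sample rows with sprintf and reads them back with unpack. The widths (19, 15, 19) are borrowed from the template in the solution above and are only an assumption about the real file; parse_fixed is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: pull NAME1, NAME2 and POSITION out of one fixed-width line.
# Assumed layout: 19 chars, 15 chars, 19 chars (DEPTNAME, skipped), then the rest.
sub parse_fixed {
    my ($line) = @_;
    # 'A' strips trailing whitespace; 'x19' skips the DEPTNAME column.
    return unpack('A19 A15 x19 A*', $line);
}

# Build sample rows padded to the same assumed widths.
my @rows;
for my $fields (
    [ 'NAME1',       'NAME2',      'DEPTNAME', 'POSITION' ],
    [ 'JONH MILLER', 'ROBERT JIM', 'CS',       'ASST GENERAL MANAGER' ],
) {
    push @rows, sprintf('%-19s%-15s%-19s%s', @$fields);
}

print join(' | ', parse_fixed($_)), "\n" for @rows;
```

If the real column widths differ, only the template string needs to change.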
Assuming the amount of space between the fields is not fixed, split the string on two or more whitespace characters so that a name like JONH MILLER is not broken into two parts.
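#!/usr/bin/perl
use strict;
use warnings;

# Fields are separated by two or more spaces; single spaces are part of the data.
my $string = "NAME1        NAME2       DEPTNAME  POSITION
JONH MILLER  ROBERT JIM  CS        ASST GENERAL MANAGER  ";

# Handle each line separately so the newline does not glue two fields together.
foreach my $line (split /\n/, $string) {
    my @string_parts = split /\s\s+/, $line;
    foreach my $test (@string_parts) {
        print "$test\n";
    }
}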
From the sample, a single space belongs to the data, but two or more contiguous spaces do not, so you can simply split on two or more spaces. The only thing I would add is mesh from List::MoreUtils (a CPAN module), which pairs each column name with its value:
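use List::MoreUtils qw<mesh>;
# $file is an already-opened filehandle; its first line holds the column names.
my @names = map { chomp; $_ } split /\s{2,}/, <$file>;
# The leading + makes Perl parse the inner braces as a hash reference, not a block.
my @records = map { chomp; +{ mesh( @names, @{[ split /\s{2,}/ ]} ) } } <$file>;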
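Since the question rules out CPAN modules, the same pairing of column names and values can be sketched in core Perl with a hash slice instead of mesh. The sample lines and the helper name parse_records are illustrative assumptions, and the input is assumed to be already chomped and to separate fields with two or more spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a header line plus data lines into a list of
# hash references keyed by column name. Lines are assumed to be chomped and
# fields are assumed to be separated by two or more whitespace characters.
sub parse_records {
    my ($header, @lines) = @_;
    my @names = split /\s{2,}/, $header;
    my @records;
    for my $line (@lines) {
        my %record;
        # Hash slice: each field is stored under the column name at the same index.
        @record{@names} = split /\s{2,}/, $line;
        push @records, \%record;
    }
    return @records;
}

my @records = parse_records(
    'NAME1        NAME2       DEPTNAME  POSITION',
    'JONH MILLER  ROBERT JIM  CS        ASST GENERAL MANAGER',
);
print "$_->{NAME1}, $_->{NAME2}, $_->{POSITION}\n" for @records;
```

Each record can then be read by column name, so the DEPTNAME field is simply ignored when printing.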
Consider using autosplit in a Perl one-liner from your command line:
$ perl -F'/\s{2,}/' -lane 'print qq/@F[0,1,3]/' file
The one-liner will split on two or more consecutive spaces and print the first, second and fourth fields, corresponding to NAME1, NAME2 and POSITION fields.
Of course, this will break if you have only a single space separating NAME1 and NAME2 entries, but more information is needed about your file in order to ascertain what the best course of action might be.
To split on whitespace:
@string_parts = split /\s{2,}/, $string;
This will split $string into a list of substrings. The separator is the regex \s{2,}, which matches two or more consecutive whitespace characters. Whitespace here includes spaces, tabs, and newlines.
Edit: I see that one of the requirements is not to split on only one space, but to split on two or more. I modified the regex accordingly.
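To see why the change from \s+ to \s{2,} matters, here is a small sketch that counts the fields each pattern produces. The sample line (with two spaces between fields) and the helper name fields_for are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line with the given compiled pattern.
sub fields_for {
    my ($pattern, $line) = @_;
    return split $pattern, $line;
}

# Assumed sample: two spaces between fields, one space inside a field.
my $line = 'JONH MILLER  ROBERT JIM  CS  ASST GENERAL MANAGER';

my @by_one = fields_for(qr/\s+/,    $line);   # also splits inside the names
my @by_two = fields_for(qr/\s{2,}/, $line);   # keeps single-spaced names intact

print scalar(@by_one), " fields with \\s+\n";
print scalar(@by_two), " fields with \\s{2,}\n";
```

With this sample, \s+ yields 8 fragments while \s{2,} yields the 4 intended fields.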