Question

I have a file with several rows of text. Each row contains an array of strings, represented as follows:

["ABC","D EF","XYZ"]
["MNO","P","QR  ST"]
["A"]
...

Note that some of the quoted strings contain spaces. I'm reading the file in a Perl script that looks like this:

while(<stdin>){
  @tmp = split /,/, $_;
  ... do something with @tmp elements.
}

Is there an easy regex-based way to read all the elements of a row into an array, rather than painfully splitting the line and stripping the quotes and square brackets?

Thanks in advance

Solution

It is simple to parse each row with a regular expression.

You don't say in what form you want to store the data, but this short program may help.

I have used Data::Dump to display the contents of the @data array after processing the file.

use strict;
use warnings;

my @data;

while (<DATA>) {
  my @fields = /"([^"]*)"/g;
  push @data, \@fields;
}

use Data::Dump;
dd \@data;

__DATA__
["ABC","D EF","XYZ"]
["MNO","P","QR  ST"]
["A"]

output

[["ABC", "D EF", "XYZ"], ["MNO", "P", "QR  ST"], ["A"]]
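
The key step above is the list-context match with /g, which returns every capture of group 1 in one go. A minimal sketch of that step on its own, wrapped in a hypothetical helper named fields_of (not part of the original answer) and assuming no field contains an embedded double quote:

```perl
use strict;
use warnings;

# Hypothetical helper: return every double-quoted field found on one line.
# In list context, a match with /g returns all captures of group 1.
sub fields_of {
    my ($line) = @_;
    return $line =~ /"([^"]*)"/g;
}

my @tmp = fields_of('["MNO","P","QR  ST"]');
print scalar(@tmp), " fields: ", join(' | ', @tmp), "\n";
```

Inside a while (<STDIN>) loop, calling fields_of($_) in list context would give the @tmp array the question asks for, one row at a time.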

OTHER TIPS

Maybe you would be better off with a JSON parser? http://search.cpan.org/dist/JSON-Parse/lib/JSON/Parse.pod
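
Each row in the question happens to be a valid JSON array, so a JSON decoder can deal with the quotes and brackets for you. A rough sketch, assuming the core module JSON::PP (bundled with Perl since 5.14) instead of the JSON::Parse module linked above; parse_rows is a hypothetical helper name:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Hypothetical helper: decode each non-blank line as one JSON array.
# decode_json dies on malformed input, so wrap it in eval if needed.
sub parse_rows {
    my @lines = @_;
    my @rows;
    for my $line (@lines) {
        next unless $line =~ /\S/;        # skip blank lines
        push @rows, decode_json($line);   # one array reference per row
    }
    return \@rows;
}

my $rows = parse_rows('["ABC","D EF","XYZ"]', '["MNO","P","QR  ST"]', '["A"]');
print join('|', @$_), "\n" for @$rows;
```

Unlike a hand-written pattern, this should also cope with fields that contain commas or escaped quotes, as long as every row really is valid JSON.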

Would something like this work?

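use strict;
use warnings;
use Data::Dumper;

my @tmp;

while (<STDIN>) {
    chomp;
    # Strip everything except letters, digits, whitespace and commas,
    # which removes the quotes and square brackets.
    # Assumes the fields themselves contain no commas or punctuation.
    s/[^a-zA-Z\d\s,]//g;
    # Flatten the fields from every line into one list.
    push @tmp, split /,/, $_;
}
print Dumper(\@tmp);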

output

$VAR1 = [
          'ABC',
          'D EF',
          'XYZ',
          'MNO',
          'P',
          'QR  ST',
          'A'
        ];

Edit

Alternative:

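use strict;
use warnings;
use Data::Dumper;

my @tmp;

while (<STDIN>) {
    chomp;
    # Strip the quotes and square brackets, as before.
    s/[^a-zA-Z\d\s,]//g;
    # Keep each line's fields together as one array reference.
    push @tmp, [ split /,/, $_ ];
}
print Dumper(\@tmp);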

output

$VAR1 = [
          [
            'ABC',
            'D EF',
            'XYZ'
          ],
          [
            'MNO',
            'P',
            'QR  ST'
          ],
          [
            'A'
          ]
        ];
Licensed under: CC-BY-SA with attribution