Question

I am trying to process a text file in perl. I need to store the data from the file into a database. The problem that I'm having is that some fields contain a newline, which throws me off a bit. What would be the best way to contain these fields?

Example data.txt file:

ID|Title|Description|Date
1|Example 1|Example Description|10/11/2011
2|Example 2|A long example description
Which contains
a bunch of newlines|10/12/2011
3|Example 3|Short description|10/13/2011

The current (broken) Perl script (example):

#!/usr/bin/perl -w
use strict;

open (MYFILE, 'data.txt');
while (<MYFILE>) {
    chomp;
    my ($id, $title, $description, $date) = split(/\|/);

    if ($id ne 'ID') {
        # processing certain fields (...)

        # insert into the database (example)
        $sqlInsert->execute($id, $title, $description, $date);
    }
}
close (MYFILE);

As you can see from the example, in the case of ID 2, it's broken into several lines causing errors when attempting to reference those undefined variables. How would you group them into the correct field?

Thanks in advance! (I hope the question was clear enough, difficult to define the title)

Was it helpful?

Solution

I would just count the number of separators before splitting the line. If you don't have enough, read the next line and append it. The tr operator is an efficient way to count characters.

#!/usr/bin/perl -w
use strict;
use warnings;

open (MYFILE, '<', 'data.txt');
while (<MYFILE>) {
    # Continue reading while line incomplete:
    while (tr/|// < 3) {
        my $next = <MYFILE>;
        die "Incomplete line at end" unless defined $next;
        $_ .= $next;
    }

    # Remaining code unchanged:
    chomp;
    my ($id, $title, $description, $date) = split(/\|/);

    if ($id ne 'ID') {
        # processing certain fields (...)

        # insert into the database (example)
        $sqlInsert->execute($id, $title, $description, $date);
    }
}
close (MYFILE);

OTHER TIPS

Read next line until number of fields is what you need. Something like that (I haven't tested that code):

my @fields = split(/\|/);
unless ($#fields == 3) { # Repeat untill we get 4 fields in array

  <MYFILE>; # Read next line      
  chomp;

  # Split line
  my @add_fields = split(/\|/); 

  # Concatenate last element of first line with first element of the current line
  $fields[$#fields] = $fields[$#fields] . $add_fields[0]; 

  # Concatenate remaining array part
  push(@fields, @add_fields[1,$#add_fields]);

}

If you could change your data.txt file to include the pipe separator as the last character in every line/record, you could slurp in the whole file, splitting directly into the raw fields. This code would then do what you want:

#!/usr/bin/perl
use strict;
use warnings;

my @fields;
{
  $/ = "|";
  open (MYFILE, 'C:/data.txt') or die "$!";
  @fields = <MYFILE>;
  close (MYFILE);

  for(my $i = 0; $i < scalar(@fields); $i = $i + 4) {
    my $id = $fields[$i];
    my $title = $fields[$i+1];
    my $description = $fields[$i+2];
    my $date = $fields[$i+3];
    if ($id =~ m/^\d+$/) {
        # processing certain fields (...)

        # insert into the database (example)
    }
  }
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top