Question

I have a massive CSV file full of data, and I need to extract only the yes/no answers (the numbered 1-16 part), preserving the commas and breaking to a new line once each set of 16 is complete.

Here is a snapshot of the data

Firstname: xxx
Lastname: xxx
Email: xxx@xxx.net
Phone: xxxxxxxxxx
IP Address: xxx.xxx.xxx.xxx",,,,,,,,,,,,,,,,
xxxx,Suttle,OR,United States,xxxxxx@xxx.com,xxxxxxxxxx,xxxx xxxx,"UnkNo,wn",Long Form,New,23/xxxxx,xxx.xxx.xxx.xxx,4/17/2014 13:45,4/17/2014 13:45,S3S - Survival,xxxxxx.com,4/17/2014 0:00,4/17/2014 13:45,"  
1.  No,
2.   No,
3.  No,
4.  No,
5.  No,
6.  No,
7.  No,
8.  No,
9. No,
10.  No,
11.  No,
12. No,
13.  No,
14.  No,
15.  No,
16.  Yes,

I have tried extracting the yes/no data above using every method I can think of, and I still can't extract it correctly! Any suggestions are gladly appreciated.

The desired output is a CSV file that looks like this: http://pastebin.com/LerQ9vE4


Solution

This should work:

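use strict;
use warnings;
use autodie;

open my $fh, "<", "csvfile";
open my $op, ">", "output.txt";

my $in_block = 0;    # true while reading a run of numbered answer lines

while (<$fh>) {
    # Anchor at the start of the line so that digits elsewhere in a record
    # (for example an IP address followed by a comma) are not matched.
    if (/^\s*\d+\.\s*(\w+,)/) {
        print $op $1;
        $in_block = 1;
    } elsif ($in_block) {
        print $op "\n";    # the run has ended, so finish the output row
        $in_block = 0;
    }
}
print $op "\n" if $in_block;    # finish the last row if the file ends mid-run
close $op;
  • Using a while loop, read one line at a time.
  • Test the current line against a regex that:
    • Looks for a line that starts (after optional whitespace) with one or more digits: ^\s*\d+
    • Followed by a literal dot: \.
    • Followed by zero or more whitespace characters: \s*
    • Uses a capture group, (\w+,), to capture any word followed by a comma.
  • If the line matches, print the captured text ($1). The first non-matching line after a run of matches ends the output row with a newline.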
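To see the pattern in isolation, here is a minimal sketch that applies the same kind of anchored regex to a few in-memory lines instead of a file. The helper name extract_rows and the sample lines (including the IP addresses and the second record) are made up for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns one string per
# run of consecutive numbered answer lines.
sub extract_rows {
    my @lines = @_;
    my @rows;
    my $current = "";
    for my $line (@lines) {
        if ($line =~ /^\s*\d+\.\s*(\w+,)/) {
            $current .= $1;                  # collect "No," or "Yes,"
        } elsif (length $current) {
            push @rows, $current;            # a non-matching line ends the run
            $current = "";
        }
    }
    push @rows, $current if length $current; # flush a run at end of input
    return @rows;
}

# Made-up sample: two records with three numbered answers each.
my @sample = (
    'xxxx,Suttle,OR,192.168.1.1,4/17/2014 13:45,"',
    '1.  No,',
    '2.   No,',
    '3. Yes,',
    'IP Address: 10.0.0.1",,,,',
    'yyyy,Smith,WA,10.0.0.2,4/18/2014 09:00,"',
    '1. Yes,',
    '2.  No,',
    '3.  No,',
);

print "$_\n" for extract_rows(@sample);
```

The leading ^\s* matters here: without an anchor, \d+\.\s*(\w+,) can also match inside an address such as 192.168.1.1 when it is followed by a comma, which would insert a stray "1," into the output.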

OTHER TIPS

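open my $fin, "<", "your.file" or die "Cannot open your.file: $!";
my $str = "";
while (<$fin>) {                              # read one line at a time into $_
  if (/^\s*(\d+)\.\s*(\w+)/) {                # capture the item number and the word
    $str .= "\n" if $1 == 1 && length $str;   # numbering restarted: start a new row
    $str .= $2 . ",";                         # append word and comma to $str
  }
}
print "$str\n";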
Licensed under: CC-BY-SA with attribution