Perl parsing multiple separator character data

Question

If your columns are separated by multiple spaces, Text::CSV is useless. Your code contains a lot of repetition that tries to work around Text::CSV's limitations.

Also, your code has bad style, contains multiple syntax errors and typos, and uses confusing variable names.

So You Want To Parse A Header.

We need a definition of the header line for our code. Let's take “the first comment line that contains non-space characters”. It may not be preceded by non-comment lines.

use strict; use warnings; use autodie;

open my $fh, '<:encoding(UTF-8)', "filename.tsv";  # error handling by autodie

my @headers;
while (<$fh>) {
  # no need to copy to a $line variable, the $_ is just fine.
  chomp;                                     # remove line ending
  s/\A#\s*// or die "No header line found";  # remove comment char, or die
  /\S/ or next;                              # skip if there is nothing here
  @headers = split;                          # split the header names.
                                             # A bare `split` means `split ' ', $_`:
                                             # like /\s+/, but leading whitespace is skipped
  last;                                      # break out of the loop: the header was found
}

The \s character class matches whitespace characters (spaces, tabs, newlines, etc.). \S is its inverse and matches any non-whitespace character.
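To illustrate how a bare split treats whitespace, here is a small sketch that compares it with an explicit split /\s+/. The sample line is made up for the demonstration and is not taken from your data.

```perl
use strict;
use warnings;

# Hypothetical sample line with leading whitespace and mixed separators.
my $line = "  name \t type   address";

# A bare split behaves like split ' ', $_: leading whitespace is skipped.
my @awk_mode = split ' ', $line;

# split /\s+/ keeps an empty leading field when the line starts with whitespace.
my @regex_mode = split /\s+/, $line;

print scalar(@awk_mode), " fields: @awk_mode\n";      # 3 fields: name type address
print scalar(@regex_mode), " fields, the first one is empty\n";  # 4 fields
```

In the header loop the difference does not matter, because the substitution has already stripped the whitespace after the comment character, but it does matter for data lines that are indented.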

The Rest

Now we have our header names, and can proceed to normal parsing:

my @records;
while (<$fh>) {
  chomp;
  next if /\A#/;              # skip comments
  my @fields = split;
  my %hash;
  @hash{@headers} = @fields;  # use hash slice to assign fields to headers
  push @records, \%hash;      # add this hashref to our records
}

Voilà.
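If you want to try both loops without a file on disk, the following sketch runs them against an in-memory filehandle. The input text, including the column order and the spacing, is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical input; the column order and spacing are assumptions.
my $data = <<'END';
#
# name           type   address     scale
test.data.one    float  0x1234fde0  32768
# a comment between records
test.data.two    float  0x1234fde4  32768
END

# Open the string as an in-memory file so no real file is needed.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @headers;
while (<$fh>) {
  chomp;
  s/\A#\s*// or die "No header line found";
  /\S/ or next;              # skip the empty comment line
  @headers = split;
  last;
}

my @records;
while (<$fh>) {
  chomp;
  next if /\A#/;             # skip comments between records
  my %hash;
  @hash{@headers} = split;   # hash slice: pairs each header with its field
  push @records, \%hash;
}

printf "%s is at %s\n", $_->{name}, $_->{address} for @records;
```

Because %hash is declared inside the loop, every iteration pushes a reference to a fresh hash, so the records do not overwrite each other.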

The Result

This code produces the following data structure from your example data:

@records = (
  {
    address => "0x1234fde0",
    name    => "test.data.one",
    scale   => 32768,
    type    => "float",
  },
  {
    address => "0x1234fde4",
    name    => "test.data.two",
    scale   => 32768,
    type    => "float",
  },
  {
    address => "0x1234fde8",
    name    => "test.data.the",
    scale   => 32768,
    type    => "float",
  },
  {
    address => "0x1234fdec",
    name    => "test.data.for",
    scale   => 32768,
    type    => "float",
  },
  {
    address => "0x1234fdf0",
    name    => "test.data.fiv",
    scale   => 32768,
    type    => "float",
  },
);

This data structure could be used like this (say must be enabled first, for example with use feature 'say'; or use v5.10;):

for my $record (@records) {
  say $record->{name};
}

or

for my $i (0 .. $#records) {
  say "$i: $records[$i]{name}";
}

Criticism Of Your Code

You declare all your variables at the top of your script, effectively making them global variables. Don't. Create your variables in the smallest scope possible. My code uses just three variables in the outer scope: $fh, @headers and @records.
This line my $csv=Text::CSV({sep_char = ","}) doesn't work as expected.
- Text::CSV is not a function; it is the name of a module. You meant Text::CSV->new(...).
- The options should be a hashref, but sep_char = "," tries to assign something to sep_char. Sadly, this could be valid syntax. But you actually meant to specify a key-value relationship. Use the => operator instead (called fat comma or hash rocket).
Neither does this work: or die "Text::CSV error: " Text::CSV=error_diag.
- To concatenate strings, use the . concatenation operator. What you wrote is a syntax error: a string literal cannot be followed directly by another term; Perl expects an operator there.
- You really like assignments? The Text::CSV=error_diag does not work. You intended to call the error_diag method on the Text::CSV class. Therefore, use the correct operator ->: Text::CSV->error_diag.
The substitution s/t+/,/g replaces all sequences of the letter t with commas. To replace tabs, use the \t escape: s/\t+/,/g.
%arrayofhashes is not an array of hashes: It is a hash (as evidenced by the % sigil), but you use integer numbers as keys. Arrays have the @ sigil.
To add something to the end of an array, I'd rather not keep the index of the last item in an extra variable. Rather, use the push function to add an item to the end. This reduces the amount of bookkeeping code.
If you find yourself writing a loop like my $i = 0; while (condition) { do stuff; $i++ }, then you usually want a C-style for loop:
```
for (my $i = 0; condition; $i++) {
  do stuff;
}
```
This also helps with proper scoping of variables.
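
Several of the points above (the \t escape, a real array with the @ sigil, push, and a loop-scoped counter) can be sketched together as follows. The input lines and the name @array_of_hashes are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# Hypothetical tab-separated lines; the values are made up for illustration.
my @lines = ("test.data.one\t\tfloat", "test.data.two\tfloat");

my @array_of_hashes;                     # @ sigil: this really is an array
for my $line (@lines) {
  (my $csv_line = $line) =~ s/\t+/,/g;   # \t matches a tab; a bare t would match the letter
  my ($name, $type) = split /,/, $csv_line;
  push @array_of_hashes, { name => $name, type => $type };  # no index bookkeeping
}

for (my $i = 0; $i < @array_of_hashes; $i++) {  # $i is scoped to this loop
  print "$i: $array_of_hashes[$i]{name} ($array_of_hashes[$i]{type})\n";
}
```

Note that the letters t in test.data.one survive the substitution, which they would not with s/t+/,/g.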