How to deal with a comma in description in a CSV file

Question 1

The most common solution is quoting fields that can contain "bad characters".

In this case:

3456,"Bad Part,with a comma",4.56

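And in turn, if a quoted field itself contains a " character, you escape it. Some dialects use a backslash for this (\", and \\ for a literal backslash); RFC 4180 style CSV doubles the quote instead ("").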
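If you control the program that writes the file, the quoting can be sketched roughly like this. It assumes the RFC 4180 convention of doubling embedded quotes rather than backslash escaping, and csv_quote is a made-up helper name, not part of any module:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: quote one field for CSV output, RFC 4180 style.
sub csv_quote {
    my ($field) = @_;
    # Only fields containing a comma, a double quote, or a line break need quoting.
    return $field unless $field =~ /[",\r\n]/;
    (my $escaped = $field) =~ s/"/""/g;    # double any embedded quotes
    return qq{"$escaped"};
}

my @row = (3456, 'Bad Part,with a comma', 4.56);
print join(',', map { csv_quote($_) } @row), "\n";
# prints: 3456,"Bad Part,with a comma",4.56
```

For real work a CSV module will handle more corner cases than this does.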

Question 2

So, you have something that vaguely resembles a CSV file, but isn't. One thing you can do is close the gap and then process it normally -- everyone else has suggested ways of doing this. Another thing you can do is shrug and process it as it is, as something other than CSV.

Here, we have an ID at the beginning of the line, followed by a comma.

/^(\d+),/;

And then anything at all, followed by a comma:

/^(\d+),(.+),/

And then a price, followed by the end of the line:

/^(\d+),(.+),(\d+(?:\.\d+)?)$/

And yes, that (.+), in the middle works as you want with embedded commas. .+ is greedy: it first swallows the rest of the line, then backtracks from right to left to the first point that allows the rest of the pattern to match, which is the last comma on the line.
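To see the backtracking pick the last comma, here is a small sketch. The second sample line is invented for illustration; it has a number inside the description, which still ends up in the description because the price has to run to the end of the line:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample lines; the second one is hypothetical and has a number inside the description.
my @lines = (
    '3456,Bad Part,with a comma,4.56',
    '4567,Bolt,10,7.89',
);

my @parsed;
for my $line (@lines) {
    # Same pattern as above: ID, greedy description, price at end of line.
    next unless $line =~ /^(\d+),(.+),(\d+(?:\.\d+)?)$/;
    push @parsed, [$1, $2, $3];
    print "ID=$1 | Description=$2 | Price=$3\n";
}
# prints:
# ID=3456 | Description=Bad Part,with a comma | Price=4.56
# ID=4567 | Description=Bolt,10 | Price=7.89
```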

Altogether:

#! /usr/bin/env perl
use common::sense;

while (<DATA>) {
  next unless /^(\d+),(.+),(\d+(?:\.\d+)?)$/;
  say "ID: $1";
  say "Description: $2";
  say "Price: $3";
  say "----"
}

__DATA__
ID,Description,Price
1234,Good Part,1.23
2345,This is.ok,2.34
3456,Bad Part,with a comma,4.56

And, a bit neater (although the names are longer than what they name...):

#! /usr/bin/env perl
use common::sense;

while (<DATA>) {
  chomp;  # looping on chomp's return value would drop a last line that has no newline
  next if /
    ^ID,Description,Price\z  # allow only this header
    | ^\s*\z                 # and blank lines
    | ^\s*\#                 # and lines containing only a comment
  /xi;

  /^(?<ID> \d+),
    (?<Description> .+),
    (?<Price> \d+(?:\.\d+)?)
  \z/x or die "Invalid line: $_";

  say "$_: $+{$_}" for qw(ID Description Price);
  say "----";
}

__DATA__
ID,Description,Price
1234,Good Part,1.23
2345,This is.ok,2.34

# why do we allow this again?
id,description,price
3456,Bad Part,with a comma,4.56

Both output:

ID: 1234
Description: Good Part
Price: 1.23
----
ID: 2345
Description: This is.ok
Price: 2.34
----
ID: 3456
Description: Bad Part,with a comma
Price: 4.56
----

Yes, you would need to change this regex to suit a slightly different notCSV, but you would also need to change your gap-closer. This is why notCSV is bad.
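As an illustration of that kind of change, suppose the file grew a trailing integer column. The Quantity column and the sample line are assumptions made up for this sketch, not something from the question:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical variant of the file: a trailing integer Quantity column.
my $line_re = qr/^(?<ID>\d+),(?<Description>.+),(?<Price>\d+(?:\.\d+)?),(?<Quantity>\d+)$/;

my %row;
if ('3456,Bad Part,with a comma,4.56,12' =~ $line_re) {
    %row = %+;    # copy the named captures into an ordinary hash
    print "$_: $row{$_}\n" for qw(ID Description Price Quantity);
}
# prints:
# ID: 3456
# Description: Bad Part,with a comma
# Price: 4.56
# Quantity: 12
```

The description is still the only field allowed to contain commas; two such fields on one line could not be told apart this way.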

Question 3

Based on your comment on depesz's answer, here is my attempt to wrap that field in double quotes. After that, just use Text::CSV_XS or similar to parse it.

Content of script.pl:

#!/usr/bin/env perl

use warnings;
use strict;

my ($f, $num_fields_h);

while ( <> ) { 
    chomp;

    ## Header:
    ## Get the position of the "Description" field and the total number
    ## of fields. I assume that header doesn't have the problem of commas
    ## in the middle.
    if ( $. == 1 ) { 
        my %h = do { my $i = 0; map { $_ => $i++ } split /,/ };
        $f = $h{ Description };
        $num_fields_h = (tr/,/,/) + 1;
        printf qq|%s\n|, $_; 
        next;
    }   

    ## Data lines:
    ## Split the line and join fields in three parts, the first one until the
    ## "Description" calculated in header. The second one from that position until
    ## the difference of fields between the header and this line. That number will
    ## be the number of commas in the description. The third one from that calculated
    ## position until the end.
    my @f = split /,/; 
    my $num_fields_d = (tr/,/,/) + 1;
    my $limit_description_field = $f + $num_fields_d - $num_fields_h;
    printf qq|%s\n|, 
        join q|,|, 
            @f[ 0 .. $f - 1 ],  
            q|"| . join( q|,|, @f[ $f .. $limit_description_field ] ) . q|"|, 
            @f[ ($limit_description_field + 1) .. $#f ];  
}

Run it like:

perl script.pl infile

That yields:

ID,Description,Price
1234,"Good Part",1.23
2345,"This is.ok",2.34
3456,"Bad Part,with a comma",4.56
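To check that the repaired lines now split correctly, here is a minimal sketch. It swaps in the core module Text::ParseWords so that it runs without installing anything. That module is not a full CSV parser (it also treats single quotes and backslashes specially), so take it as a quick check rather than a replacement for Text::CSV_XS:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# One repaired line, as printed by the script above.
my $line = '3456,"Bad Part,with a comma",4.56';

# parse_line(delimiter, keep, line): keep = 0 strips the surrounding quotes.
my @fields = parse_line(',', 0, $line);
print "$_\n" for @fields;
# prints:
# 3456
# Bad Part,with a comma
# 4.56
```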

Question 4

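How about this (it handles exactly one extra comma in the description):

my $x = '3456,Bad Part,with a comma,4.56';
my @y = split /,/, $x;
my $desc;
if ( $#y == 3 ) {    # four pieces: the description was split in two
    $desc = "$y[1],$y[2]";
}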
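The same split idea can be stretched to any number of commas in the description. A rough sketch, assuming there are exactly three logical fields and only the middle one can contain commas (the sample line is made up):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $x = '3456,Bad Part,with a comma,and another,4.56';
my @y = split /,/, $x;

# Assumes exactly three logical fields: ID first, price last, description between.
my $id    = shift @y;
my $price = pop @y;
my $desc  = join ',', @y;    # whatever is left is the description

print "$id | $desc | $price\n";
# prints: 3456 | Bad Part,with a comma,and another | 4.56
```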

Question 5

If you know how many fields there are, and trust all but one of them, then you could parse the good parts from both ends, and whatever is left would be the bad field. For example:

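while (<>) {
    chomp;    # otherwise the last field keeps its newline
    # first and last fields must not contain commas; the middle one may
    next unless m/^([^,]+),(.+),([^,]+)$/;
    my @fields = ($1, $2, $3);
    $fields[1] =~ s/,/-/g;    # replace embedded commas with dashes
    print join(',', @fields), "\n";
}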

So the anchored parts at the beginning and at the end won't contain a comma, but the middle field in between them can.
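The same "work in from both ends" idea also works without a regex, by cutting at the first and the last comma. A minimal sketch, where split_outer is a hypothetical helper name:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split on the first and the last comma only.
sub split_outer {
    my ($line) = @_;
    my $first = index  $line, ',';
    my $last  = rindex $line, ',';
    return if $first < 0 || $first == $last;    # need at least two commas
    return (
        substr($line, 0, $first),
        substr($line, $first + 1, $last - $first - 1),
        substr($line, $last + 1),
    );
}

my @fields = split_outer('3456,Bad Part,with a comma,4.56');
print join(' | ', @fields), "\n";
# prints: 3456 | Bad Part,with a comma | 4.56
```

It returns an empty list for lines with fewer than two commas, so the caller can skip or report them.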