Question

I have to parse a CSV file with PHP. The CSV file is provided by the client and I do not have any control over the format. It is comma delimited and uses double quotes as text qualifiers. However, if a field, such as an address field, has a comma in it, the client's system surrounds the field in an additional set of double quotes. For example:

"9999X111","X1110000110105","John Doe",""123 Central Park Avenue, #108"","New York NY 10006 ","","","M","0","1","370.20"

As you can see, the 4th field (index 3) has an extra set of double quotation marks around the entire field. If I send this string through fgetcsv() or str_getcsv(), the field is split at its embedded comma instead of being returned as a single value.

Unwanted Result Array:
[0] => 9999X111
[1] => X1110000110105
[2] => John Doe
[3] => 123 Central Park Avenue
[4] =>  #108""
[5] => New York NY 10006

If I remove the extra set of double quotation marks manually, the line is processed correctly using either function; however, I wouldn't be able to do this in a production environment.

Preferred Result Array:
[0] => 9999X111
[1] => X1110000110105
[2] => John Doe
[3] => 123 Central Park Avenue, #108
[4] => New York NY 10006

Here is the current code I am using:

$fileCHG = fopen($fileloc['InputFile'], "r");
$cnt = 0;
while(!feof($fileCHG)) {
    $chg[$cnt] = fgetcsv($fileCHG,0,",","\"");
    if($chg[$cnt]=="") { //Unset Any Blank Arrays
        unset($chg[$cnt]);
    }
    $cnt++;
}

I have tried a variety of suggestions from all over Stack Overflow, the PHP manual and elsewhere, and I can't seem to get it working. Even if I manually escape the inner set of double quotation marks with a backslash, I still get the incorrect result array. No matter how I call either function, my script splits the field at the comma following "Avenue" and leaves the trailing "" attached to the next field.

I feel as if this comment on the PHP site may be explaining what's happening, but as a new coder I am unable to visualize what's actually going on.

http://www.php.net/manual/en/function.fgetcsv.php#58124
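
For context, here is a minimal sketch of the standard quoting rule, which may help visualize the behavior. Inside a quoted field, a doubled quote ("") is read as one literal quote character. A "" placed directly after a delimiter is therefore not an escape: it opens and immediately closes a quoted field, and the parser then appears to treat the rest as unquoted text up to the next comma. The sample string below is made up for illustration and is not taken from the client's file.

```php
<?php
// Well-formed CSV: a quote inside a quoted field is written as two quotes,
// and a comma inside a quoted field does not split the field.
$good = '"John ""Johnny"" Doe","Central Park Avenue, #108"';

// Delimiter, enclosure and escape character are passed explicitly.
$row = str_getcsv($good, ",", "\"", "\\");

print_r($row);
// $row[0] is: John "Johnny" Doe
// $row[1] is: Central Park Avenue, #108
```

The client's file breaks this rule by doubling the quotes around the whole field rather than inside it, which is why the line has to be repaired before it is parsed.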

I have also tried the following suggestions (out of many) to no avail.

fgetcsv is not splitting data properly

str_getcsv not parsing the data correctly

This method could have worked, but it requires every line to have the same number of fields.

Reading CSV file with unescaped enclosures

I am using PHP 5.3.27 on Mac OS X 10.8.

Thank you in advance for taking a look.

Solution

I was able to solve the problem by expanding on the comments left by Daniel and Cosades. Instead of using fgetcsv() to process the line immediately, I use fgets() to read the line into a variable ($line). I then use stripos() to find the location of every occurrence of a doubled double quote (""), and decide which occurrences need to be edited by checking whether the character before or after the pair is not a comma (,). Below is my new code.

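$fileCHG = fopen($fileloc['Charge'], "r");
$chg = array();
$cnt = 0;

while (($line = fgets($fileCHG)) !== false) {
    // Drop the line ending so the last field is bounded by the end of the string
    $line = rtrim($line, "\r\n");
    $pos = 0;

    // Compare with false explicitly: stripos() returns 0 for a match at the start of the line
    while (($pos = stripos($line, "\"\"", $pos)) !== false) {
        // Neighbouring characters; "" means the pair touches the start or end of the line
        $chrA = ($pos > 0) ? substr($line, $pos - 1, 1) : "";
        $chrB = (string) substr($line, $pos + 2, 1);
        $atFieldStart = ($chrA === "" || $chrA === ",");
        $atFieldEnd   = ($chrB === "" || $chrB === ",");

        if ($atFieldStart !== $atFieldEnd) {
            // Extra qualifier on one side of a field: drop one of the two quotes
            $line = substr_replace($line, "", $pos + 1, 1);
            $pos  = $pos + 1;
        } else {
            // Empty field ("") or an escaped quote inside a field: leave it alone
            $pos = $pos + 2;
        }
    }

    // Skip blank lines instead of storing and then unsetting them
    if ($line !== "") {
        $chg[$cnt] = str_getcsv($line, ",", "\"");
    }

    $cnt++;
}

fclose($fileCHG);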
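
The same cleanup can also be sketched as a small reusable function. This version swaps the character-by-character scan for two regular expressions, so it is a different technique from the loop above, not a copy of it. The function name parseLooseCsvLine() and the shortened sample line are hypothetical and only there for illustration. It assumes the extra quotes always wrap a whole field and that no field contains three or more consecutive quotes.

```php
<?php
// Hypothetical helper: repairs doubled enclosing quotes, then parses the line.
function parseLooseCsvLine($line)
{
    // Remove the line ending so the last field ends at the end of the string.
    $line = rtrim($line, "\r\n");

    // Opening pair: preceded by the start of the line or a comma,
    // and followed by a character that is not a comma.
    $line = preg_replace('/(?<![^,])""(?=[^,])/', '"', $line);

    // Closing pair: preceded by a character that is not a comma,
    // and followed by a comma or the end of the line.
    $line = preg_replace('/(?<=[^,])""(?![^,])/', '"', $line);

    // Empty fields ("" between two delimiters) match neither pattern
    // and are passed through unchanged.
    return str_getcsv($line, ",", "\"", "\\");
}

// Usage with a shortened, made-up line in the client's format.
$row = parseLooseCsvLine('"John Doe",""Central Park Avenue, #108"","","M"' . "\n");
print_r($row);
```

Keeping the repair in one function makes it easier to test against sample lines from the client before running it over a whole file.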

Thanks for pointing me in the right direction!

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow