Question

I'm using str_getcsv to parse tab-separated values returned from a NoSQL query. However, I'm running into a problem, and the only solution I've found seems illogical.

Here's some sample code to demonstrate the problem (note: the tabs don't seem to be preserved when displayed here):

$data = '0  16  Gruesome Public Executions In North Korea - 80 Killed       http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata        "North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou...    1384357511  http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw   0   The Young Turks                 1   2013-11-13 12:53:31 9ab8f5607183ed258f4f98bb80f947b4    35afc4001e1a50fb463dac32de1d19e7';

$data = str_getcsv($data,"\t",NULL);

echo '<pre>'.print_r($data,TRUE).'</pre>';

Pay particular attention to the fact that one column (the one beginning with "North Korea...") starts with a double quote (") but doesn't end with one. This is why I pass NULL as the third parameter (enclosure), to override the default " enclosure.

Here is the result:

Array
(
[0] => 0
[1] => 16
[2] => Gruesome Public Executions In North Korea - 80 Killed
[3] => http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata
[4] => 
[5] => North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou...  1384357511  http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw   0   The Young Turks                 1   2013-11-13 12:53:31 9ab8f5607183ed258f4f98bb80f947b4    35afc4001e1a50fb463dac32de1d19e7
)

As you can see, the quote breaks the parsing: everything after it ends up in a single field. Logically, I thought I could pass NULL or an empty string '' as the third (enclosure) parameter of str_getcsv, but neither worked.

The only thing that got str_getcsv to work properly was a space character ' '. That doesn't make any sense to me, because none of the columns begin or end with whitespace.

$data = '0  16  Gruesome Public Executions In North Korea - 80 Killed       http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata        "North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou...    1384357511  http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw   0   The Young Turks                 1   2013-11-13 12:53:31 9ab8f5607183ed258f4f98bb80f947b4    35afc4001e1a50fb463dac32de1d19e7';

$data = str_getcsv($data,"\t",' ');

echo '<pre>'.print_r($data,TRUE).'</pre>';

Now the result is:

Array
(
[0] => 0
[1] => 16
[2] => Gruesome Public Executions In North Korea - 80 Killed
[3] => http://www.youtube.com/watch?v=Dtx30AQpcjw&feature=youtube_gdata
[4] => 
[5] => "North Korea staged gruesome public executions of 80 people this month, some for offenses as minor as watching South Korean entertainment videos or being fou...
[6] => 1384357511
[7] => http://gdata.youtube.com/feeds/api/videos/Dtx30AQpcjw
[8] => 0
[9] => The Young Turks
[10] => 
[11] => 
[12] => 
[13] => 
[14] => 1
[15] => 2013-11-13 12:53:31
[16] => 9ab8f5607183ed258f4f98bb80f947b4
[17] => 35afc4001e1a50fb463dac32de1d19e7
)

So my question is: why does it work with a space as the enclosure, but not with NULL or an empty string? And are there any repercussions to doing this?

UPDATE 1: This seems to have reduced the number of errors in our logs, but it didn't eliminate them, so I'm guessing that using a space as the enclosure has caused unintended side effects, albeit less troubling than the original problem. But my question remains the same: why can't I use NULL or an empty string as the enclosure? And secondly, is there a better way of doing this?

Solution

Just to give you a starting point...

You might want to consider processing the string yourself instead of using a function like str_getcsv in your case.

But be aware that this route has at least a few pitfalls (though it might be your only option):

  • Handling of escaped characters
  • Line breaks within the data (not meant as delimiters)

If you know that the only TABs in your string are the ones separating fields, and the only line breaks are the ones ending rows, you might be fine with this:

// $the_whole_csv_string_block holds the complete TSV text
$data = explode("\n", $the_whole_csv_string_block);

foreach ($data as $line)
{
    // Split the current row on TABs (no enclosure or escape handling at all)
    $arr = explode("\t", $line);

    // $arr[0] is the first field of the current row, $arr[1] the 2nd, ...
    // Processing one row at a time is usually what I want with a csv file

    // But if you would rather have a multidimensional array, simply add
    // $arr to another array (e.g. $rows[] = $arr;) and after this loop
    // you are good to go.
}
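Building on that, here is a minimal sketch of the multidimensional-array variant, assuming PHP 7 or later for the type declarations. The function name parse_tsv and the sample data are hypothetical, not from the question. It relies on the same assumptions as above (TABs only between fields, line breaks only between rows) and additionally tolerates "\r\n" line endings and a trailing newline.

```php
<?php
// Hypothetical helper: split a whole TSV block into rows of fields.
// Assumes TABs only separate fields and newlines only end rows
// (no enclosures, no escaping, no embedded line breaks).
function parse_tsv(string $block): array
{
    $rows = [];

    // Split on "\n", also accepting Windows-style "\r\n" line endings
    foreach (preg_split('/\r?\n/', $block) as $line) {
        // Skip blank lines, such as the one produced by a trailing newline
        if ($line === '') {
            continue;
        }

        // Every TAB is a field boundary; double quotes are ordinary data
        $rows[] = explode("\t", $line);
    }

    return $rows;
}

$tsv = "0\t16\t\"North Korea staged\t1384357511\n1\t\t\tThe Young Turks\n";
$rows = parse_tsv($tsv);

// $rows[0][2] is '"North Korea staged' (the leading quote is kept as data)
// $rows[1] is ['1', '', '', 'The Young Turks'] (empty fields are preserved)
```

Because nothing here treats a double quote specially, the unbalanced quote from the question's data simply stays part of its field.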

Otherwise, treat this as a starting point and tweak it to your individual situation. Hope it helps.

OTHER TIPS

Simply use chr(0) (the NUL byte) as both the enclosure and the escape character:

$data = str_getcsv($data, "\t", chr(0), chr(0));
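A quick check of this tip, offered as a sketch rather than an authoritative recipe: the line below is a shortened, hypothetical version of the question's data with the TABs written as \t. With chr(0) as both enclosure and escape, the leading double quote is kept as ordinary data and every TAB still starts a new field. This assumes the data never contains a NUL byte. As for why NULL and '' fail, a likely explanation (an assumption not confirmed in this thread, so check the manual for your PHP version) is that an empty enclosure is not treated as "no enclosure": older versions fall back to the default " and newer versions reject it.

```php
<?php
// Shortened, hypothetical sample row; "\t" marks the TAB separators.
$line = "0\t16\t\"North Korea staged gruesome public executions\t1384357511\t\t\tThe Young Turks";

// NUL byte as enclosure and escape: assumes the data never contains "\0",
// so quotes and backslashes are never given special meaning.
$fields = str_getcsv($line, "\t", chr(0), chr(0));

// $fields[2] keeps its leading double quote;
// consecutive TABs still yield empty-string fields ($fields[4], $fields[5]).
```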
Licensed under: CC-BY-SA with attribution