fgetcsv() ignores special characters when they are at the beginning of a line!

https://stackoverflow.com/questions/2238971

19-09-2019

Question

I have a simple script that accepts a CSV file and reads every row into an array. I then cycle through each column of the first row (in my case it holds the questions of a survey) and print them out. The survey is in French, and whenever the first character of a question is a special character (é, ê, ç, etc.), fgetcsv simply omits it.

Special characters in the middle of a value are not affected; the problem occurs only when they are the first character.

I tried to debug this but I am baffled. I did a var_dump with the content of the file and the characters are definitely there:

var_dump(utf8_encode(file_get_contents($_FILES['csv_file']['tmp_name'])));

And here's my code:

if (file_exists($_FILES['csv_file']['tmp_name']) && $csv = fopen($_FILES['csv_file']['tmp_name'], "r"))
{
    $csv_arr = array();

    // Populate an array with all the rows of the CSV file.
    // Looping on fgetcsv() itself avoids appending a trailing false at end of file.
    while (($row = fgetcsv($csv)) !== false)
    {
        $csv_arr[] = $row;
    }

    // Close the file, no longer needed
    fclose($csv);

    // This should cycle through the cells of the first row (questions)
    foreach ($csv_arr[0] as $question)
    {
        echo utf8_encode($question) . "<br />";
    }
}

Solution

Have you already checked the manual page on fgetcsv? Nothing there addresses this specific problem directly, but a number of the user contributions may be worth looking through if nothing comes up here.

There's this, for example:

Note: Locale setting is taken into account by this function. If LANG is e.g. en_US.UTF-8, files in one-byte encoding are read wrong by this function.
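One way to act on that note is to convert the data to the encoding the locale expects before parsing it. A minimal sketch, assuming the upload is ISO-8859-1 (the utf8_encode() calls in the question suggest this, but it is not confirmed) and that the iconv extension is available; the sample bytes are made up for illustration:

```php
<?php
// Assumption: the source file is ISO-8859-1; change this to match the real file.
$latin1 = "\xE9t\xE9,\xE7a\n";   // "été,ça" encoded as ISO-8859-1

// iconv() returns the converted string, or false on failure.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

if ($utf8 === false) {
    echo "Conversion failed\n";
}
```

The converted data would then be parsed under a UTF-8 locale, so the bytes and the locale agree.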

Also, seeing as it's always in the beginning of the line, could it be that this is really a hidden line break problem? There's this:

Note: If PHP is not properly recognizing the line endings when reading files either on or created by a Macintosh computer, enabling the auto_detect_line_endings run-time configuration option may help resolve the problem.

You may also want to try saving the file with different line endings.
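If re-saving the file is not practical, the line endings can also be normalised in code before parsing. A minimal sketch, assuming the file fits in memory; `normalize_line_endings` is a hypothetical helper name, not a PHP built-in, and the sample data is made up:

```php
<?php
// Hypothetical helper: convert Windows (\r\n) and classic Mac (\r) line endings to \n.
function normalize_line_endings(string $data): string
{
    // The \r\n pair is replaced first so it does not become two line feeds.
    return str_replace(["\r\n", "\r"], "\n", $data);
}

$raw = "q1,q2\rr1,r2\r\nr3,r4\n";
$normalized = normalize_line_endings($raw);
```

Note that this also rewrites line breaks inside quoted fields, which may or may not matter for your data.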

OTHER TIPS

Are you setting your locale correctly before calling fgetcsv()?

setlocale(LC_ALL, 'fr_FR.UTF-8');

Otherwise, fgetcsv() is not multi-byte safe.

Make sure that you set it to something that appears in your list of available locales. On Linux (certainly on Debian) you can list them by running

locale -a

You should get something like...

C
en_US.utf8
POSIX

For UTF-8 support, pick a locale whose name ends in utf8. If your input is encoded with something else, you'll need to use the appropriate locale, but make sure your OS supports it first.

If you set the locale to one that isn't available on your system, it won't help you: setlocale() returns false and the current locale stays unchanged.
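Since an unavailable locale fails silently unless you look at the return value, it is worth checking it. A sketch, assuming the candidate names below; they vary between systems, so treat them as placeholders and compare them against the output of `locale -a`:

```php
<?php
// Candidate locale names are assumptions; check `locale -a` for what is installed.
$candidates = ['fr_FR.UTF-8', 'fr_FR.utf8', 'en_US.UTF-8', 'en_US.utf8', 'C.UTF-8'];

// setlocale() accepts an array and tries each entry until one succeeds.
// It returns the name of the locale that was set, or false if none is available.
$active = setlocale(LC_ALL, $candidates);

if ($active === false) {
    echo "None of the requested locales is installed.\n";
} else {
    echo "Using locale: $active\n";
}
```

Only call fgetcsv() after this check has succeeded; otherwise you are still reading the file under whatever locale was active before.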

This behaviour has a bug report filed for it, but apparently it isn't a bug.

We saw the same result with LANG set to C, and worked around it by ensuring that such values were wrapped in quotation marks. For example, the line

a,"a",é,"é",óú,"óú",ó&ú,"ó&ú"

generates the following array when passed through fgetcsv():

array (
  0 => 'a',
  1 => 'a',
  2 => '',
  3 => 'é',
  4 => '',
  5 => 'óú',
  6 => '&ú',
  7 => 'ó&ú',
)

Of course, you'll have to escape any quotation marks in the value by doubling them, but that's much less hassle than repairing the missing characters.

Oddly, this happens with both UTF-8 and cp1252 encodings for the input file.
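If you control how the file is produced, the quoting workaround can be applied when writing it. A sketch, assuming a hypothetical helper `quote_csv_row` (not a PHP built-in) that wraps every field in quotation marks and doubles any embedded ones. It does the quoting by hand because fputcsv() only adds quotation marks to fields it considers in need of them:

```php
<?php
// Hypothetical helper: quote every field unconditionally and double embedded quotes.
function quote_csv_row(array $fields): string
{
    $quoted = array_map(
        function ($field) {
            return '"' . str_replace('"', '""', (string) $field) . '"';
        },
        $fields
    );

    return implode(',', $quoted) . "\n";
}

$line = quote_csv_row(['é', 'óú', 'ó&ú', 'say "oui"']);
```

Here `$line` holds one CSV row in which every value is quoted, including those starting with an accented character.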

Licensed under: CC-BY-SA with attribution