Question

I have a log in CSV format we write out for a certain logging operation. However, one of the fields allows user input and I need to make sure that if they enter a comma in the field that we parse it out and replace it with something that, say, Excel will be able to read and show a comma in its place (so the csv reader will not think it is the end of a column).

Currently I replace the comma with , but this shows up as a literal string in Excel.

Is there a standard way to display a comma in a CSV file without using the actual comma character? Even a solution that only works with Excel would be fine, since most of our customers will be using Excel to view this file.

Solution

The best way to handle embedded commas is to properly quote the CSV file:

  • Columns that contain a comma should be quoted
  • Quoted columns that contain a quote should have the quote escaped

Example:

Joe Smith,"Joe Smith, Jr.","Joe ""The Man"" Smith, Jr."

I wrote an extension method that helps solve this:

// Must be declared inside a non-generic static class to work as an extension method.
public static string CsvQuote(this string text)
{
    if (text == null) return string.Empty;

    bool containsQuote = false;
    bool containsComma = false;
    int len = text.Length;

    for (int i = 0; i < len && (containsComma == false || containsQuote == false); i++)
    {
        char ch = text[i];
        if (ch == '"')
        {
            containsQuote = true;
        }
        else if (ch == ',' || char.IsControl(ch))
        {
            // Control characters (e.g. CR/LF) also force the field to be quoted
            containsComma = true;
        }
    }

    bool mustQuote = containsComma || containsQuote;

    if (containsQuote)
    {
        text = text.Replace("\"", "\"\"");
    }

    // Wrap the cell in quotes if needed (embedded quotes were doubled above); otherwise return it as is
    return mustQuote ? "\"" + text + "\"" : text;
}

USAGE:

logger.Write(myString.CsvQuote());

var csv = string.Join(",", listOfStrings.Select(s => s.CsvQuote()));

OTHER TIPS

Including your string inside of quotation marks will let you use commas.

"please sir,", can I, have some more?

You can put quotes around the entire field. Most CSV parsers will understand that the comma is part of the data and not the end of the field.
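
As a minimal sketch of that idea, assuming you are happy to quote every field whether it needs it or not (the class and method names here are made up for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CsvAlwaysQuote
{
    // Wraps the field in double quotes and doubles any embedded quotes.
    // A null field is written as an empty quoted field.
    public static string QuoteField(string field)
    {
        return "\"" + (field ?? string.Empty).Replace("\"", "\"\"") + "\"";
    }

    // Joins raw field values into one CSV record (no line terminator added).
    public static string ToCsvLine(IEnumerable<string> fields)
    {
        return string.Join(",", fields.Select(f => QuoteField(f)));
    }
}
```

Calling CsvAlwaysQuote.ToCsvLine(new[] { "Joe Smith, Jr.", "plain" }) gives "Joe Smith, Jr.","plain" with the comma kept inside the first field. Quoting everything makes the file slightly larger but avoids having to decide which fields need it.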

Or use a different separator. This will require you to use the text import wizard in Excel instead of just being able to open the file directly. I typically use ~ or |.

CSV is also "character separated values", not only comma.

You can use any character as a separator, but the tab (\t) is widely used for this, as it is typically not found in user input.
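
A rough sketch of the tab-separated approach follows. It assumes it is acceptable for a log to replace any tab or line break in the user input with a space, which loses information; TsvWriter, Sanitize and ToTsvLine are hypothetical names, not part of any library.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TsvWriter
{
    // Replaces characters that would break a tab-separated record with a space.
    // Assumption: losing the original tab or line break is fine for this log.
    public static string Sanitize(string field)
    {
        return (field ?? string.Empty)
            .Replace('\t', ' ')
            .Replace('\r', ' ')
            .Replace('\n', ' ');
    }

    // Joins the sanitized fields with tabs (no line terminator added).
    public static string ToTsvLine(IEnumerable<string> fields)
    {
        return string.Join("\t", fields.Select(f => Sanitize(f)));
    }
}
```

Commas pass through untouched, so TsvWriter.ToTsvLine(new[] { "a,b", "c" }) produces the two fields a,b and c separated by a single tab. Switching separators only moves the problem to the new separator character, so it still has to be handled somehow if it shows up in the input.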

The RFC for CSV is RFC 4180.

It describes how fields are separated and quoted. Here is the original text; note the special mention of Microsoft Excel in (5):

5.  Each field may or may not be enclosed in double quotes (however
   some programs, such as Microsoft Excel, do not use double quotes
   at all).  If fields are not enclosed with double quotes, then
   double quotes may not appear inside the fields.  For example:

   "aaa","bbb","ccc" CRLF
   zzz,yyy,xxx

6.  Fields containing line breaks (CRLF), double quotes, and commas
   should be enclosed in double-quotes.  For example:

   "aaa","b CRLF
   bb","ccc" CRLF
   zzz,yyy,xxx

7.  If double-quotes are used to enclose fields, then a double-quote
   appearing inside a field must be escaped by preceding it with
   another double quote.  For example:

   "aaa","b""bb","ccc"

Please also note that Excel recognizes the tab separator out of the box.
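
To illustrate rules 5 to 7 from the reading side, here is a minimal sketch of splitting one record into fields. It is not a complete RFC 4180 parser: it assumes the whole record is already in a single string (a quoted field that contains a line break must not have been split beforehand), and CsvLineParser and ParseLine are names invented for this example.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLineParser
{
    // Splits one CSV record into fields, honoring quoted fields and doubled quotes.
    public static List<string> ParseLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char ch = line[i];
            if (inQuotes)
            {
                if (ch == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // rule 7: "" inside quotes is one literal quote
                        i++;                 // skip the second quote of the pair
                    }
                    else
                    {
                        inQuotes = false;    // closing quote of the field
                    }
                }
                else
                {
                    current.Append(ch);      // rule 6: commas and line breaks are data here
                }
            }
            else if (ch == '"' && current.Length == 0)
            {
                inQuotes = true;             // opening quote at the start of a field
            }
            else if (ch == ',')
            {
                fields.Add(current.ToString()); // an unquoted comma ends the field
                current.Clear();
            }
            else
            {
                current.Append(ch);
            }
        }

        fields.Add(current.ToString());      // last field has no trailing comma
        return fields;
    }
}
```

Passing the record "aaa","b""bb","ccc" from rule 7 to CsvLineParser.ParseLine returns the three fields aaa, b"bb and ccc. For production use, an existing CSV library is likely a safer choice than a hand-rolled loop like this.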

Licensed under: CC-BY-SA with attribution