Question

This is what I've got. It works. But, is there a simpler or better way?

One an ASPX page, I've got the download link...

<asp:HyperLink ID="HyperLinkDownload" runat="server" NavigateUrl="~/Download.aspx">Download as CSV file</asp:HyperLink>

And then I've got the Download.aspx.vb Code Behind...

Public Partial Class Download
    Inherits System.Web.UI.Page

    Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
        'set header
        Response.Clear()
        Response.ContentType = "text/csv"
        Dim FileName As String = "books.csv"
        Response.AppendHeader("Content-Disposition", "attachment;filename=" + FileName)

        'generate file content
        Dim db As New bookDevelopmentDataContext
        Dim Allbooks = From b In db.books _
                       Order By b.Added _
                       Select b
        Dim CsvFile As New StringBuilder
        CsvFile.AppendLine(CsvHeader())
        For Each b As Book In Allbooks
            CsvFile.AppendLine(bookString(b))
        Next

        'write the file
        Response.Write(CsvFile.ToString)
        Response.End()
    End Sub

    Function CsvHeader() As String
        Dim CsvLine As New StringBuilder
        CsvLine.Append("Published,")
        CsvLine.Append("Title,")
        CsvLine.Append("Author,")
        CsvLine.Append("Price")
        Return CsvLine.ToString
    End Function

    Function bookString(ByVal b As Book) As String
        Dim CsvLine As New StringBuilder
        CsvLine.Append(b.Published.ToShortDateString + ",")
        CsvLine.Append(b.Title.Replace(",", "") + ",")
        CsvLine.Append(b.Author.Replace(",", "") + ",")
        CsvLine.Append(Format(b.Price, "c").Replace(",", ""))
        Return CsvLine.ToString
    End Function

End Class
Was it helpful?

Solution

CSV formatting has some gotchas. Have you asked yourself these questions:

  • Does any of my data have embedded commas?
  • Does any of my data have embedded double-quotes?
  • Does any of my data have have newlines?
  • Do I need to support Unicode strings?

I see several problems in your code above. The comma thing first of all... you are stripping commas:

CsvLine.Append(Format(b.Price, "c").Replace(",", ""))

Why? In CSV, you should be surrounding anything which has commas with quotes:

CsvLine.Append(String.Format("\"{0:c}\"", b.Price))

(or something like that... my VB is not very good). If you're not sure if there are commas, but put quotes around it. If there are quotes in the string, you need to escape them by doubling them. " becomes "".

b.Title.Replace("\"", "\"\"")

Then surround this by quotes if you want. If there are newlines in your string, you need to surround the string with quotes... yes, literal newlines are allowed in CSV files. It looks weird to humans, but it's all good.

A good CSV writer requires some thought. A good CSV reader (parser) is just plain hard (and no, regex not good enough for parsing CSV... it will only get you about 95% of the way there).

And then there is Unicode... or more generally I18N (Internationalization) issues. For example, you are stripping commas out of a formatted price. But that's assuming the price is formatted as you expect it in the US. In France, the number formatting is reversed (periods used instead of commas, and vice versa). Bottom line, use culture-agnostic formatting wherever possible.

While the issue here is generating CSV, inevitably you will need to parse CSV. In .NET, the best parser I have found (for free) is Fast CSV Reader on CodeProject. I've actually used it in production code and it is really really fast, and very easy to use!

OTHER TIPS

I pass all my CSV data through a function like this:

Function PrepForCSV(ByVal value As String) As String
    return String.Format("""{0}""", Value.Replace("""", """"""))
End Function

Also, if you're not serving up html you probably want an http handler (.ashx file) rather than a full web page. If you create a new handler in Visual Studio, odds are you could just copy past your existing code into the main method and it will just work, with a small performance boost for your efforts.

You can create the equivalent of bookString() in the query itself. Here is what I think would be a simpler way.

protected void Page_Load(object sender, EventArgs e)
{
    using (var db = new bookDevelopmentDataContext())
    {
        string fileName = "book.csv";
        var q = from b in db.books
                select string.Format("{0:d},\"{1}\",\"{2}\",{3:F2}", b.Published, b.Title.Replace("\"", "\"\""), b.Author.Replace("\"", "\"\""), t.price);

        string outstring = string.Join(",", q.ToArray());

        Response.Clear();
        Response.ClearHeaders();
        Response.ContentType = "text/csv";
        Response.AppendHeader("Content-Disposition", string.Format("attachment;filename={0}", fileName));
        Response.Write("Published,Title,Author,Price," + outstring);
        Response.End();
    }
}

If you want a colon delimited value converter then there is a 3rd party open source called FileHelpers. I'm not sure about what open-source license it is under, but it has helped me quite a lot.

There's a lot of overhead associated with the Page class. Since you're just spitting out a CSV file and have no need for postback, server controls, caching, or the rest of it, you should make this into a handler with an .ashx extension. See here.

In addition to what Simon said, you may want to read the CSV how-to guide and make sure your output doesn't run across any of the gotchas.

To clarify something Simon said:

Then surround this by quotes if you want

Fields that contain doubled up double quotes ("") will need to be completely surrounded with double quotes. There shouldn't be any harm in just wrapping all fields with double quotes, unless you specifically want the parser to strip out leading and trailing whitespace (instead of trimming it yourself).

I use the following method when building a CSV file from a DataTable. ControllerContext is just the reponse stream object where the file is written to. For you it is just going to be the Response object.

public override void ExecuteResult(ControllerContext context)
        {
            StringBuilder csv = new StringBuilder(10 * Table.Rows.Count * Table.Columns.Count);

            for (int c = 0; c < Table.Columns.Count; c++)
            {
                if (c > 0)
                    csv.Append(",");
                DataColumn dc = Table.Columns[c];
                string columnTitleCleaned = CleanCSVString(dc.ColumnName);
                csv.Append(columnTitleCleaned);
            }
            csv.Append(Environment.NewLine);
            foreach (DataRow dr in Table.Rows)
            {
                StringBuilder csvRow = new StringBuilder();
                for(int c = 0; c < Table.Columns.Count; c++)
                {
                    if(c != 0)
                        csvRow.Append(",");

                    object columnValue = dr[c];
                    if (columnValue == null)
                        csvRow.Append("");
                    else
                    {
                        string columnStringValue = columnValue.ToString();


                        string cleanedColumnValue = CleanCSVString(columnStringValue);

                        if (columnValue.GetType() == typeof(string) && !columnStringValue.Contains(","))
                        {
                            cleanedColumnValue = "=" + cleanedColumnValue; // Prevents a number stored in a string from being shown as 8888E+24 in Excel. Example use is the AccountNum field in CI that looks like a number but is really a string.
                        }
                        csvRow.Append(cleanedColumnValue);
                    }
                }
                csv.AppendLine(csvRow.ToString());
            }

            HttpResponseBase response = context.HttpContext.Response;
            response.ContentType = "text/csv";
            response.AppendHeader("Content-Disposition", "attachment;filename=" + this.FileName);
            response.Write(csv.ToString());
        }

        protected string CleanCSVString(string input)
        {
            string output = "\"" + input.Replace("\"", "\"\"").Replace("\r\n", " ").Replace("\r", " ").Replace("\n", "") + "\"";
            return output;
        }

Looking mostly good except in your function "BookString()" you should pass all those strings through a small function like this first:

Private Function formatForCSV(stringToProcess As String) As String
    If stringToProcess.Contains("""") Or stringToProcess.Contains(",") Then
        stringToProcess = String.Format("""{0}""", stringToProcess.Replace("""", """"""))
    End If
    Return stringToProcess
End Function

'So, lines like this:
CsvLine.Append(b.Title.Replace(",", "") + ",")
'would be lines like this instead:
CsvLine.Append(formatForCSV(b.Title)) + ",")

The function will format your strings well for CSV. It replaces quotes with double quotes and add quotes around the string if there are either quotes or commas in the string.

Note that it doesn't account for newlines, but can only safely guarantee good CSV output for those strings that you know are free of newlines (inputs from simple one-line text forms, etc.).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top