How do I best generate a CSV (comma-delimited text file) for download with ASP.NET?
Question
This is what I've got. It works. But, is there a simpler or better way?
One an ASPX page, I've got the download link...
<asp:HyperLink ID="HyperLinkDownload" runat="server" NavigateUrl="~/Download.aspx">Download as CSV file</asp:HyperLink>
And then I've got the Download.aspx.vb Code Behind...
Public Partial Class Download
Inherits System.Web.UI.Page
Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
'set header
Response.Clear()
Response.ContentType = "text/csv"
Dim FileName As String = "books.csv"
Response.AppendHeader("Content-Disposition", "attachment;filename=" + FileName)
'generate file content
Dim db As New bookDevelopmentDataContext
Dim Allbooks = From b In db.books _
Order By b.Added _
Select b
Dim CsvFile As New StringBuilder
CsvFile.AppendLine(CsvHeader())
For Each b As Book In Allbooks
CsvFile.AppendLine(bookString(b))
Next
'write the file
Response.Write(CsvFile.ToString)
Response.End()
End Sub
Function CsvHeader() As String
Dim CsvLine As New StringBuilder
CsvLine.Append("Published,")
CsvLine.Append("Title,")
CsvLine.Append("Author,")
CsvLine.Append("Price")
Return CsvLine.ToString
End Function
Function bookString(ByVal b As Book) As String
Dim CsvLine As New StringBuilder
CsvLine.Append(b.Published.ToShortDateString + ",")
CsvLine.Append(b.Title.Replace(",", "") + ",")
CsvLine.Append(b.Author.Replace(",", "") + ",")
CsvLine.Append(Format(b.Price, "c").Replace(",", ""))
Return CsvLine.ToString
End Function
End Class
Solution
CSV formatting has some gotchas. Have you asked yourself these questions:
- Does any of my data have embedded commas?
- Does any of my data have embedded double-quotes?
- Does any of my data have have newlines?
- Do I need to support Unicode strings?
I see several problems in your code above. The comma thing first of all... you are stripping commas:
CsvLine.Append(Format(b.Price, "c").Replace(",", ""))
Why? In CSV, you should be surrounding anything which has commas with quotes:
CsvLine.Append(String.Format("\"{0:c}\"", b.Price))
(or something like that... my VB is not very good). If you're not sure if there are commas, but put quotes around it. If there are quotes in the string, you need to escape them by doubling them. "
becomes ""
.
b.Title.Replace("\"", "\"\"")
Then surround this by quotes if you want. If there are newlines in your string, you need to surround the string with quotes... yes, literal newlines are allowed in CSV files. It looks weird to humans, but it's all good.
A good CSV writer requires some thought. A good CSV reader (parser) is just plain hard (and no, regex not good enough for parsing CSV... it will only get you about 95% of the way there).
And then there is Unicode... or more generally I18N (Internationalization) issues. For example, you are stripping commas out of a formatted price. But that's assuming the price is formatted as you expect it in the US. In France, the number formatting is reversed (periods used instead of commas, and vice versa). Bottom line, use culture-agnostic formatting wherever possible.
While the issue here is generating CSV, inevitably you will need to parse CSV. In .NET, the best parser I have found (for free) is Fast CSV Reader on CodeProject. I've actually used it in production code and it is really really fast, and very easy to use!
OTHER TIPS
I pass all my CSV data through a function like this:
Function PrepForCSV(ByVal value As String) As String
return String.Format("""{0}""", Value.Replace("""", """"""))
End Function
Also, if you're not serving up html you probably want an http handler (.ashx file) rather than a full web page. If you create a new handler in Visual Studio, odds are you could just copy past your existing code into the main method and it will just work, with a small performance boost for your efforts.
You can create the equivalent of bookString() in the query itself. Here is what I think would be a simpler way.
protected void Page_Load(object sender, EventArgs e)
{
using (var db = new bookDevelopmentDataContext())
{
string fileName = "book.csv";
var q = from b in db.books
select string.Format("{0:d},\"{1}\",\"{2}\",{3:F2}", b.Published, b.Title.Replace("\"", "\"\""), b.Author.Replace("\"", "\"\""), t.price);
string outstring = string.Join(",", q.ToArray());
Response.Clear();
Response.ClearHeaders();
Response.ContentType = "text/csv";
Response.AppendHeader("Content-Disposition", string.Format("attachment;filename={0}", fileName));
Response.Write("Published,Title,Author,Price," + outstring);
Response.End();
}
}
If you want a colon delimited value converter then there is a 3rd party open source called FileHelpers. I'm not sure about what open-source license it is under, but it has helped me quite a lot.
There's a lot of overhead associated with the Page class. Since you're just spitting out a CSV file and have no need for postback, server controls, caching, or the rest of it, you should make this into a handler with an .ashx extension. See here.
In addition to what Simon said, you may want to read the CSV how-to guide and make sure your output doesn't run across any of the gotchas.
To clarify something Simon said:
Then surround this by quotes if you want
Fields that contain doubled up double quotes ("") will need to be completely surrounded with double quotes. There shouldn't be any harm in just wrapping all fields with double quotes, unless you specifically want the parser to strip out leading and trailing whitespace (instead of trimming it yourself).
I use the following method when building a CSV file from a DataTable. ControllerContext is just the reponse stream object where the file is written to. For you it is just going to be the Response object.
public override void ExecuteResult(ControllerContext context)
{
StringBuilder csv = new StringBuilder(10 * Table.Rows.Count * Table.Columns.Count);
for (int c = 0; c < Table.Columns.Count; c++)
{
if (c > 0)
csv.Append(",");
DataColumn dc = Table.Columns[c];
string columnTitleCleaned = CleanCSVString(dc.ColumnName);
csv.Append(columnTitleCleaned);
}
csv.Append(Environment.NewLine);
foreach (DataRow dr in Table.Rows)
{
StringBuilder csvRow = new StringBuilder();
for(int c = 0; c < Table.Columns.Count; c++)
{
if(c != 0)
csvRow.Append(",");
object columnValue = dr[c];
if (columnValue == null)
csvRow.Append("");
else
{
string columnStringValue = columnValue.ToString();
string cleanedColumnValue = CleanCSVString(columnStringValue);
if (columnValue.GetType() == typeof(string) && !columnStringValue.Contains(","))
{
cleanedColumnValue = "=" + cleanedColumnValue; // Prevents a number stored in a string from being shown as 8888E+24 in Excel. Example use is the AccountNum field in CI that looks like a number but is really a string.
}
csvRow.Append(cleanedColumnValue);
}
}
csv.AppendLine(csvRow.ToString());
}
HttpResponseBase response = context.HttpContext.Response;
response.ContentType = "text/csv";
response.AppendHeader("Content-Disposition", "attachment;filename=" + this.FileName);
response.Write(csv.ToString());
}
protected string CleanCSVString(string input)
{
string output = "\"" + input.Replace("\"", "\"\"").Replace("\r\n", " ").Replace("\r", " ").Replace("\n", "") + "\"";
return output;
}
Looking mostly good except in your function "BookString()" you should pass all those strings through a small function like this first:
Private Function formatForCSV(stringToProcess As String) As String
If stringToProcess.Contains("""") Or stringToProcess.Contains(",") Then
stringToProcess = String.Format("""{0}""", stringToProcess.Replace("""", """"""))
End If
Return stringToProcess
End Function
'So, lines like this:
CsvLine.Append(b.Title.Replace(",", "") + ",")
'would be lines like this instead:
CsvLine.Append(formatForCSV(b.Title)) + ",")
The function will format your strings well for CSV. It replaces quotes with double quotes and add quotes around the string if there are either quotes or commas in the string.
Note that it doesn't account for newlines, but can only safely guarantee good CSV output for those strings that you know are free of newlines (inputs from simple one-line text forms, etc.).