Question

Can I somehow convert Excel .XLS file to txt-tsv (tab-separated-values) file, using C#?

Was it helpful?

Solution

You may read that XLS file easily via OleDb (ADO.NET provider) and create a StreamWriter object to write data into the Text/TSV file.

using (OleDbConnection cn = new OleDbConnection())
        {
            using (OleDbCommand cmd = new OleDbCommand())
            {
                cn.ConnectionString = @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + @"C:\path\file.xls" + ";Extended Properties=\"Excel 8.0;HDR=Yes;IMEX=1\";";
                cmd.Connection = cn;
                cmd.CommandText = "select * from [Sheet1$]";
                using (OleDbDataAdapter adp = new OleDbDataAdapter(cmd))
                {
                    DataTable dt = new DataTable();
                    adp.Fill(dt);
                    using (StreamWriter wr = new StreamWriter(@"C:\path\flie.tsv"))
                    {
                        foreach (DataRow row in dt.Rows)
                        {
                            wr.WriteLine(row[0] + "\t" + row[1]);
                        }
                    }
                }
            }
        }

OTHER TIPS

Using OleDb can be tricky and can cause issues depending on the version of excel the spreadsheet was created on. For example the above example would work with .xls, but not .xlsx. You would have to change your connection string from "Microsoft.Jet.OLEDB.4.0" to "Microsoft.ACE.OLEDB.12.0" in order to compensate. However, it would still not be universal to all excel sheets. I would use Microsoft.Office.Interop.Excel like follows

Microsoft.Office.Interop.Excel.Application myExcel;
Microsoft.Office.Interop.Excel.Workbook myWorkbook;
Microsoft.Office.Interop.Excel.Worksheet worksheet;

myExcel = new Microsoft.Office.Interop.Excel.Application();
myExcel.Workbooks.Open(inputFileName.xls, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value);
myWorkbook = myExcel.ActiveWorkbook;
worksheet = (Microsoft.Office.Interop.Excel.Worksheet)myWorkbook.Worksheets[1];
myWorkbook.SaveAs(outputFileName.txt, Microsoft.Office.Interop.Excel.XlFileFormat.xlTextWindows, Missing.Value, Missing.Value, Missing.Value, false, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlNoChange, Missing.Value, Missing.Value, Missing.Value, Missing.Value, Missing.Value);

myWorkbook.Close(false, Missing.Value, Missing.Value);
myExcel.Quit();

No looping, no bs. Just copy, paste, and change your filenames. The only problem I have seen is sometimes myExcel.Quit() doesn't seem to work correctly and instances of excel stay open I the background. The way around this is to kill the process manually by your program, but I will save that for a different discussion.

See this answer to a related question here. On the line that has ws.SaveAs(targetFilePath, XlFileFormat.xlCSV);, replace xlCSV with xlUnicodeText.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top