Question

Has anyone come across a performance issue when deleting the first row of an Excel file with 20,000+ rows using Open XML SDK v2.0?

I am using the row-deletion code suggested in the Open XML SDK documentation. It takes several minutes just to delete the first row with the Open XML SDK, but only about a second in the Excel application.

I eventually found out that the bottleneck is the bubble-up approach to row deletion: every row after the deleted one has to be updated. In my case that means around 20,000 rows, each shifted up one by one.

I wonder if there is a faster way to delete the row.

Does anybody have an idea?


Solution

Well, the bad news here is: yep, that's the way it is.

You may get slightly better performance by moving outside the SDK itself into System.IO.Packaging: load all the rows into an IEnumerable/List with something like Linq-to-XML, copy them to a new IEnumerable/List without the first row, rewrite the r attribute of each <row r="?"/> to its position in the list, and then write the result back inside <sheetData/> over the existing children.
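The steps above can be sketched with Linq-to-XML roughly as follows. This is only an illustration under some assumptions: the function name DeleteFirstRow and the in-memory demo worksheet are made up for the example, rows are assumed to be contiguous starting at 1, and the r attribute of each cell (e.g. A2) is rewritten as well, since it repeats the row number whenever it is present. Formulas, merged cells and anything else that refers to row numbers are not touched.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Main SpreadsheetML namespace used by worksheet parts.
XNamespace ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

// Hypothetical helper: drops the first <row> under <sheetData> and renumbers
// the remaining rows (and their cell references, when present) in one pass.
void DeleteFirstRow(XElement worksheet)
{
    XElement sheetData = worksheet.Element(ns + "sheetData");
    var remaining = sheetData.Elements(ns + "row").Skip(1).ToList();

    int rowNumber = 1;
    foreach (XElement row in remaining)
    {
        row.SetAttributeValue("r", rowNumber);
        foreach (XElement cell in row.Elements(ns + "c"))
        {
            string reference = (string)cell.Attribute("r");
            if (reference == null) continue;
            // Keep the column letters, replace the row digits.
            string column = new string(reference.TakeWhile(ch => char.IsLetter(ch)).ToArray());
            cell.SetAttributeValue("r", column + rowNumber);
        }
        rowNumber++;
    }

    // Write the shortened list back over the existing children.
    sheetData.ReplaceNodes(remaining);
}

// Usage on a tiny in-memory worksheet (demo data, not a real file).
XElement demo = XElement.Parse(
    "<worksheet xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main'><sheetData>" +
    "<row r='1'><c r='A1'><v>10</v></c></row>" +
    "<row r='2'><c r='A2'><v>20</v></c></row>" +
    "</sheetData></worksheet>");
DeleteFirstRow(demo);
Console.WriteLine(demo);
```

With a real file, the worksheet element would presumably be loaded from the stream of the worksheet part in the package and saved back to it afterwards.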

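You'd need to do much the same for any strings in the sharedStrings.xml file, i.e. remove the <sst>/<si> elements that were used by the deleted row. In this case the elements are implicitly indexed, so there is no attribute to rewrite and you can simply remove them. Bear in mind, though, that cells refer to shared strings by position, so removing an entry shifts the index seen by any cell that points to a later one.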
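To see which <si> entries a deleted row was using, one can read the indexes out of its cells before removing it. A minimal sketch, assuming the usual SpreadsheetML layout in which a cell with t="s" stores a zero-based position in the shared string list inside <v>; the helper name SharedStringIndexes and the demo row are made up for the example. The same string may also be used by other rows, so it is worth checking that before removing anything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

XNamespace main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

// Hypothetical helper: returns the shared-string indexes referenced by the
// cells of one <row>. Only cells of type "s" point into sharedStrings.xml.
List<int> SharedStringIndexes(XElement row)
{
    return row.Elements(main + "c")
              .Where(c => (string)c.Attribute("t") == "s")
              .Select(c => (int)c.Element(main + "v"))
              .Distinct()
              .ToList();
}

// Usage on a demo row: two string cells and one plain numeric cell.
XElement demoRow = XElement.Parse(
    "<row xmlns='http://schemas.openxmlformats.org/spreadsheetml/2006/main' r='1'>" +
    "<c r='A1' t='s'><v>4</v></c><c r='B1'><v>7</v></c><c r='C1' t='s'><v>0</v></c></row>");
Console.WriteLine(string.Join(", ", SharedStringIndexes(demoRow)));
```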

OTHER TIPS

The approach of unzipping the file, manipulating it and repacking it is very error-prone.

How about this: since you say it works fine in Excel, have you tried Interop? It starts a new instance of Excel (visible or invisible); you can then open the file, delete the row, save, and close the application again.

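using System;
using Excel = Microsoft.Office.Interop.Excel;

public static class ExcelRowDeleter
{
    // "path" is an illustrative parameter: the full path of the workbook to edit.
    public static void DeleteFirstRow(string path)
    {
        var excelApp = new Excel.Application { Visible = false };
        try
        {
            Excel.Workbook workbook = excelApp.Workbooks.Open(path);
            var sheet = (Excel.Worksheet)workbook.Worksheets[1];   // Interop indexes are 1-based
            ((Excel.Range)sheet.Rows[1]).Delete(Excel.XlDeleteShiftDirection.xlShiftUp);
            workbook.Save();
            workbook.Close();
        }
        finally { excelApp.Quit(); }   // always shut the Excel instance down
    }
}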

The Range object is useful for many tasks, including deleting elements. Have a look at: MSDN Range description. One more hint: Interop drives Excel itself, so all objects have to be addressed with a 1-based index! For more resources, take a look at this StackOverflow thread.

Licensed under: CC-BY-SA with attribution