open xml reading from excel file

Question 1

Your approach seemed to work ok for me - in that it did "enter the loop". Nevertheless you could also try something like the following:

void Main()
{
    string fileName = @"c:\path\to\my\file.xlsx";

    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fs, false))
        {
            WorkbookPart workbookPart = doc.WorkbookPart;
            SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
            SharedStringTable sst = sstpart.SharedStringTable;
        
            WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
            Worksheet sheet = worksheetPart.Worksheet;
            
            var cells = sheet.Descendants<Cell>();
            var rows = sheet.Descendants<Row>();
        
            Console.WriteLine("Row count = {0}", rows.LongCount());
            Console.WriteLine("Cell count = {0}", cells.LongCount());
        
            // One way: go through each cell in the sheet
            foreach (Cell cell in cells)
            {
                if ((cell.DataType != null) && (cell.DataType == CellValues.SharedString))
                {
                    int ssid = int.Parse(cell.CellValue.Text);
                    string str = sst.ChildElements[ssid].InnerText;
                    Console.WriteLine("Shared string {0}: {1}", ssid, str);
                }
                else if (cell.CellValue != null)
                {
                    Console.WriteLine("Cell contents: {0}", cell.CellValue.Text);
                }
             }
        
             // Or... via each row
             foreach (Row row in rows)
             {
                 foreach (Cell c in row.Elements<Cell>())
                 {
                     if ((c.DataType != null) && (c.DataType == CellValues.SharedString))
                     {
                         int ssid = int.Parse(c.CellValue.Text);
                         string str = sst.ChildElements[ssid].InnerText;
                         Console.WriteLine("Shared string {0}: {1}", ssid, str);
                     }
                     else if (c.CellValue != null)
                     {
                         Console.WriteLine("Cell contents: {0}", c.CellValue.Text);
                     }
                 }
             }
         }
     }
 }

I used the filestream approach to open the workbook because this allows you to open it with shared access - so that you can have the workbook open in Excel at the same time. The Spreadsheet.Open(... method won't work if the workbook is open elsewhere.

Perhaps that is why your code didn't work.

Note, also, the use of the SharedStringTable to get the cell text where appropriate.

EDIT 2018-07-11:

Since this post is still getting votes I should also point out that in many cases it may be a lot easier to use ClosedXML to manipulate/read/edit your workbooks. The documentation examples are pretty user friendly and the coding is, in my limited experience, much more straight forward. Just be aware that it does not (yet) implement all the Excel functions (for example INDEX and MATCH) which may or may not be an issue. [Not that I would want to be trying to deal with INDEX and MATCH in OpenXML anyway.]

Question 2

I had the same issue as the OP, and the answer above did not work for me.

I think this is the issue: when you create a document in Excel (not programmatically), you have 3 sheets by default and the WorksheetParts that has the row data for Sheet1 is the last WorksheetParts element, not the first.

I figured this out by putting a watch for document.WorkbookPart.WorksheetParts in Visual Studio, expanding Results, then looking at all of the sub elements until I found a SheetData object where HasChildren = true.

Try this:

// open the document read-only
SpreadSheetDocument document = SpreadsheetDocument.Open(filePath, false);
SharedStringTable sharedStringTable = document.WorkbookPart.SharedStringTablePart.SharedStringTable;
string cellValue = null;

foreach (WorksheetPart worksheetPart in document.WorkbookPart.WorksheetParts)
{
    foreach (SheetData sheetData in worksheetPart.Worksheet.Elements<SheetData>())
    {
        if (sheetData.HasChildren)
        {
            foreach (Row row in sheetData.Elements<Row>())
            {
                foreach (Cell cell in row.Elements<Cell>())
                {
                    cellValue = cell.InnerText;

                    if (cell.DataType == CellValues.SharedString)
                    {
                        Console.WriteLine("cell val: " + sharedStringTable.ElementAt(Int32.Parse(cellValue)).InnerText);
                    }
                    else
                    {
                        Console.WriteLine("cell val: " + cellValue);
                    }
                }
            }
        }
    }
}
document.Close();

Question 3

Read Large Excel : openxml has two approaches of DOM and SAX to read an excel. the DOM one consume more RAM resource since it loads the whole xml content(Excel file) in Memory but its strong typed approach. SAX in other hand is event base parse. more here

so if you are facing large excel file its better to use SAX.

the below code sample uses SAX approach and also handle two important scenario in excel file reading.

open xml skips the empty cells so your dataset faces displacement and wrong index.
you need to skip the empty rows also.

this function returns the exact actual index of the cell at the time and handle the first scenario. from here

private static int CellReferenceToIndex(Cell cell)
        {
            int index = 0;
            string reference = cell.CellReference.ToString().ToUpper();
            foreach (char ch in reference)
            {
                if (Char.IsLetter(ch))
                {
                    int value = (int)ch - (int)'A';
                    index = (index == 0) ? value : ((index + 1) * 26) + value;
                }
                else
                    return index;
            }
            return index;
        }

code to read excel sax approach.

//i want to import excel to data table
            dt = new DataTable();

            using (SpreadsheetDocument document = SpreadsheetDocument.Open(path, false))
            {

                WorkbookPart workbookPart = document.WorkbookPart;
                WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

                OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);

                //row counter
                int rcnt = 0;

                while (reader.Read())
                {


                    //find xml row element type 
                    //to understand the element type you can change your excel file eg : test.xlsx to test.zip
                    //and inside that you may observe the elements in xl/worksheets/sheet.xml
                    //that helps to understand openxml better
                    if (reader.ElementType == typeof(Row))
                    {

                        //create data table row type to be populated by cells of this row
                        DataRow tempRow = dt.NewRow();



                        //***** HANDLE THE SECOND SENARIO*****
                        //if row has attribute means it is not a empty row
                        if (reader.HasAttributes)
                        {
                            
                            //read the child of row element which is cells

                            //here first element
                            reader.ReadFirstChild();



                            do
                            {
                                //find xml cell element type 
                                if (reader.ElementType == typeof(Cell))
                                {
                                    Cell c = (Cell)reader.LoadCurrentElement();

                                    string cellValue;

                                    
                                    int actualCellIndex = CellReferenceToIndex(c);

                                    if (c.DataType != null && c.DataType == CellValues.SharedString)
                                    {
                                        SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));

                                        cellValue = ssi.Text.Text;
                                    }
                                    else
                                    {
                                        cellValue = c.CellValue.InnerText;
                                    }



                                    //if row index is 0 its header so columns headers are added & also can do some headers check incase
                                    if (rcnt == 0)
                                    {
                                        dt.Columns.Add(cellValue);
                                    }
                                    else
                                    {
                                        // instead of tempRow[c.CellReference] = cellValue;
                                        tempRow[actualCellIndex] = cellValue;
                                    }

                                    

                                }


                            }
                            while (reader.ReadNextSibling());


                            //if its not the header row so append rowdata to the datatable
                            if (rcnt != 0)
                            {
                                dt.Rows.Add(tempRow);
                            }

                            rcnt++;


                        }


                    }





                }


            }

Question 4

Everything is explained in the accepted answer.
Here is just an extension method to solve the problem

public static string GetCellText(this Cell cell, in SharedStringTable sst)
{
    if (cell.CellValue is null)
        return string.Empty;

    if ((cell.DataType is not null) &&
        (cell.DataType == CellValues.SharedString))
    {
        int ssid = int.Parse(cell.CellValue.Text);
        return sst.ChildElements[ssid].InnerText;
    }

    return cell.CellValue.Text;
}