Question

I am exporting an Excel workbook to the XML Spreadsheet format. The workbook has, let's say, 10 columns and 10 rows. Some of the cells are empty (i.e. they have no value).

When I save the file as an XML Spreadsheet and look at a row that contains a blank cell, only the non-empty cells are present. The empty cell is not there, and the XML shows the cell before the blank and the cell after it one right after the other (the empty cell simply doesn't exist).

Here is a sample of the xml:

<Cell ss:StyleID="s36"><Data ss:Type="Number">cell1</Data><NamedCell
  ss:Name="Print_Area"/></Cell>
<Cell><Data ss:Type="String">cell2</Data><NamedCell ss:Name="Print_Area"/></Cell>
<Cell><Data ss:Type="String">cell4</Data><NamedCell
  ss:Name="Print_Area"/></Cell>

The missing cell is cell3.


Is there a way to tell Excel not to drop empty cells to save space? Recreating the original layout with XSLT is not as easy as it seems.


Solution

If the cell is empty, omitting it seems a reasonable optimization to save space. Why should it not be missing?

You have enough information to recreate the original spreadsheet.

OTHER TIPS

Exactly where is the information stored that lets him recreate the spreadsheet? If these rows:

  • Data, empty, Data, empty, Data
  • Data, Data, Data, empty, empty
  • Data, empty, empty, Data, Data

all give

<Row>
  <Cell><Data></Data></Cell>
  <Cell><Data></Data></Cell>
  <Cell><Data></Data></Cell>
</Row>

then nothing in the output distinguishes the three layouts.

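You can build your own VBA macro, like the one below, which writes an element for every column, including empty cells. Add a reference to the Microsoft XML library (Tools > References in the VBA editor) before running it.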

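Sub makeXml()
    ' Requires a reference to the Microsoft XML library.
    ' With MSXML 6.0 the class is named DOMDocument60 instead of DOMDocument.
    ' Row 1 is assumed to hold the column headers; they become element names.
    Dim lastCell As Range
    Set lastCell = ActiveSheet.Cells.SpecialCells(xlLastCell)

    ' Declare each variable with its own type; "Dim a, b As Long" makes a a Variant
    Dim lastRow As Long, lastCol As Long
    lastRow = lastCell.Row
    lastCol = lastCell.Column

    Dim iRow As Long, iCol As Long

    Dim xDoc As New DOMDocument
    Dim rootNode As IXMLDOMNode
    Set rootNode = xDoc.createElement("Root")
    Dim rowNode As IXMLDOMNode
    Dim colNode As IXMLDOMNode

    'loop over the data rows (row 1 is the header row)
    For iRow = 2 To lastRow
        Set rowNode = xDoc.createElement("Row")
        'loop over the columns; columns without a header are skipped
        For iCol = 1 To lastCol
            If (Len(ActiveSheet.Cells(1, iCol).Text) > 0) Then
                Set colNode = xDoc.createElement(GetXmlSafeColumnName(ActiveSheet.Cells(1, iCol).Text))

                'an empty cell still produces an (empty) element
                colNode.Text = ActiveSheet.Cells(iRow, iCol).Text
                rowNode.appendChild colNode
            End If
        Next iCol
        rootNode.appendChild rowNode
    Next iRow
    xDoc.appendChild rootNode

    Dim fileSaveName As Variant
    fileSaveName = Application.GetSaveAsFilename( _
        fileFilter:="XML Files (*.xml), *.xml")
    'GetSaveAsFilename returns False when the dialog is cancelled
    If fileSaveName <> False Then xDoc.Save fileSaveName
    Set xDoc = Nothing

End Sub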
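
' Replaces spaces with underscores and strips a fixed set of punctuation
' characters so that a header text can be used as an XML element name.
Function GetXmlSafeColumnName(name As String) As String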
    Dim ret As String
    ret = name
    ret = Replace(ret, " ", "_")
    ret = Replace(ret, ".", "")
    ret = Replace(ret, ",", "")
    ret = Replace(ret, "&", "")
    ret = Replace(ret, "!", "")
    ret = Replace(ret, "@", "")
    ret = Replace(ret, "$", "")
    ret = Replace(ret, "#", "")
    ret = Replace(ret, "%", "")
    ret = Replace(ret, "^", "")
    ret = Replace(ret, "*", "")
    ret = Replace(ret, "(", "")
    ret = Replace(ret, ")", "")
    ret = Replace(ret, "-", "")
    ret = Replace(ret, "+", "")

    GetXmlSafeColumnName = ret
End Function
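
The function above removes a fixed list of characters, so a header containing something outside that list (a slash or a question mark, say) or starting with a digit would still yield an invalid element name. As a rough sketch of the opposite approach, keeping only characters known to be safe, here is the same idea in Python. The name xml_safe_name and the "col" fallback prefix are made up for this illustration and are not part of the macro.

```python
import re


def xml_safe_name(header, fallback="col"):
    """Turn a header text into a conservative XML element name."""
    # Collapse runs of whitespace into a single underscore.
    name = re.sub(r"\s+", "_", header.strip())
    # Keep only ASCII letters, digits and underscores; drop everything else.
    name = re.sub(r"[^A-Za-z0-9_]", "", name)
    # An element name must not be empty and must not start with a digit.
    if not name:
        return fallback
    if name[0].isdigit():
        return fallback + "_" + name
    return name


print(xml_safe_name("Unit Price ($)"))
print(xml_safe_name("2023 Total"))
```

This is stricter than XML requires (hyphens and dots are legal inside a name), but it matches what the macro does with those characters and is easy to reason about.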

I had the same issue before, and I wrote some code to deal with the omitted empty cells. You just need to use the value of the ss:Index attribute of the Cell element when it is present (see the XML Spreadsheet Reference for details) and store the cell contents at the corresponding array index to recreate the original cell order.

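<?php
$doc = new DOMDocument('1.0', 'utf-8');
if (!$doc->load('sample.xml'))
    die('Could not load sample.xml');

// Drop the default namespace declaration so the XPath queries below
// can address elements without a prefix.
$root = $doc->documentElement;
$root->removeAttributeNS($root->getAttributeNode('xmlns')->nodeValue, '');

$xpath = new DOMXPath($doc);
foreach ($xpath->query('/Workbook/Worksheet/Table/Row') as $row)
{
    $cells = array();
    $cell_index = 0; // 1-based column of the last cell seen
    foreach ($xpath->query('./Cell', $row) as $cell)
    {
        // ss:Index, when present, gives the 1-based column of this cell;
        // otherwise the cell directly follows the previous one.
        if ($cell->hasAttribute('ss:Index'))
            $cell_index = (int) $cell->getAttribute('ss:Index');
        else
            ++$cell_index;
        $cells[$cell_index - 1] = $cell->nodeValue;
    }
    // now process data
    print_r($cells);
}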

Note that empty cells are not added to the array, while every other cell ends up at its correct index. If you need the gaps filled in, you can compute the maximum cell index (the number of table columns) across all rows and pad each row to that width.
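
For readers who are not using PHP, the same idea, including the padding step, might look roughly like this in Python. It is only a sketch under a few assumptions: the SAMPLE document is invented for the illustration, the names read_rows and to_grid are hypothetical, and ss:Index on Row elements (which Excel can use to skip whole empty rows) is not handled.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}

# Invented sample: in the first row, columns 2 and 4 are empty.
SAMPLE = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
 <Worksheet ss:Name="Sheet1">
  <Table>
   <Row>
    <Cell><Data ss:Type="String">a</Data></Cell>
    <Cell ss:Index="3"><Data ss:Type="String">c</Data></Cell>
    <Cell ss:Index="5"><Data ss:Type="String">e</Data></Cell>
   </Row>
   <Row>
    <Cell><Data ss:Type="String">f</Data></Cell>
    <Cell><Data ss:Type="String">g</Data></Cell>
   </Row>
  </Table>
 </Worksheet>
</Workbook>"""


def read_rows(xml_text):
    """Return one {zero_based_column: text} dict per Row element."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.iterfind("ss:Worksheet/ss:Table/ss:Row", NS):
        cells = {}
        col = 0  # 1-based column of the last cell seen
        for cell in row.iterfind("ss:Cell", NS):
            index = cell.get("{%s}Index" % SS)
            # ss:Index jumps to an explicit column; otherwise move one right.
            col = int(index) if index is not None else col + 1
            data = cell.find("ss:Data", NS)
            cells[col - 1] = data.text if data is not None and data.text else ""
        rows.append(cells)
    return rows


def to_grid(rows, fill=""):
    """Pad every row to the widest row so the result is rectangular."""
    width = max((max(r) + 1 for r in rows if r), default=0)
    return [[r.get(c, fill) for c in range(width)] for r in rows]


grid = to_grid(read_rows(SAMPLE))
for line in grid:
    print(line)
```

Keeping the sparse dictionaries and padding only at the end means the table width does not have to be known while the rows are being read.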

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow