Question

Is there an existing API in the Office Open XML SDK, or a third-party library, that properly reads dates from a SpreadsheetML / .xlsx file?

Detecting whether a value is a date depends on many variables (numFmtId plus custom number formats), and so does converting the date serial to a DateTime value (standard, backwards-compatible, and 1904-super-backwards-compatible workbooks). It seems logical that the SDK would provide this, or at least that someone would have an existing code snippet for it.

I'm using C# but a solution for any language would be fine.

Solution

Looks like there isn't anything out there already specific to this purpose. Here's the routine I came up with.

/// <summary>
/// Represents the formula used for converting date serial values stored within the workbook into DateTime instances.
/// </summary>
/// <remarks>
/// Information on date serial conversion is available here: http://www.documentinteropinitiative.com/implnotes/ISO-IEC29500-2008/001.018.017.004.001.000.000.aspx
/// </remarks>
public enum XlsxDateCompatibility
{
    /// <summary>
    /// Standard dates are based on December 30, 1899 and are considered "Standard 1900" dates.
    /// </summary>
    StandardBase1900,

    /// <summary>
    /// Excel for Windows backwards compatible dates are based on December 31, 1899 and are considered "Backwards compatible 1900" dates.
    /// </summary>
    BackwardsCompatibleBase1900,

    /// <summary>
    /// Excel for Macintosh backwards compatible dates are based on January 1, 1904 and are considered "1904" dates.
    /// </summary>
    BackwardsCompatibleBase1904
}

// Requires: using System; using System.Collections.Generic;
// The enclosing class name is illustrative; the original snippet showed only the members.
public static class XlsxDateConverter
{
    private static readonly IDictionary<XlsxDateCompatibility, DateTime> _dateSerialBaseDates
        = new Dictionary<XlsxDateCompatibility, DateTime>
            {
                {XlsxDateCompatibility.StandardBase1900, new DateTime(1899, 12, 30)},
                {XlsxDateCompatibility.BackwardsCompatibleBase1900, new DateTime(1899, 12, 31)},
                {XlsxDateCompatibility.BackwardsCompatibleBase1904, new DateTime(1904, 1, 1)}
            };

    public static DateTime DateSerialToDateTime(double dateSerial, XlsxDateCompatibility dateCompatibility)
    {
        // Special case for the backwards compatible 1900 base: Excel thinks 1900 is a leap year,
        // so serials from 61 (March 1, 1900) onward are one day too high.
        // http://support.microsoft.com/kb/214019
        if (dateCompatibility == XlsxDateCompatibility.BackwardsCompatibleBase1900 && dateSerial >= 61.0)
        {
            dateSerial -= 1;
        }

        // Fall back to the standard 1900 base if the enum value is not in the table.
        DateTime baseDate;
        if (!_dateSerialBaseDates.TryGetValue(dateCompatibility, out baseDate))
        {
            baseDate = _dateSerialBaseDates[XlsxDateCompatibility.StandardBase1900];
        }
        return baseDate.AddDays(dateSerial);
    }
}
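
The routine above takes the compatibility mode as a parameter, so the caller still has to work out which date system the workbook uses. A minimal sketch of one way to do that, assuming you already have the text of the workbook part (usually xl/workbook.xml) and only need to tell 1904 workbooks apart from 1900 ones. The class and method names here are made up for illustration, and it reads the XML directly with System.Xml.Linq rather than going through the SDK:

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of the Open XML SDK.
public static class DateSystemProbe
{
    private static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns true when <workbookPr date1904="1"/> (or "true") is present.
    // A missing element or attribute means the 1900 date system.
    public static bool Uses1904DateSystem(string workbookXml)
    {
        XElement workbook = XDocument.Parse(workbookXml).Root;
        XElement props = workbook.Element(Main + "workbookPr");
        string flag = props == null ? null : (string)props.Attribute("date1904");
        return flag == "1" || flag == "true";
    }
}
```

When this returns true, BackwardsCompatibleBase1904 would be the mode to pass to DateSerialToDateTime; choosing between the two 1900 modes is not covered by this sketch.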

OTHER TIPS

I have never read a date in before, but I'd imagine you would have to take the style index on the cell you are reading (its s attribute), look up the matching x:xf entry in x:cellXfs in the styles part, and check whether that entry's numFmtId is a built-in date format or a custom date format defined in the x:numFmts element. I know Office 2010 has a date data type indicator on the cell, <x:c t="d">, so if you are using that version it would be a lot easier to tell whether the data is a date. This is what it would look like in Office 2010:

<x:c r="C4" t="d"> 
   <x:v>1976-11-22T08:30Z</x:v>
</x:c> 
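
For cells that do not carry t="d", the style lookup described above could be sketched roughly like this. It assumes you have the text of the styles part (usually xl/styles.xml) and the cell's s attribute as an integer. The class and method names are hypothetical, the built-in ids covered are only 14-22 and 45-47, and the custom-format check is a heuristic that looks for date/time letters outside quoted text, brackets, and escapes, so treat it as a starting point rather than a complete classifier:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the Open XML SDK.
public static class DateStyleProbe
{
    private static readonly XNamespace Main =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Built-in number formats 14-22 and 45-47 are date or time formats.
    private static bool IsBuiltInDateFormat(int id)
    {
        return (id >= 14 && id <= 22) || (id >= 45 && id <= 47);
    }

    // Heuristic: a format code is date-like if it contains y, m, d, h or s
    // outside quoted literals, [bracketed] sections, and escaped characters.
    public static bool LooksLikeDateFormat(string formatCode)
    {
        bool inQuotes = false, inBrackets = false;
        for (int i = 0; i < formatCode.Length; i++)
        {
            char c = formatCode[i];
            if (c == '"') { inQuotes = !inQuotes; continue; }
            if (inQuotes) continue;
            if (c == '\\' || c == '_' || c == '*') { i++; continue; } // skip the next character
            if (c == '[') { inBrackets = true; continue; }
            if (c == ']') { inBrackets = false; continue; }
            if (inBrackets) continue;
            if ("ymdhsYMDHS".IndexOf(c) >= 0) return true;
        }
        return false;
    }

    // styleIndex is the cell's s attribute: a zero-based index into cellXfs.
    public static bool IsDateStyle(string stylesXml, int styleIndex)
    {
        XElement styleSheet = XDocument.Parse(stylesXml).Root;
        XElement cellXfs = styleSheet.Element(Main + "cellXfs");
        if (cellXfs == null) return false;
        XElement xf = cellXfs.Elements(Main + "xf").ElementAtOrDefault(styleIndex);
        if (xf == null) return false;

        int numFmtId = (int?)xf.Attribute("numFmtId") ?? 0;
        if (IsBuiltInDateFormat(numFmtId)) return true;

        // Custom formats are declared in numFmts and matched by numFmtId.
        XElement numFmts = styleSheet.Element(Main + "numFmts");
        if (numFmts == null) return false;
        string formatCode = numFmts.Elements(Main + "numFmt")
            .Where(f => (int?)f.Attribute("numFmtId") == numFmtId)
            .Select(f => (string)f.Attribute("formatCode"))
            .FirstOrDefault();
        return formatCode != null && LooksLikeDateFormat(formatCode);
    }
}
```

A cell with no s attribute uses style index 0, which is normally the General format.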

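To convert the data into a DateTime, I believe all you have to do is call DateTime.FromOADate(cellValue), where cellValue is a double. I know we convert a DateTime to an OADate before inserting dates into our Excel documents, so I'd imagine using the FromOADate method would work fine. That should match what Excel displays for workbooks on the 1900 date system from serial 61 (March 1, 1900) onward; earlier serials and 1904-based workbooks need an adjustment.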
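
As a rough sketch of how FromOADate lines up with Excel's serials: OLE Automation dates count from December 30, 1899, which agrees with Excel's 1900 date system from serial 61 onward. Below that, Excel's fictitious February 29, 1900 puts the two one day apart, and 1904 workbooks are offset by 1462 days. The helper name and the uses1904DateSystem parameter are made up for illustration:

```csharp
using System;

// Hypothetical wrapper around DateTime.FromOADate.
public static class OADateBridge
{
    // January 1, 1904 is OLE Automation date 1462.
    private const double Offset1904 = 1462;

    public static DateTime FromExcelSerial(double serial, bool uses1904DateSystem)
    {
        if (uses1904DateSystem)
        {
            return DateTime.FromOADate(serial + Offset1904);
        }

        // Excel counts a nonexistent February 29, 1900 (serial 60), so serials
        // below 61 sit one day behind the OLE Automation numbering.
        if (serial < 61)
        {
            serial += 1;
        }
        return DateTime.FromOADate(serial);
    }
}
```

With this sketch, serial 60 (the nonexistent February 29, 1900) comes out as March 1, 1900, the same date as serial 61.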

As for APIs that do this, I am not aware of any that will perform what you want, but I wish it were included in future versions of the SDK.

Licensed under: CC-BY-SA with attribution