Existing API for reading dates from SpreadsheetML files?
13-10-2019
Question
Is there an existing API, in the Office Open XML SDK or a third-party library, that properly reads dates from a SpreadsheetML / .xlsx file?
Since there are so many variables that affect detecting when a value is a date (numFmtId + custom number formats) and then converting the date serial to a DateTime value (standard, backwards compatible, and 1904-super-backwards-compatible workbooks), it seems logical that this is something the SDK would provide or at least someone would have an existing code snippet to handle.
I'm using C# but a solution for any language would be fine.
Solution
Looks like there isn't anything out there already specific to this purpose. Here's the routine I came up with.
/// <summary>
/// Represents the formula used for converting date serial values stored within the workbook into DateTime instances.
/// </summary>
/// <remarks>
/// Information on date serial conversion is available here: http://www.documentinteropinitiative.com/implnotes/ISO-IEC29500-2008/001.018.017.004.001.000.000.aspx
/// </remarks>
public enum XlsxDateCompatibility
{
    /// <summary>
    /// Standard dates are based on December 30, 1899 and are considered "Standard 1900" dates.
    /// </summary>
    StandardBase1900,
    /// <summary>
    /// Excel for Windows backwards compatible dates are based on December 31, 1899 and are considered "Backwards compatible 1900" dates.
    /// </summary>
    BackwardsCompatibleBase1900,
    /// <summary>
    /// Excel for Macintosh backwards compatible dates are based on January 1, 1904 and are considered "1904" dates.
    /// </summary>
    BackwardsCompatibleBase1904
}
private static readonly IDictionary<XlsxDateCompatibility, DateTime> _dateSerialBaseDates
    = new Dictionary<XlsxDateCompatibility, DateTime>
    {
        {XlsxDateCompatibility.StandardBase1900, new DateTime(1899, 12, 30)},
        {XlsxDateCompatibility.BackwardsCompatibleBase1900, new DateTime(1899, 12, 31)},
        {XlsxDateCompatibility.BackwardsCompatibleBase1904, new DateTime(1904, 1, 1)}
    };
public static DateTime DateSerialToDateTime(double dateSerial, XlsxDateCompatibility dateCompatibility)
{
    // Special case for backwards compatible 1900 dates: Excel treats 1900 as a leap year,
    // so serial 60 is the nonexistent February 29, 1900 and every later serial is one day too high.
    // http://support.microsoft.com/kb/214019
    if (dateCompatibility == XlsxDateCompatibility.BackwardsCompatibleBase1900 && dateSerial >= 61.0)
    {
        dateSerial -= 1;
    }
    // Fall back to the standard 1900 base date if the compatibility value is unknown.
    DateTime baseDate;
    if (!_dateSerialBaseDates.TryGetValue(dateCompatibility, out baseDate))
    {
        baseDate = _dateSerialBaseDates[XlsxDateCompatibility.StandardBase1900];
    }
    return baseDate.AddDays(dateSerial);
}
OTHER TIPS
I have never read a date in before, but I'd imagine you would have to take the style index on the cell you are reading (its s attribute), look it up in the x:cellXfs element of the styles part, and check whether that entry's numFmtId refers to a date format, either a built-in one or a custom one defined in the x:numFmts element. I know Office 2010 has a date data type indicator on the cell, <x:c t='d'>, so if you are using that version it would be a lot easier to tell whether the data is a date or not. This is what it would look like in Office 2010:
<x:c r="C4" t="d">
    <x:v>1976-11-22T08:30Z</x:v>
</x:c>
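Once the cell's number format has been found, deciding whether it is a date format could be sketched roughly as below. This is only a heuristic under stated assumptions, not something the SDK provides: the class and method names are hypothetical, the built-in ID ranges (14 to 22 and 45 to 47) are the commonly cited implied date/time formats and should be checked against the specification (some locale-specific IDs are not covered), and custom format codes are tested simply by looking for date/time letters after stripping quoted literals, bracketed sections, and escaped characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative and not part of the Open XML SDK.
public static class XlsxDateFormatSketch
{
    // Parts of a format code that never denote date/time fields:
    // "quoted literals", [bracketed] sections, and the character after \, _ or *.
    private static readonly Regex NonFieldParts =
        new Regex("\"[^\"]*\"|\\[[^\\]]*\\]|[\\\\_*].", RegexOptions.Compiled);

    private static readonly char[] DateTimeLetters =
        { 'd', 'm', 'y', 'h', 's', 'D', 'M', 'Y', 'H', 'S' };

    // Assumption: IDs 14-22 and 45-47 are the built-in date/time formats.
    public static bool IsBuiltInDateFormat(uint numFmtId)
    {
        return (numFmtId >= 14 && numFmtId <= 22) || (numFmtId >= 45 && numFmtId <= 47);
    }

    // Heuristic for custom formats: any remaining d/m/y/h/s letter suggests a date or time.
    public static bool LooksLikeDateFormat(string formatCode)
    {
        if (string.IsNullOrEmpty(formatCode))
        {
            return false;
        }
        string stripped = NonFieldParts.Replace(formatCode, string.Empty);
        return stripped.IndexOfAny(DateTimeLetters) >= 0;
    }
}
```

With this sketch, a format code such as "yyyy-mm-dd" or "[h]:mm:ss" is reported as a date/time format, while "General" or "[Red]0.00" is not. Elapsed-time formats are lumped in with dates here, so a caller that cares about the difference would need a finer check.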
To convert the data into a DateTime, I believe all you have to do is call DateTime.FromOADate(cellValue), where cellValue is a double. I know we convert a DateTime to an OADate before inserting dates into our Excel documents, so I'd imagine using the FromOADate method would work fine.
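A minimal sketch of the FromOADate approach follows, assuming the cell's raw value has already been read as a string. The class name, method name, and the uses1904DateSystem parameter are hypothetical. OLE Automation dates count from December 30, 1899, so the result should agree with Excel for serials of 61 and above in a 1900-based workbook; smaller serials may be off by one day because of Excel's February 29, 1900, and a 1904-based workbook needs an offset of 1462 days (the distance from December 30, 1899 to January 1, 1904).

```csharp
using System;
using System.Globalization;

// Hypothetical helper; names are illustrative and not part of the Open XML SDK.
public static class OADateSketch
{
    // Days between 1899-12-30 (the OLE Automation base date) and 1904-01-01.
    private const double Date1904Offset = 1462.0;

    public static DateTime CellValueToDateTime(string cellValue, bool uses1904DateSystem)
    {
        // Cell values are stored with '.' as the decimal separator, so parse culture-invariantly.
        double serial = double.Parse(cellValue, NumberStyles.Float, CultureInfo.InvariantCulture);
        if (uses1904DateSystem)
        {
            serial += Date1904Offset;
        }
        return DateTime.FromOADate(serial);
    }
}
```

For example, CellValueToDateTime("43751.5", false) gives noon on October 13, 2019, and CellValueToDateTime("0", true) gives January 1, 1904.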
As far as any APIs for these functions go, I am not aware of any that will do what you want, but I hope it gets included in future versions of the SDK.