How do you store Date ranges, which are actually timestamps

https://stackoverflow.com/questions/156032

03-07-2019

Question

Java & Oracle both have a timestamp type called Date. Developers tend to manipulate these as if they were calendar dates, which I've seen cause nasty off-by-one bugs.

For a basic date quantity you can simply chop off the time portion upon input, i.e., reduce the precision. But if you do that with a date range (e.g., 9/29-9/30), the difference between these two values is 1 day, rather than 2. Also, range comparisons require either 1) a truncate operation: start <= trunc(now) <= end, or 2) arithmetic: start <= now < (end + 24hrs). Not horrible, but not DRY.
An alternative is to use true timestamps: 9/29 00:00:00 - 10/1 00:00:00 (midnight to midnight, so the range does not include any part of October). Now durations are intrinsically correct, and range comparisons are simpler: start <= now < end. This is certainly cleaner for internal processing; however, end dates do need to be converted upon initial input (+1) and for output (-1), presuming a calendar date metaphor at the user level.
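A minimal Java sketch of the two representations, using java.time. The class and method names are illustrative, and the year 2008 is an assumption, since the question gives only month and day.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

class RangeRepresentations {
    // Inclusive calendar dates: plain subtraction undercounts by one day.
    static long naiveDays(LocalDate start, LocalDate inclusiveEnd) {
        return ChronoUnit.DAYS.between(start, inclusiveEnd);
    }

    // Midnight-to-midnight timestamps: the end is the first instant after the range.
    static long halfOpenDays(LocalDateTime start, LocalDateTime exclusiveEnd) {
        return ChronoUnit.DAYS.between(start, exclusiveEnd);
    }

    // Half-open containment test: start <= now < end.
    static boolean contains(LocalDateTime start, LocalDateTime exclusiveEnd, LocalDateTime now) {
        return !now.isBefore(start) && now.isBefore(exclusiveEnd);
    }

    public static void main(String[] args) {
        LocalDate sep29 = LocalDate.of(2008, 9, 29);
        LocalDate sep30 = LocalDate.of(2008, 9, 30);
        System.out.println(naiveDays(sep29, sep30)); // 1, although two days are covered

        LocalDateTime start = sep29.atStartOfDay();
        LocalDateTime end = sep30.plusDays(1).atStartOfDay(); // 10/1 00:00:00
        System.out.println(halfOpenDays(start, end)); // 2
        System.out.println(contains(start, end, LocalDateTime.of(2008, 9, 30, 23, 59, 59))); // true
        System.out.println(contains(start, end, end)); // false
    }
}
```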

How do you handle date ranges on your project? Are there other alternatives? I am particularly interested in how you handle this on both the Java and the Oracle sides of the equation.

Solution

Here's how we do it.

Use timestamps.
Use Half-open intervals for comparison: start <= now < end.

Ignore the whiners who insist that BETWEEN is somehow essential to successful SQL.

With this, a series of date ranges is really easy to audit. The database value for 9/30 to 10/1 encompasses one day (9/30). The next interval's start must equal the previous interval's end. That interval[n-1].end == interval[n].start rule is handy for audit.

When you display, if you want, you can display the formatted start and end-1. Turns out, you can educate people to understand that the "end" is actually the first day the rule is no longer true. So "9/30 to 10/1" means "valid starting 9/30, no longer valid starting 10/1".
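The audit rule can be sketched in Java roughly as follows. The Range type and the firstBreak method are hypothetical names invented for this illustration, and the year is an assumption.

```java
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;

class RangeAudit {
    // Hypothetical value type: start is inclusive, end is exclusive.
    static final class Range {
        final LocalDateTime start;
        final LocalDateTime end;

        Range(LocalDateTime start, LocalDateTime end) {
            this.start = start;
            this.end = end;
        }
    }

    // Returns the index of the first range that does not start exactly where
    // its predecessor ends, or -1 if the whole series is contiguous.
    static int firstBreak(List<Range> series) {
        for (int n = 1; n < series.size(); n++) {
            if (!series.get(n - 1).end.equals(series.get(n).start)) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LocalDateTime sep30 = LocalDateTime.of(2008, 9, 30, 0, 0);
        List<Range> contiguous = Arrays.asList(
                new Range(sep30, sep30.plusDays(1)),
                new Range(sep30.plusDays(1), sep30.plusDays(5)));
        List<Range> gapped = Arrays.asList(
                new Range(sep30, sep30.plusDays(1)),
                new Range(sep30.plusDays(2), sep30.plusDays(5)));
        System.out.println(firstBreak(contiguous)); // -1
        System.out.println(firstBreak(gapped));     // 1
    }
}
```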

OTHER TIPS

Oracle has the TIMESTAMP datatype. It stores the year, month, and day of the DATE datatype, plus hour, minute, second and fractional second values.

Here is a thread on asktom.oracle.com about date arithmetic.

I second what S.Lott explained. We have a product suite which makes extensive use of date time ranges and it has been one of our lessons-learned to work with ranges like that. By the way, we call the end date exclusive end date if it is not part of the range anymore (IOW, a half open interval). In contrast, it is an inclusive end date if it counts as part of the range which only makes sense if there is no time portion.

Users typically expect input/output of inclusive date ranges. At any rate, convert user input as soon as possible to exclusive end date ranges, and convert any date range as late as possible when it has to be shown to the user.

On the database, always store exclusive end date ranges. If there is legacy data with inclusive end date ranges, either migrate them on the DB if possible or convert to exclusive end date range as soon as possible when the data is read.
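One way to keep that conversion at the edges is a pair of small helpers, sketched here in Java. The names toExclusiveEnd and toInclusiveEnd are hypothetical, and the sketch assumes stored end values always fall on midnight.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

class EndDateBoundary {
    // User input -> storage: midnight of the day after the last included day.
    static LocalDateTime toExclusiveEnd(LocalDate inclusiveEnd) {
        return inclusiveEnd.plusDays(1).atStartOfDay();
    }

    // Storage -> display: the last day that is still inside the range.
    // Assumes the stored exclusive end is at midnight.
    static LocalDate toInclusiveEnd(LocalDateTime exclusiveEnd) {
        return exclusiveEnd.toLocalDate().minusDays(1);
    }

    public static void main(String[] args) {
        LocalDate userEnd = LocalDate.of(2008, 9, 30);
        LocalDateTime stored = toExclusiveEnd(userEnd);
        System.out.println(stored);                 // 2008-10-01T00:00
        System.out.println(toInclusiveEnd(stored)); // 2008-09-30
    }
}
```

The same toExclusiveEnd helper could be applied when reading legacy rows that still carry inclusive end dates.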

I use Oracle's date data type and educate developers on the issue of time components affecting boundary conditions.

A database constraint also prevents the accidental specification of a time component in a column that should have none, and it tells the optimizer that none of the values have a time component.

For example, the constraint CHECK (MY_DATE=TRUNC(MY_DATE)) prevents a value with a time other than 00:00:00 being placed into the my_date column, and also allows Oracle to infer that a predicate such as MY_DATE = TO_DATE('2008-09-12 15:00:00') will never be true, and hence no rows will be returned from the table because it can be expanded to:

MY_DATE = TO_DATE('2008-09-12 15:00:00') AND
TO_DATE('2008-09-12 15:00:00') = TRUNC(TO_DATE('2008-09-12 15:00:00'))

This is automatically false of course.

Although it is sometimes tempting to store dates as numbers such as 20080915, this can cause query optimization problems. For example, how many legal values are there between 20,071,231 and 20,080,101? How about between the dates 31-Dec-2007 and 01-Jan-2008? It also allows illegal values to be entered, such as 20070100.
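The mismatch can be seen in a short Java sketch that takes the two integers corresponding to 31-Dec-2007 and 01-Jan-2008. The class and method names are illustrative only; the point is that the numeric gap says nothing about how many days lie between the dates, and that an integer column accepts values that are not dates at all.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoUnit;

class DatesAsNumbers {
    // True only if the integer reads as a real calendar date in yyyyMMdd form.
    static boolean isLegalDate(int yyyymmdd) {
        try {
            LocalDate.parse(String.valueOf(yyyymmdd), DateTimeFormatter.BASIC_ISO_DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    static int numericGap(int from, int to) {
        return to - from;
    }

    static long dayGap(LocalDate from, LocalDate to) {
        return ChronoUnit.DAYS.between(from, to);
    }

    public static void main(String[] args) {
        System.out.println(numericGap(20071231, 20080101)); // 8870
        System.out.println(dayGap(LocalDate.of(2007, 12, 31), LocalDate.of(2008, 1, 1))); // 1
        System.out.println(isLegalDate(20070100)); // false
        System.out.println(isLegalDate(20080915)); // true
    }
}
```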

So, if you have dates without time components then defining a range becomes easy:

select ...
from   ...
where  my_date Between date '2008-01-01' and date '2008-01-05'

When there is a time component you can do one of the following:

select ...
from   ...
where  my_date >= date '2008-01-01' and
       my_date  < date '2008-01-06'

select ...
from   ...
where  my_date Between date '2008-01-01'
                   and date '2008-01-06'-(1/24/60/60)

Note the use of (1/24/60/60) instead of a magic number. It's pretty common in Oracle to perform date arithmetic by adding defined fractions of a day ... 3/24 for three hours, 27/24/60 for 27 minutes. Oracle math of this type is exact and doesn't suffer rounding errors, so:

select 27/24/60 from dual;

... gives 0.01875, not 0.01874999999999 or whatever.

I don't see the Interval datatypes posted yet.

Oracle also has datatypes for your exact scenario. There are INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND datatypes in Oracle as well.

From the 10gR2 docs.

INTERVAL YEAR TO MONTH stores a period of time using the YEAR and MONTH datetime fields. This datatype is useful for representing the difference between two datetime values when only the year and month values are significant.

INTERVAL YEAR [(year_precision)] TO MONTH

where year_precision is the number of digits in the YEAR datetime field. The default value of year_precision is 2.

INTERVAL DAY TO SECOND Datatype

INTERVAL DAY TO SECOND stores a period of time in terms of days, hours, minutes, and seconds. This datatype is useful for representing the precise difference between two datetime values.

Specify this datatype as follows:

INTERVAL DAY [(day_precision)] TO SECOND [(fractional_seconds_precision)]

where

day_precision is the number of digits in the DAY datetime field. Accepted values are 0 to 9. The default is 2.

fractional_seconds_precision is the number of digits in the fractional part of the SECOND datetime field. Accepted values are 0 to 9. The default is 6.

You have a great deal of flexibility when specifying interval values as literals. Please refer to "Interval Literals" for detailed information on specifying interval values as literals. Also see "Datetime and Interval Examples" for an example using intervals.
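On the Java side, a loose analogy (an assumption of this sketch, not something stated in the Oracle documentation) is that java.time.Period plays a role similar to INTERVAL YEAR TO MONTH, while java.time.Duration resembles INTERVAL DAY TO SECOND. The method names and sample dates below are illustrative.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.Period;

class IntervalAnalogues {
    // Calendar-based gap, expressed in years, months, and days.
    static Period calendarGap(LocalDate from, LocalDate to) {
        return Period.between(from, to);
    }

    // Exact elapsed time, expressed in seconds and nanoseconds.
    static Duration exactGap(LocalDateTime from, LocalDateTime to) {
        return Duration.between(from, to);
    }

    public static void main(String[] args) {
        Period p = calendarGap(LocalDate.of(2007, 1, 15), LocalDate.of(2008, 4, 15));
        System.out.println(p); // P1Y3M

        Duration d = exactGap(LocalDateTime.of(2008, 9, 29, 15, 16, 17),
                              LocalDateTime.of(2008, 9, 30, 1, 2, 3));
        System.out.println(d); // PT9H45M46S
    }
}
```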

Based on my experiences, there are four main ways to do it:

1) Convert the date to an epoch integer (seconds since 1st Jan 1970) and store it in the database as an integer.

2) Convert the date to a YYYYMMDDHHMMSS integer and store it in the database as an integer.

3) Store it as a date

4) Store it as a string

I've always stuck with 1 and 2, because it enables you to perform quick and simple arithmetic with the date and not rely on the underlying database functionality.
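A rough Java sketch of options 1 and 2 follows. The helper names are made up for this example, and the sketch assumes the values are in UTC. Note that subtraction on epoch seconds yields a real duration, whereas subtraction on the YYYYMMDDHHMMSS form does not, so arithmetic on option 2 is mostly limited to ordering comparisons.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class IntegerEncodings {
    private static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    // Option 1: seconds since 1970-01-01T00:00:00Z (input assumed to be UTC).
    static long toEpochSeconds(LocalDateTime utc) {
        return utc.toEpochSecond(ZoneOffset.UTC);
    }

    // Option 2: the digits of the date and time packed into one integer.
    static long toCompact(LocalDateTime t) {
        return Long.parseLong(t.format(COMPACT));
    }

    public static void main(String[] args) {
        LocalDateTime a = LocalDateTime.of(2008, 9, 29, 0, 0, 0);
        LocalDateTime b = a.plusDays(2); // 2008-10-01T00:00

        System.out.println(toEpochSeconds(b) - toEpochSeconds(a)); // 172800 (two days)
        System.out.println(toCompact(a));                          // 20080929000000
        System.out.println(toCompact(b) - toCompact(a));           // 72000000 (not a duration)
    }
}
```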

Based upon your first sentence, you're stumbling upon one of the hidden "features" (i.e. bugs) of Java: java.util.Date should have been immutable but it ain't. (A new date/time API was promised for Java 7; it eventually shipped as java.time in Java 8.) Almost every enterprise app counts on various temporal patterns, and at some point you will need to do arithmetic on date and time.

Ideally, you could use Joda-Time, which is used by Google Calendar. If you can't, I guess an API consisting of a wrapper around java.util.Date with computational methods similar to Grails/Rails, plus a range type built on that wrapper (i.e. an ordered pair indicating the start and end of a time period), will be sufficient.

On my current project (an HR timekeeping application) we try to normalize all our Dates to the same timezone for both Oracle and Java. Fortunately, our localization requirements are lightweight (= 1 timezone is enough). When a persistent object doesn't need finer precision than a day, we use the timestamp as of midnight. I would go further and insist upon throwing away the extra milliseconds, down to the coarsest granularity that a persistent object can tolerate (it will make your processing simpler).

All dates can be unambiguously stored as GMT timestamps (i.e. no timezone or daylight saving headaches) by storing the result of getTime() as a long integer.
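A minimal sketch of that round trip in Java, with a half-open range check done directly on the stored longs. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.Date;

class EpochMillisStorage {
    // Value to write to the numeric column: milliseconds since the epoch, in UTC.
    static long toStored(Date d) {
        return d.getTime();
    }

    // Value read back from the numeric column.
    static Date fromStored(long millis) {
        return new Date(millis);
    }

    // Half-open comparison on raw longs: start <= now < end.
    static boolean contains(long start, long end, long now) {
        return start <= now && now < end;
    }

    public static void main(String[] args) {
        Date start = Date.from(Instant.parse("2008-09-29T00:00:00Z"));
        Date end = Date.from(Instant.parse("2008-10-01T00:00:00Z"));
        long s = toStored(start);
        long e = toStored(end);

        System.out.println(fromStored(s).equals(start)); // true
        System.out.println(e - s);                       // 172800000 (two days in ms)
        System.out.println(contains(s, e, e - 1));       // true
        System.out.println(contains(s, e, e));           // false
    }
}
```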

In cases where day, week, month, etc. manipulations are needed in database queries, and when query performance is paramount, the timestamps (normalized to a coarser granularity than milliseconds) can be linked to a date breakdown table that has columns for the day, week, month, etc. values, so that costly date/time functions don't have to be used in queries.

Alan is right: Joda-Time is great. java.util.Date and Calendar are just a shame.

If you need timestamps, use the Oracle DATE type with the time component, and name the column with a suffix such as _tmst. When you read the data into Java, load it into a Joda-Time DateTime object. To make sure the time zone is right, consider that Oracle has specific data types that store timestamps together with the time zone. Alternatively, create another column in the table to store the time zone ID. Values for the time zone ID should be the standard full-name IDs for time zones; see http://java.sun.com/j2se/1.4.2/docs/api/java/util/TimeZone.html#getTimeZone%28java.lang.String%29 . If you use a separate column for the time zone data, then when you read the data into Java, use a DateTime object and set its time zone with withZoneRetainFields.
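For readers on java.time rather than Joda-Time, the separate-column approach might look like the sketch below. It assumes the timestamp column is read as a zone-less LocalDateTime and the zone column as a string ID; the class and method names are illustrative. ZonedDateTime.withZoneSameLocal keeps the wall-clock fields, much as Joda-Time's withZoneRetainFields does, while withZoneSameInstant keeps the moment and shifts the fields.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class ZoneColumn {
    // Combine the zone-less column value with the zone ID column,
    // keeping the stored wall-clock fields unchanged.
    static ZonedDateTime attachZone(LocalDateTime stored, String zoneId) {
        return stored.atZone(ZoneId.of(zoneId));
    }

    public static void main(String[] args) {
        ZonedDateTime newYork =
                attachZone(LocalDateTime.of(2014, 9, 29, 15, 16, 17), "America/New_York");
        System.out.println(newYork); // 2014-09-29T15:16:17-04:00[America/New_York]

        ZoneId paris = ZoneId.of("Europe/Paris");
        // Same wall-clock fields, different moment.
        System.out.println(newYork.withZoneSameLocal(paris).toLocalTime());   // 15:16:17
        // Same moment, different wall-clock fields.
        System.out.println(newYork.withZoneSameInstant(paris).toLocalTime()); // 21:16:17
    }
}
```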

If you only need the date data (no timestamp), then use the DATE type in the database with no time component. Again, name the column well. In this case, use the DateMidnight object from Joda-Time.

Bottom line: leverage the type system of the database and the language you are using. Learn them and reap the benefits of having an expressive API and language syntax to deal with your problem.

UPDATE: The Joda-Time project is now in maintenance mode. Its team advises migration to the java.time classes built into Java.

Joda-Time

Joda-Time offers 3 classes for representing a span of time: Interval, Duration, and Period.

The ISO 8601 standard specifies how to format strings representing a Duration and an Interval. Joda-Time both parses and generates such strings.

Time zone is a crucial consideration. Your database should be storing its date-time values in UTC. But your business logic may need to consider time zones. The beginning of a "day" depends on time zone. By the way, use proper time zone names rather than 3 or 4 letter codes.

The correct answer by S.Lott wisely advises to use Half-Open logic, as that usually works best for date-time work. The beginning of a span of time is inclusive while the ending is exclusive. Joda-Time uses half-open logic in its methods.

[Diagram: a week defined as greater than or equal to Day 1 and less than Day 8]

DateTimeZone timeZone_NewYork = DateTimeZone.forID( "America/New_York" );
DateTime start = new DateTime( 2014, 9, 29, 15, 16, 17, timeZone_NewYork );
DateTime stop = new DateTime( 2014, 9, 30, 1, 2, 3, timeZone_NewYork );

int daysBetween = Days.daysBetween( start, stop ).getDays();

Period period = new Period( start, stop );

Interval interval = new Interval( start, stop );
Interval intervalWholeDays = new Interval( start.withTimeAtStartOfDay(), stop.plusDays( 1 ).withTimeAtStartOfDay() );

DateTime lateNight29th = new DateTime( 2014, 9, 29, 23, 0, 0, timeZone_NewYork );
boolean containsLateNight29th = interval.contains( lateNight29th );

Dump to console…

System.out.println( "start: " + start );
System.out.println( "stop: " + stop );
System.out.println( "daysBetween: " + daysBetween );
System.out.println( "period: " + period ); // Uses format: PnYnMnDTnHnMnS
System.out.println( "interval: " + interval );
System.out.println( "intervalWholeDays: " + intervalWholeDays );
System.out.println( "lateNight29th: " + lateNight29th );
System.out.println( "containsLateNight29th: " + containsLateNight29th );

When run…

start: 2014-09-29T15:16:17.000-04:00
stop: 2014-09-30T01:02:03.000-04:00
daysBetween: 0
period: PT9H45M46S
interval: 2014-09-29T15:16:17.000-04:00/2014-09-30T01:02:03.000-04:00
intervalWholeDays: 2014-09-29T00:00:00.000-04:00/2014-10-01T00:00:00.000-04:00
lateNight29th: 2014-09-29T23:00:00.000-04:00
containsLateNight29th: true
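Since Joda-Time is now in maintenance mode, here is a rough java.time rendering of the same example. This is a sketch, not the original author's code: java.time has no direct counterpart to Joda-Time's Interval class, so the half-open containment test is written by hand, and the class and method names are made up for illustration.

```java
import java.time.Duration;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

class JavaTimeSpan {
    // Half-open containment test: start <= moment < stop.
    static boolean contains(ZonedDateTime start, ZonedDateTime stop, ZonedDateTime moment) {
        return !moment.isBefore(start) && moment.isBefore(stop);
    }

    // First moment of the day after the given moment's date, in the same zone.
    static ZonedDateTime startOfNextDay(ZonedDateTime moment) {
        return moment.toLocalDate().plusDays(1).atStartOfDay(moment.getZone());
    }

    public static void main(String[] args) {
        ZoneId newYork = ZoneId.of("America/New_York");
        ZonedDateTime start = ZonedDateTime.of(2014, 9, 29, 15, 16, 17, 0, newYork);
        ZonedDateTime stop = ZonedDateTime.of(2014, 9, 30, 1, 2, 3, 0, newYork);

        long daysBetween = ChronoUnit.DAYS.between(start, stop);
        Duration duration = Duration.between(start, stop);

        // Widen the span to whole days: midnight on the 29th up to midnight on Oct 1.
        ZonedDateTime wholeDaysStart = start.toLocalDate().atStartOfDay(newYork);
        ZonedDateTime wholeDaysStop = startOfNextDay(stop);

        ZonedDateTime lateNight29th = ZonedDateTime.of(2014, 9, 29, 23, 0, 0, 0, newYork);

        System.out.println("daysBetween: " + daysBetween); // 0
        System.out.println("duration: " + duration);       // PT9H45M46S
        System.out.println("wholeDays: " + wholeDaysStart + "/" + wholeDaysStop);
        System.out.println("containsLateNight29th: " + contains(start, stop, lateNight29th)); // true
    }
}
```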

I'm storing all dates in milliseconds. I do not use timestamp/datetime fields at all.

So, I have to manipulate them as longs. It means I do not use 'before', 'after', 'now' keywords in my SQL queries.

Licensed under: CC-BY-SA with attribution
