Should I store dates or recurrence rules in my database when building a calendar app?

https://stackoverflow.com/questions/4239871

27-09-2019

Question

I am building a calendar web application in ASP.NET MVC (think of a simple version of Outlook), and I want to start supporting recurring calendar events (monthly, yearly, etc.).

Right now I am storing actual dates in my database, but I wanted to figure out whether, with recurrence, it makes sense to continue storing dates (with some obvious cutoff), or whether I should store the recurrence options and generate the dates on the fly.

It got me thinking about how Outlook, Google Mail, and other services that support recurring calendar items do this.

Are there any suggestions on this?

Solution

Separate your data into two parts: the "canonical" data (the recurrence rule) and "serving" (generated dates; read-only aside from regeneration). If the canonical data changes, regenerate the "serving" data at that point. For infinite recurrences, keep some number of instances and generate more if you run out (e.g. if the user looks at their calendar for 2020).
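As a rough illustration of that split, here is a minimal Python sketch. The `WeeklyRule` and `Calendar` classes and their method names are hypothetical, invented for this example, and the rule is deliberately limited to "every N weeks"; a real application would support far richer rules and would persist both parts in the database.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class WeeklyRule:
    """Canonical data: a hypothetical, minimal 'every N weeks' rule."""
    start: date
    interval_weeks: int = 1


@dataclass
class Calendar:
    rule: WeeklyRule
    horizon: date  # serving data has been generated up to this date
    occurrences: list = field(default_factory=list)  # serving data

    def regenerate(self):
        """Rebuild the serving data from the canonical rule."""
        self.occurrences = []
        current = self.rule.start
        step = timedelta(weeks=self.rule.interval_weeks)
        while current <= self.horizon:
            self.occurrences.append(current)
            current += step

    def change_rule(self, new_rule):
        """The canonical data changed, so regenerate the serving data."""
        self.rule = new_rule
        self.regenerate()

    def occurrences_between(self, first, last):
        """Serve a view, generating more instances if we have run out."""
        if last > self.horizon:
            self.horizon = last
            self.regenerate()
        return [d for d in self.occurrences if first <= d <= last]
```

Reads only ever touch `occurrences`; the rule is consulted when it changes or when a view reaches past the stored horizon.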

If you had infinite processor speed, you'd only need the canonical data - but in reality, doing all the date/time processing for all the recurrence rules on every page view is likely to be too time-consuming... so you trade off some storage (and complexity) to save that repeated computation. Storage is usually pretty cheap, compared with the computation required for a large number of events. If you only need to store the dates of the events, that's really very cheap - you could easily use a 4 byte integer to represent a date, and then generate a complete date/time from that, assuming your recurrences are all date based. For time-based recurrences (e.g. "every three hours") you could store full UTC instants - 8 bytes will represent that down to a pretty fine resolution for as long as you're likely to need.

You need to be careful about maintaining validity though - if a recurring meeting changes today, that doesn't change when it has happened in the past... so you probably want to also have canonical read-only data about when recurrences actually occurred. Obviously you won't want to keep the past forever, so you probably want to "garbage collect" events more than a few years old, depending on your storage limitations.
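To make the sizes concrete, here is a small Python sketch of one possible encoding. The choice of the proleptic Gregorian ordinal for dates and of microseconds since the Unix epoch for instants is an assumption made for this example, not something the answer prescribes; the function names are likewise hypothetical.

```python
import struct
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def pack_date(d):
    """Encode a date as a 4-byte signed integer (its ordinal day number)."""
    return struct.pack("<i", d.toordinal())


def unpack_date(raw):
    """Decode a 4-byte value produced by pack_date."""
    return date.fromordinal(struct.unpack("<i", raw)[0])


def pack_instant(dt):
    """Encode an aware datetime as 8 bytes: microseconds since the epoch, UTC."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = dt.astimezone(timezone.utc) - EPOCH
    micros = (delta.days * 86400 + delta.seconds) * 1_000_000 + delta.microseconds
    return struct.pack("<q", micros)


def unpack_instant(raw):
    """Decode an 8-byte value produced by pack_instant into a UTC datetime."""
    micros = struct.unpack("<q", raw)[0]
    return EPOCH + timedelta(microseconds=micros)
```

In a relational database the same idea usually just means a 4-byte integer or date column for date-based recurrences and an 8-byte integer or timestamp column for time-based ones.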

You may also need the ability to add notes and exceptions (e.g. "meeting doesn't occur today due to a public holiday" or "moved to 4pm") on a per-occurrence basis. That becomes really fun when you change the recurrence - if you change "every Monday" to "every Tuesday" do you keep the exceptions or not? How do you even match up the exceptions when you change from "every day" to "every week"? These aren't questions which are directly about storage - but the storage decisions will affect how easy it is to implement whatever policy you decide on.
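One way to picture per-occurrence exceptions is as a table keyed by the originally generated date. The Python sketch below is only an illustration: `OccurrenceException`, `apply_exceptions` and `prune_exceptions` are hypothetical names, and the pruning rule shown (drop any exception whose date no longer occurs) is just one of the possible policies discussed above, not a recommendation.

```python
from dataclasses import dataclass
from datetime import date, time
from typing import Optional


@dataclass
class OccurrenceException:
    """A change attached to one generated occurrence."""
    cancelled: bool = False
    new_time: Optional[time] = None
    note: str = ""


def apply_exceptions(dates, default_time, exceptions):
    """Return (date, time, note) for every occurrence that is not cancelled."""
    result = []
    for d in dates:
        exc = exceptions.get(d)
        if exc is None:
            result.append((d, default_time, ""))
        elif not exc.cancelled:
            when = exc.new_time if exc.new_time is not None else default_time
            result.append((d, when, exc.note))
    return result


def prune_exceptions(exceptions, new_dates):
    """One possible policy after a rule change: keep only exceptions
    whose original date is still generated by the new rule."""
    valid = set(new_dates)
    return {d: e for d, e in exceptions.items() if d in valid}
```

Because exceptions are matched by date, changing "every Monday" to "every Tuesday" orphans all of them under this policy, which is exactly the kind of consequence the storage layout forces you to decide on.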

OTHER TIPS

You will need to handle separately events and occurrences.

EVENT WISE: For events, you will need to store recurrence rules (which can be an rrule as specified by rfc5545, but also an explicit set of dates like rdate in rfc5545) as well as exceptions (see exdate in rfc5545 and possibly exrule as in rfc2445). You will also need to keep track of changes in those rules. Changes in rdate and exdate are no problem when they occur in the future, and can be ignored for past dates. Changes in rrule are trickier, as they affect previous occurrences. My personal preference is to add a specific property to the old and the new rrule specifying their respective start and end dates of validity.

If the event has a limited time span (say a COUNT or UNTIL property is present), you should store its start and end in your table to allow easier querying of events. This helps especially when looking for occurrences outside your precalculated time window (see below), as it reduces the number of events for which the computation has to be redone.

OCCURRENCES WISE: For occurrences, you should store instances within a predefined window around the present (say +/- 6 months or 12 months, computed on a regular basis) and keep a record of this window, so that, for performance reasons, you recalculate only when your users want to see further into the future. You should also consider computing the index (RECURRENCE-ID) to make finding the next occurrence easier.

Less on the back end and more on the front end: you should also keep track of tzid changes, so that you can ask the user whether an event scheduled in a given tzid is meant to stay at its current local time or needs to be updated (think of someone in Samoa who had scheduled a meeting on Friday, Dec. 30 2011 before the country decided this day would not exist). Similarly, you can ask whether an event which falls within a daylight saving time transition is meant to "never happen" or "happen twice" (more on this topic here).

Note: you may want to consider support beyond what is defined in rfc5545 in terms of recurrence rules and also add support for religious recurring rules (see the USNO introduction to calendars or, in print, "Calendrical Calculations" (Third Edition) by E. Reingold and N. Dershowitz).

Since you ask about existing implementations, you can easily check the database schema of sunbird (sqlite) or of the Apple open source Calendar and Contacts Server. A more complete list of existing open source projects for caldav servers (which is probably a subset of what you are looking for) is available here.
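A small Python sketch of why the stored span helps: with the first start and last end kept next to each event, a cheap range check can discard events before any recurrence rule is expanded. `StoredEvent`, its fields and `candidate_events` are hypothetical names for this illustration; in a database the same filter would be a WHERE clause on two indexed columns.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class StoredEvent:
    name: str
    first_start: date
    last_end: Optional[date]  # None means the recurrence never ends


def candidate_events(events, window_start, window_end):
    """Keep only events whose overall span overlaps the requested window.

    Only these candidates need their recurrence rule expanded.
    """
    return [
        e for e in events
        if e.first_start <= window_end
        and (e.last_end is None or e.last_end >= window_start)
    ]
```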

I had to build a system that worked with scheduling and we did both. Here's what we had:

a set of tables that kept track of the schedule.
a table that kept track of previous instances of the schedule (when they actually occurred).
a table that kept track of the last and next instance (when the next item is due to occur, based on the last time). You don't need this table, but we used it because otherwise you would constantly be calculating whether an item should be occurring now.

With scheduling, things can get really tricky because you have to remember that at any point in time, the schedule can change. Also, an item may be due when your application is not running, and when it starts up again, you need to know how to identify past due items.

Also, we made sure that the tables that kept track of the actual schedule stood alone. The reason for this is that those were the most complex set of tables in the system and we wanted to be able to reuse them for different things that needed scheduling, such as sending admin emails, sending notifications, and server maintenance like cleaning up log files.
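A minimal Python sketch of how a stored "next due" value lets a restarted application find what it missed. The fixed-interval schedule and the `catch_up` function are assumptions made for this example; the answer does not describe its actual tables in that much detail.

```python
from datetime import datetime, timedelta, timezone


def catch_up(next_due, interval, now):
    """Return (missed, new_next_due).

    `missed` lists every due time at or before `now`, oldest first;
    `new_next_due` is the first due time strictly after `now`, which
    would be written back to the last/next table.
    """
    missed = []
    while next_due <= now:
        missed.append(next_due)
        next_due += interval
    return missed, next_due
```

For example, with an item due every three hours from 08:00 UTC and the application restarting at 15:30 UTC, the missed runs are 08:00, 11:00 and 14:00, and the next due time becomes 17:00.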

I would definitely use your second option. Use different recurrence options, store them separately and calculate on the fly. Storing all those dates would be a boatload of data that is not necessary.

Here's a good answer to complement your question.
Data structure for storing recurring events?

Also, as a side note, I've started storing everything as UTC time so that you have a common baseline if you ever need to use multiple timezones.

I had a similar problem in a web-application I made a few years ago (there may well be a better way now :) ). I wanted to include a scheduler which had all the functionality of recurring events, handle time, days, weeks, months, years, and exceptions, so that I could have rules like:

1) Every Day at 10am except Wednesdays

2) Every 2 hours with a maximum of 4 iterations per day

3) Every First Monday of the Month

etc..

Storing the recurring dates/times was possible, but inflexible. Each iteration of your event changes when the "maximum" would be. And how far ahead do you look?

In the end I wrote a custom scheduling class that could read from and write to a string. This was the string that was stored in the database and then a simple function can be called to find out when the next occurrence is.
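The answer does not show its class or its string format, so the following Python sketch is purely hypothetical: it invents a `DAILY@HH:MM;EXCEPT=...` format and covers only a rule like "every day at 10am except Wednesdays", to show the general shape of a schedule that round-trips through a string and can report its next occurrence.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

DAY_CODES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]  # index = weekday()
PREFIX = "DAILY@"


@dataclass(frozen=True)
class DailySchedule:
    at: time
    excluded_weekdays: frozenset = frozenset()

    def to_string(self):
        """Serialise to the made-up format, e.g. 'DAILY@10:00;EXCEPT=WE'."""
        days = ",".join(DAY_CODES[i] for i in sorted(self.excluded_weekdays))
        return f"{PREFIX}{self.at:%H:%M};EXCEPT={days}"

    @classmethod
    def from_string(cls, text):
        """Parse a string produced by to_string."""
        head, _, tail = text.partition(";EXCEPT=")
        if not head.startswith(PREFIX):
            raise ValueError(f"unsupported schedule: {text!r}")
        at = datetime.strptime(head[len(PREFIX):], "%H:%M").time()
        excluded = frozenset(DAY_CODES.index(c) for c in tail.split(",") if c)
        return cls(at, excluded)

    def next_occurrence(self, after):
        """First occurrence strictly after `after` (a naive local datetime)."""
        if len(self.excluded_weekdays) == 7:
            raise ValueError("every weekday is excluded")
        candidate = datetime.combine(after.date(), self.at)
        if candidate <= after:
            candidate += timedelta(days=1)
        while candidate.weekday() in self.excluded_weekdays:
            candidate += timedelta(days=1)
        return candidate
```

Only the string needs to be stored in the database; the application rebuilds the object with `from_string` and asks it for the next occurrence when needed.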

You need to store some of them for sure. The user might edit one of the events, leaving the others untouched (you have probably met the question "Do you want to edit all recurring events or only this one?" in some calendars, e.g. Windows Mobile).

You also might want to store past events and not remove them when the user deletes the recurring event.

Whether you store all the others or generate them is an implementation detail. I would prefer to generate them, if possible.

In any case you'll want to have some ID of the recurring event stored with each event, plus some flag telling you whether the event was modified later. Or, in a more complicated approach, a flag for each event property telling you whether it has the default value (from the recurring event) or was modified for this particular instance. You'll need this when the user decides to edit the recurring event.
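A compact Python sketch of the per-property variant. The `Series` and `Instance` classes and the `resolve` helper are hypothetical; here the "flag" for each property is simply whether the instance carries an override for it.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Series:
    """The recurring event: holds the default value of every property."""
    id: int
    title: str
    location: str


@dataclass
class Instance:
    """One stored occurrence, linked to its series by ID."""
    series_id: int
    day: date
    overrides: dict = field(default_factory=dict)  # property name -> value

    @property
    def modified(self):
        return bool(self.overrides)


def resolve(series, instance, prop):
    """Value of `prop` for this instance: its override, else the series default."""
    if instance.series_id != series.id:
        raise ValueError("instance does not belong to this series")
    return instance.overrides.get(prop, getattr(series, prop))
```

When the user edits the whole series, untouched properties of every instance follow the new default automatically, while properties overridden for a single occurrence keep their instance-specific value.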

Note that most replies lean toward saving generated data. But make sure you consider your use case.

Back in the day my servers were limited by I/O, with a lot of CPU doing nothing. Nowadays you have SSDs (if you can afford them, otherwise you're left with an old spinning hard disk), but note that core counts have increased too.

The nice thing about this kind of calculation is that you can split it up easily and hand it to your many cores, or even to a few cheap servers on a local network. That is often cheaper than setting up a NoSQL cluster or going the full database cluster way.

An alternative might also be a cache: just cache your calendar view, so there is no need to redo all the calculations every time when nothing has changed.

But as I said, it depends on your use case. Don't just follow the above answers; do your own calculations if you have time, and then make a decision.
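As a sketch of that idea in Python, the hypothetical `ViewCache` below memoises a computed month view and throws the cached views away whenever the underlying calendar changes. The `compute` callback stands in for whatever recurrence expansion the application performs; it is an assumption of this example.

```python
class ViewCache:
    """Cache computed calendar views until the calendar data changes."""

    def __init__(self, compute):
        self._compute = compute  # function (year, month) -> view
        self._views = {}

    def get(self, year, month):
        """Return the cached view, computing it only on a cache miss."""
        key = (year, month)
        if key not in self._views:
            self._views[key] = self._compute(year, month)
        return self._views[key]

    def invalidate(self):
        """Call whenever an event or recurrence rule is added or edited."""
        self._views.clear()
```

Clearing everything on any change is the bluntest possible invalidation; it keeps the sketch short, at the cost of recomputing views the change did not affect.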

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow