Approach for storing circa and actual dates in genealogy data model
26-05-2021
Question
I am developing a genealogy application, and I need to store dates for events. However, I need to support both circa dates and actual dates, so I am thinking of having eventdate and ceventdate (for the circa) columns so that I can index the event date when it is available.
However, the more I think about it, the more I wonder whether I should instead have the following columns: eventyear, eventmonth, eventday, circa (definition), so that I can store the years and months obtained from analyzing the circa.
Thoughts?
Solution
I use the GEDCOM definition of dates. I think they did a pretty good job of thinking dates through and including much of what is needed.
I recently wrote a blog post about GEDCOM dates. Some of what I said was:
The basic date in GEDCOM is like this: dd MMM yyyy, e.g. 02 JUL 1917.
Some things to know about that basic date: You can list either “day month year”, or “month year” or “just the year”. The day can be 1 or 2 digits, so 02 JUL 1917 or 2 JUL 1917 are allowed.
For approximate dates, they use:
- ABT date
- CAL date
- EST date
where ABT means “about” and is for an inexact date, CAL is calculated mathematically, e.g. from an event date and an age, and EST is estimated by an algorithm using some other event date.
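As a rough illustration of the forms described above, here is a minimal Python sketch that parses the basic date (day month year, month year, or just the year) with an optional ABT, CAL or EST prefix. The names parse_gedcom_date and ParsedDate are hypothetical, and the sketch deliberately ignores the other GEDCOM constructs such as interpreted dates and date phrases.

```python
import re
from dataclasses import dataclass
from typing import Optional

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

# Optional qualifier, then optional day, optional month, mandatory year.
_DATE_RE = re.compile(
    r"^(?:(ABT|CAL|EST)\s+)?(?:(?:(\d{1,2})\s+)?([A-Z]{3})\s+)?(\d{1,4})$"
)


@dataclass
class ParsedDate:
    """Hypothetical container for the parts of a simple GEDCOM date."""
    qualifier: Optional[str]  # "ABT", "CAL", "EST", or None for an exact date
    year: int
    month: Optional[int]      # 1-12, or None when only the year is known
    day: Optional[int]        # None when the day is not given


def parse_gedcom_date(text: str) -> ParsedDate:
    """Parse the basic and approximate date forms only (an assumption)."""
    match = _DATE_RE.match(text.strip().upper())
    if match is None:
        raise ValueError(f"unsupported date: {text!r}")
    qualifier, day, month, year = match.groups()
    if month is not None and month not in MONTHS:
        raise ValueError(f"unknown month in date: {text!r}")
    return ParsedDate(
        qualifier=qualifier,
        year=int(year),
        month=MONTHS.index(month) + 1 if month else None,
        day=int(day) if day else None,
    )
```

Keeping the qualifier separate from the year, month and day means the numeric parts can still be indexed and sorted, while the "about" information is not lost.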
There are a bunch of other constructs as well, including interpreted dates and date phrases.
Since you'll most likely want your genealogy application to import from GEDCOM and export to GEDCOM, you may want to decide on some date implementation that will be easy to translate to/from the GEDCOM format.
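For the export direction, a sketch along these lines could turn stored date parts back into a GEDCOM-style string. It assumes the application stores a year, an optional month and day, and an optional qualifier; format_gedcom_date is a hypothetical helper, not part of any GEDCOM library.

```python
from typing import Optional

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
QUALIFIERS = {"ABT", "CAL", "EST"}


def format_gedcom_date(year: int,
                       month: Optional[int] = None,
                       day: Optional[int] = None,
                       qualifier: Optional[str] = None) -> str:
    """Build 'dd MMM yyyy', 'MMM yyyy' or 'yyyy', optionally prefixed."""
    if qualifier is not None and qualifier not in QUALIFIERS:
        raise ValueError(f"unknown qualifier: {qualifier!r}")
    if day is not None and month is None:
        raise ValueError("a day requires a month")
    if month is not None and not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    parts = []
    if qualifier is not None:
        parts.append(qualifier)
    if day is not None:
        parts.append(str(day))            # 1 or 2 digits are both allowed
    if month is not None:
        parts.append(MONTHS[month - 1])
    parts.append(str(year))
    return " ".join(parts)
```

If the internal representation maps onto these four values, import and export stay symmetrical for the common cases, and only the rarer constructs need special handling.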
Other advice
There are still several unknowns about how you want to handle the circa dates. A circa date could be either a date that is accurate only to the month or year, or a time frame of varying accuracy.
If you only expect to use circa dates that are accurate to either the month or the year, I would suggest the following:
Use an event_date field for exact dates in whatever table you are tracking this in, probably an events table. For circa dates, still use event_date, but add a month or year value to an accuracy column. Your system would then read the entered date only to the accuracy set in that column. Should your dates need to be date-times, the accuracy column can easily be expanded to take the day into consideration.
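A minimal sketch of this idea, using Python's built-in sqlite3 module so that it runs anywhere. The column types, the 'day' value for exact dates, the convention of filling unknown parts with 01, and the display_date helper are all assumptions made for illustration.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        event_date TEXT NOT NULL,  -- ISO 8601; unknown parts filled with 01
        accuracy   TEXT NOT NULL DEFAULT 'day'
                   CHECK (accuracy IN ('day', 'month', 'year'))
    )
""")
# A single date column can be indexed whether the date is exact or circa.
conn.execute("CREATE INDEX idx_events_event_date ON events (event_date)")
conn.executemany(
    "INSERT INTO events (name, event_date, accuracy) VALUES (?, ?, ?)",
    [
        ("Birth", "1917-07-02", "day"),       # exact date
        ("Marriage", "1940-06-01", "month"),  # circa June 1940
        ("Death", "1985-01-01", "year"),      # circa 1985
    ],
)


def display_date(event_date: str, accuracy: str) -> str:
    """Show the stored date only to the precision the accuracy allows."""
    d = date.fromisoformat(event_date)
    if accuracy == "year":
        return f"c. {d.year}"
    if accuracy == "month":
        return f"c. {d.year}-{d.month:02d}"
    return d.isoformat()


labels = [
    display_date(event_date, accuracy)
    for event_date, accuracy in conn.execute(
        "SELECT event_date, accuracy FROM events ORDER BY event_date"
    )
]
```

The trade-off is that a circa date sorts as if it fell on the first day of its month or year, which is usually acceptable for a timeline but worth keeping in mind.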
If you are dealing with circa dates that can span between two accuracies, I would suggest that you use the previous solution with a couple of additional criteria:
If the events table assumption holds true, you will need an additional event_dates table that uses the date columns I discussed earlier. This new table will also need a date_type column that indicates whether a row is the start or the end of the event's circa range. The event_dates table will need a link to the event_id to maintain the relationship.
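The two-table layout could be sketched as follows, again with sqlite3. The 'start' and 'end' values for date_type, the unique constraint, and the events_possibly_on query helper are assumptions added for the example.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE event_dates (
        id         INTEGER PRIMARY KEY,
        event_id   INTEGER NOT NULL REFERENCES events (id),
        date_type  TEXT NOT NULL CHECK (date_type IN ('start', 'end')),
        event_date TEXT NOT NULL,  -- ISO 8601 date
        accuracy   TEXT NOT NULL CHECK (accuracy IN ('day', 'month', 'year')),
        UNIQUE (event_id, date_type)  -- at most one start and one end
    );
""")
conn.execute("INSERT INTO events (id, name) VALUES (1, 'Emigration')")
conn.executemany(
    "INSERT INTO event_dates (event_id, date_type, event_date, accuracy) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, "start", "1880-01-01", "year"),  # some time from 1880 ...
        (1, "end", "1885-12-31", "year"),    # ... up to the end of 1885
    ],
)


def events_possibly_on(day: str) -> List[str]:
    """Names of events whose start/end range contains the given ISO date."""
    rows = conn.execute(
        """
        SELECT e.name
        FROM events AS e
        JOIN event_dates AS s ON s.event_id = e.id AND s.date_type = 'start'
        JOIN event_dates AS f ON f.event_id = e.id AND f.date_type = 'end'
        WHERE ? BETWEEN s.event_date AND f.event_date
        ORDER BY e.name
        """,
        (day,),
    ).fetchall()
    return [name for (name,) in rows]
```

Comparing ISO 8601 strings works here because they sort in the same order as the dates they represent, as long as all years have four digits.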
Alternatively, you could create a circa date in the same table if your requirements can be met by specifying a buffer_date or date_range. This solution would take the exact date entered and use logic to say that any date that occurs within the number of days/months/years specified in the buffer_date should bring up that event.
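The buffer logic might look something like the sketch below. For simplicity it assumes the buffer is stored as a whole number of days in a hypothetical buffer_days field; buffers expressed in months or years would need calendar-aware arithmetic instead.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Event:
    """Hypothetical event record with a symmetric buffer around its date."""
    name: str
    event_date: date
    buffer_days: int = 0  # 0 means the date is exact

    def may_have_occurred_on(self, day: date) -> bool:
        """True when the given day falls within the buffer around event_date."""
        margin = timedelta(days=self.buffer_days)
        return self.event_date - margin <= day <= self.event_date + margin


baptism = Event("Baptism", date(1850, 6, 15), buffer_days=365)
birth = Event("Birth", date(1917, 7, 2))
```

With this in place, baptism.may_have_occurred_on(date(1851, 1, 1)) is true because that day lies within a year of the recorded date, while the exact birth event only matches 2 July 1917 itself.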
Hope that helps.