Question

In a scenario where I need to use the entire date (i.e. day, month, year) as a whole, and never need to extract the day, month, or year part of the date in my database application program, what is the best practice:

  1. Making Date an atomic attribute
  2. Making Date a composite attribute (composed of day, month, and year)

Edit:- The question can be generalized as:

Is it a good practice to make composite attributes where possible, even when we need to deal with the attribute as a whole only?


Solution

Actually, the specific question and the general question are significantly different, because the specific question refers to dates.

For dates, the component elements aren't really part of the thing you're modelling - a day in history - they're part of the representation of the thing you're modelling - a day in the calendar that you (and most of the people in your country) use.

For dates I'd say it's best to store it in a single date type field.

For the generalized question I would generally store the components separately. If you're absolutely sure that you'll only ever need to deal with the attribute as a whole, then you could use a single field. If you think there's a possibility that you'll want to pull out a component for separate use (even just for validation), then store them separately.


With dates specifically, the vast majority of modern databases store and manipulate dates efficiently as a single Date value. Even in situations when you do want to access the individual components of the date I'd recommend you use a single Date field.

You'll almost inevitably need to do some sort of date arithmetic eventually, and most database systems and programming languages give some sort of functionality for manipulating dates. These will be easier to use with a single date variable.
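As a rough sketch of what I mean, here is date arithmetic on a single date value, using Python's standard `datetime` module as a stand-in for whatever your database or language provides. The variable names and the dates themselves are made up for illustration.

```python
from datetime import date, timedelta

# Hypothetical order date, stored as one value rather than three fields.
order_date = date(2024, 1, 31)

# Adding days rolls over month boundaries (and leap-year February) correctly.
due_date = order_date + timedelta(days=30)

# Subtracting two dates gives the number of days between them.
days_open = (date(2024, 3, 15) - order_date).days

# The components are still there if you ever do want them.
year, month, day = order_date.year, order_date.month, order_date.day

print(due_date, days_open)  # 2024-03-01 44
```

With three separate integer fields you would have to write the month-length and leap-year handling yourself before you could do either calculation.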

With dates, the entire composite date identifies the primary real world thing you're identifying.

The day / month / year are attributes of that single thing, but only for a particular way of describing it - the western calendar.

However, the same day can be represented in many different ways: Unix epoch time, the Gregorian calendar, a lunar calendar. In some calendars we're in a completely different year. All of these representations can be different, yet refer to the same individual real world day.
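To make that concrete, here is a small sketch that takes one day and prints a few of its representations, again using only Python's standard library. I've assumed midnight UTC for the Unix timestamp; the choice of 1 January 2000 is arbitrary.

```python
from datetime import date, datetime, timezone

day = date(2000, 1, 1)

# Gregorian year-month-day text.
iso_text = day.isoformat()  # "2000-01-01"

# A plain day count (days since 0001-01-01 in the proleptic Gregorian calendar).
ordinal = day.toordinal()  # 730120

# ISO week calendar: this day belongs to the last week of ISO year 1999.
iso_year, iso_week, iso_weekday = day.isocalendar()  # 1999, 52, 6

# Seconds since the Unix epoch, assuming midnight UTC.
midnight_utc = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
unix_seconds = int(midnight_utc.timestamp())  # 946684800

# Every representation converts back to the same single day.
assert date.fromordinal(ordinal) == day
assert date.fromisoformat(iso_text) == day
```

Note that the ISO week calendar already puts this day in a different year from the Gregorian one, even though both describe the same real world day.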

So, from a modelling point of view, and from a database / programmatic efficiency point of view, for dates, store them in a single field as far as possible.


For the generalisation, it's a different question.

Based on experience, I'd store them as separate components. If you were really sure you'd never want to access component information, then yes, one field would be fine. For as long as you're right. But if there's even a possibility of breaking the information up, I personally would separate the components from the start.

It's much easier to join fields together than to separate fields out of a composite string. That's true both from a programming / algorithmic viewpoint and from a compute resource point of view.

Some of the most painful problems I've had in programming have been trying to decompose a single field into component elements. They'd initially been stored as one element, and by the time the business changed enough to realise they needed the components... it had become a decent sized challenge.

Most composite data items aren't like dates. Where a date is a single item that is sometimes (OK, generally in the western world) represented by a Day-Month-Year composite, most composite data elements actually represent several concrete items, and only the combination of those items truly uniquely represents a particular thing.

For example a bank account number (in New Zealand, anyway) is a bit like this:

  • A bank number - 2 or 3 digits
  • A branch number - 4 to 6 digits
  • An account / customer number - 8 digits
  • An account type number - 2 or 3 digits.

Each of those elements represents a single real world thing, but together they identify my account.

You could store these as a single field, and it'd largely work. You might decide to use a hyphen to separate the elements, in case you ever needed to.

If you really never need to access a particular piece of that information then you'd be fine storing it as a single combined field.

But if 3 years down the track one bank decides to charge a higher rate, or needs different processing, or if you want to do a regional promotion that could be keyed on the branch number, now you have a different challenge, and you'll need to pull out that information. You chose hyphens as separators, so you'll have to parse each row into its component elements to find them. (These days disk is pretty cheap, so if you do this, you'll store the parsed components. In the old days it was expensive, so you had to decide whether to pay to store them or pay to re-calculate them each time.)
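Here is a minimal sketch of that difference in Python. The account strings, the `bank-branch-account-type` layout, and the `split_account` helper are all hypothetical, made up to match the hyphen-separated example above.

```python
# Hypothetical accounts stored as one hyphen-separated field each.
stored_as_one_field = ["01-0123-00123456-00", "02-0456-00987654-001"]

def split_account(text):
    """Parse one hyphen-separated account string into its four parts."""
    bank, branch, account, account_type = text.split("-")
    return {"bank": bank, "branch": branch, "account": account, "type": account_type}

# Finding every account at bank "02" means parsing every row first.
bank_02 = [s for s in stored_as_one_field if split_account(s)["bank"] == "02"]

# Going the other way is trivial when the parts are stored separately.
parts = {"bank": "02", "branch": "0456", "account": "00987654", "type": "001"}
joined = "-".join([parts["bank"], parts["branch"], parts["account"], parts["type"]])

print(bank_02)  # ['02-0456-00987654-001']
print(joined)   # 02-0456-00987654-001
```

And this is the easy case, where a separator was chosen up front. With no separator and variable-length parts, the split may not even be unambiguous.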

Personally, in the bank account case (and probably the majority of other examples that I can think of) I'd store them separately, and probably set up reference tables to allow validation to happen (e.g. you can't enter a bank that we don't know about).
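Something along these lines is what I have in mind, sketched with Python's built-in `sqlite3` module and an in-memory database. The table names, column names, the `try_insert` helper, and the sample rows are all assumptions for illustration; a real schema would likely have reference tables for branches and account types too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Reference table of banks we know about.
conn.execute("CREATE TABLE bank (bank_number TEXT PRIMARY KEY, name TEXT NOT NULL)")

# Each component of the account number gets its own column.
conn.execute("""
    CREATE TABLE account (
        bank_number    TEXT NOT NULL REFERENCES bank(bank_number),
        branch_number  TEXT NOT NULL,
        account_number TEXT NOT NULL,
        account_type   TEXT NOT NULL,
        PRIMARY KEY (bank_number, branch_number, account_number, account_type)
    )
""")
conn.execute("INSERT INTO bank VALUES ('01', 'Example Bank')")

def try_insert(row):
    """Insert an account row; return False if the database rejects it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

known_ok = try_insert(("01", "0123", "00123456", "00"))            # bank exists
unknown_rejected = not try_insert(("99", "0001", "00000001", "00"))  # no such bank
account_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

The database does the validation for you here, which is only possible because the bank number lives in its own column.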

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow