Pregunta

Is there any guideline on when to normalize a database or just use composite types and arrays?

When using arrays and composite types, I can use just a single table. I can also normalize the database and use a couple of tables and joins.

How do you decide which option is best?

¿Fue útil?

Solución

Most of the time, stick to normalization. Among other things, keeping your database fairly well normalized helps with lock granularity. For example, if you have a "parent" object with two arrays in it, you cannot have transactions that are simultaneously adding/updating/modifying members of the arrays. If they're regular side tables, you can. (You can still SELECT ... FOR UPDATE the parent row before updating child objects if you want the serialized behaviour, though).

Updating an array to add/replace/delete a value is expensive, as PostgreSQL must rewrite the whole tuple the array is in as an MVCC update. (It has a few TOAST tricks up its sleeve that can help, but not tons). Ditto composite types embedded in rows.

Big wide rows full of arrays and composites mean slower table scans, meaning slower fetches for commonly used values.

IIRC you can't define a foreign key into a field of a composite type, so you'll find yourself working around that or giving up on referential integrity where it'd be good to have. Ditto arrays (there was work to get foreign keys to arrays to work but I don't think it ever got comitted).

Many client drivers (PgJDBC, psqlODBC, psycopg2, etc etc etc) have incomplete to nonexistent support for arrays and composites, so you'll often land up expanding them into tuples for client driver interaction anyway. Some things, like arrays of composite types, are really quite painful to work with.

Most ORMs, including common ones like Hibernate, totally suck at using anything beyond the most utterly simplistic lowest-common-denominator SQL features. Sooner or later, someone's going to want to point one of those at your data model, at which point much wailing and gnashing of teeth will ensue. OTOH, don't accomodate garbage ORMs to the point where you avoid using features that'll greatly improve the data model and solve real world problems - for example, if you have the choice of storing native hstore fields, or using an EAV schema, consider just using jstore (or better, in 9.4, json with hstore features).

(Perversely, this means that people who have the most "object oriented" programs often have the most purely relational databases because their tools suck).

Things like report generation tools will similarly struggle with composites and arrays, so you'll often land up creating views to present a normalized appearance for the DB anyway. Then ON INSERT OR UPDATE OR DELETE ... DO INSTEAD triggers on the views to enable writes. At which point it gets ugly.

Personally I recommend keeping composites for times when it's logical to model something as a "type". Consider, say, if your data model required you to track timestamps in their original time zone. There's no built-in type for this (no, that's not what "timestamp with time zone" does, despite the name, thanks SQL committee), so you might create a composite type that stored (timestamp without time zone, tzname) and use that consistently in your data model.

Similarly, I tend to use arrays in queries a lot, but not in the data model much. They're useful when you want to intentionally denormalize something for performance, but that's often done in a materialized view or similar. Even if it's a change to the main data model, it's the sort of thing you should be doing based on proper performance review, not just "optimizing" stuff you don't know is slow yet.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top