Question

I'm planning to use JavaDB (Derby) or PostgreSQL.

I have the following problem: I need to store a large set of vectors. Currently all vectors contain the same fixed number of elements, so the natural way to store the set would be one row per vector and one column per element. However, the number of elements might change over time. Additionally, from a software engineering perspective, a fixed number of columns would encode knowledge about a specific software component that the general model should be unaware of.

Therefore I'm thinking about "linearizing" the layout and using a generic table that stores individual elements instead of whole vectors.

The first element of vector 5 could then be queried like this:

SELECT value FROM elements WHERE v_id = 5 AND e_id = 1;
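With the linearized layout, the application has to reassemble whole vectors from individual element rows. The following is a minimal Java sketch of that step, assuming Java 16+ (for the record), element indices that run from 1 to n per vector as in the query above, and a hypothetical ElementRow type standing in for one JDBC result row. The table definition in the comment is likewise an assumption built from the question's column names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/*
 * Hypothetical table behind the query above (names taken from the question):
 *   CREATE TABLE elements (
 *       v_id  INTEGER NOT NULL,
 *       e_id  INTEGER NOT NULL,
 *       value DOUBLE PRECISION NOT NULL,
 *       PRIMARY KEY (v_id, e_id)
 *   );
 */

// Hypothetical stand-in for one row of
//   SELECT v_id, e_id, value FROM elements WHERE v_id IN (...)
record ElementRow(int vId, int eId, double value) {}

final class VectorAssembler {
    private VectorAssembler() {}

    // Groups element rows by vector id and places each value at index e_id - 1.
    // Assumes e_id values are exactly 1..n for each vector (the primary key rules
    // out duplicates); otherwise an ArrayIndexOutOfBoundsException is thrown.
    // Row order does not matter, so no ORDER BY is required.
    static Map<Integer, double[]> assemble(List<ElementRow> rows) {
        Map<Integer, List<ElementRow>> byVector = new LinkedHashMap<>();
        for (ElementRow r : rows) {
            byVector.computeIfAbsent(r.vId(), k -> new ArrayList<>()).add(r);
        }
        Map<Integer, double[]> vectors = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<ElementRow>> entry : byVector.entrySet()) {
            List<ElementRow> elems = entry.getValue();
            double[] v = new double[elems.size()];
            for (ElementRow r : elems) {
                v[r.eId() - 1] = r.value(); // e_id is 1-based, Java arrays are 0-based
            }
            vectors.put(entry.getKey(), v);
        }
        return vectors;
    }
}
```

A composite primary key on (v_id, e_id) gives an index that serves both the single-element lookup and fetching all elements of one vector, which fits an access pattern that touches only a small subset of the vectors.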

In general, I do not need full table scans; only a relatively small subset of the vectors is accessed.

Could someone more database-savvy judge what the performance impact of this layout will be?

Many thanks in advance.


Solution

This is a variant of what's generally known in database terms as Entity-Attribute-Value (EAV) design. It's something of a relational database design anti-pattern and should be avoided in most cases. Performance tends to be poor because reassembling a vector, or filtering on several of its elements, needs many self-joins (or pivoting in the application), and the queries are ugly at best.
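To make the self-join cost concrete, here is a hypothetical Java helper that builds one common shape of EAV filter query against the question's elements table: find the vectors whose element at each given index equals a bound value. The method name and query shape are illustrative assumptions; the point is that every additional constrained element adds one more self-join of the same table.

```java
import java.util.StringJoiner;

final class EavQueries {
    private EavQueries() {}

    // Builds: SELECT e0.v_id FROM elements e0 JOIN elements e1 ON ... WHERE ...
    // elementIds are the 1-based e_id values to constrain; the matching values
    // are left as '?' placeholders to be bound with a PreparedStatement.
    static String vectorsMatching(int[] elementIds) {
        if (elementIds.length == 0) {
            throw new IllegalArgumentException("need at least one element id");
        }
        StringBuilder sql = new StringBuilder("SELECT e0.v_id FROM elements e0");
        StringJoiner where = new StringJoiner(" AND ", " WHERE ", "");
        for (int i = 0; i < elementIds.length; i++) {
            if (i > 0) {
                // one extra self-join per additional constrained element
                sql.append(" JOIN elements e").append(i)
                   .append(" ON e").append(i).append(".v_id = e0.v_id");
            }
            where.add("e" + i + ".e_id = " + elementIds[i] + " AND e" + i + ".value = ?");
        }
        return sql.append(where).toString();
    }
}
```

With an array column (say, a hypothetical vec float8[] in PostgreSQL), the same filter would be a single-table predicate such as vec[1] = ? AND vec[2] = ?, since PostgreSQL array subscripts start at 1 by default.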

In PostgreSQL, look into the intarray extension; it should solve your problem nearly ideally if the values are simple integers. Otherwise, consider PostgreSQL's standard array types (e.g. float8[]). They have their own issues, but are generally a lot better than EAV, though they're not pleasant to work with from JDBC.
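For the standard array types, a minimal JDBC sketch might look like the following. It assumes the PostgreSQL JDBC driver (PgJDBC) and a hypothetical vectors table with one float8[] column; Connection.createArrayOf, PreparedStatement.setArray and ResultSet.getArray are standard java.sql calls, while the class, table and helper names are illustrative. The boxing and unboxing in the helpers is part of what makes arrays a little awkward from JDBC.

```java
import java.sql.Array;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/*
 * Hypothetical table, one row per vector (PostgreSQL syntax):
 *   CREATE TABLE vectors (v_id integer PRIMARY KEY, elems float8[] NOT NULL);
 */
final class ArrayVectorStore {
    private ArrayVectorStore() {}

    static void insert(Connection conn, int vId, double[] elems) throws SQLException {
        String sql = "INSERT INTO vectors (v_id, elems) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, vId);
            // createArrayOf takes the SQL element type name and an Object[]
            ps.setArray(2, conn.createArrayOf("float8", box(elems)));
            ps.executeUpdate();
        }
    }

    // Returns null if no vector with this id exists.
    static double[] read(Connection conn, int vId) throws SQLException {
        String sql = "SELECT elems FROM vectors WHERE v_id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, vId);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                Array arr = rs.getArray(1);
                // For a one-dimensional float8[] PgJDBC returns a Double[]
                return unbox((Object[]) arr.getArray());
            }
        }
    }

    // Boxes primitives for Connection.createArrayOf, which takes Object[].
    static Double[] box(double[] v) {
        Double[] out = new Double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i];
        }
        return out;
    }

    // Unboxes the Object[] from java.sql.Array.getArray();
    // assumes the array contains no SQL NULL elements.
    static double[] unbox(Object[] v) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = ((Number) v[i]).doubleValue();
        }
        return out;
    }
}
```

Storing each vector as a single array value keeps one row per vector without fixing the number of columns, which addresses both concerns raised in the question.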

Otherwise, if these vectors are all you're storing, consider a non-relational database.

Licensed under: CC-BY-SA with attribution