Storing flags in SQL column, and indexing them

Question 1

If performance is an issue, I would go for a different model here.

Say, a table that stores entities, with a 1:N relation to another table (for example a flags table: entId (fk), flag, position), where the flags table has an index on flag and position.

The issue here would be getting these flags back into a single column, which can be done in Java or even in the database (but it would be difficult to write a cross-platform query for this).
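A minimal sketch of this model, using Python's built-in sqlite3 module as a stand-in database. The table layout follows the description above; the names `entities`, `flags`, `entities_with_flag` and `flags_as_string` are made up for illustration, and the flag string is rebuilt in application code rather than in SQL to stay portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE flags (
        ent_id   INTEGER NOT NULL REFERENCES entities(id),
        flag     INTEGER NOT NULL,   -- 0 or 1
        position INTEGER NOT NULL,   -- index of the flag within the entity
        PRIMARY KEY (ent_id, position)
    );
    -- index on flag and position, as suggested in the answer
    CREATE INDEX idx_flags_flag_position ON flags (flag, position);
""")
conn.executemany("INSERT INTO entities VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO flags (ent_id, flag, position) VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 0, 1), (1, 1, 2),
     (2, 0, 0), (2, 0, 1), (2, 1, 2)],
)


def entities_with_flag(position, flag=1):
    """Return ids of entities whose flag at `position` equals `flag`."""
    rows = conn.execute(
        "SELECT ent_id FROM flags WHERE flag = ? AND position = ? ORDER BY ent_id",
        (flag, position),
    )
    return [r[0] for r in rows]


def flags_as_string(ent_id):
    """Collapse one entity's flags back into a single positional string."""
    rows = conn.execute(
        "SELECT flag FROM flags WHERE ent_id = ? ORDER BY position", (ent_id,)
    )
    return "".join(str(r[0]) for r in rows)


print(entities_with_flag(2))   # entities that have the flag at position 2 set
print(flags_as_string(1))      # flags of entity 1 as one string
```

Doing the concatenation in the application avoids vendor-specific aggregate functions, which is the cross-platform difficulty mentioned above.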

Question 2

If you want a database-independent, reasonable method for storing such flags, then use typical SQL data types. For a binary flag, you can use bit or boolean (this differs among databases). For other flags, you can use tinyint or smallint.

Doing bit-fiddling is not going to be portable. If nothing else, the functions used to extract particular bits from data differ among databases.

Second, if performance is an issue, then you may need to create indexes to avoid full table scans. You can create indexes on normal SQL data types (although some databases may not allow indexes on bits).
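As a rough illustration of the "plain columns plus indexes" approach, here is a sketch using Python's sqlite3 module. The table `person`, its two flag columns and the helper `ids_where` are hypothetical; SMALLINT is used for both flags because it is widely available, while a real schema might use bit or boolean where the database supports it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id                INTEGER PRIMARY KEY,
        last_name_missing SMALLINT NOT NULL DEFAULT 0,  -- binary flag: 0 or 1
        email_status      SMALLINT NOT NULL DEFAULT 0   -- multi-valued flag: 0, 1, 2, ...
    );
    -- ordinary indexes on ordinary columns, no bit-fiddling involved
    CREATE INDEX idx_person_last_name_missing ON person (last_name_missing);
    CREATE INDEX idx_person_email_status ON person (email_status);
""")
conn.executemany(
    "INSERT INTO person (id, last_name_missing, email_status) VALUES (?, ?, ?)",
    [(1, 0, 0), (2, 1, 0), (3, 0, 2), (4, 1, 2)],
)

FLAG_COLUMNS = ("last_name_missing", "email_status")


def ids_where(column, value):
    """Return ids of rows whose flag column equals value."""
    if column not in FLAG_COLUMNS:  # column names cannot be bound as parameters
        raise ValueError(f"unknown flag column: {column}")
    rows = conn.execute(
        f"SELECT id FROM person WHERE {column} = ? ORDER BY id", (value,)
    )
    return [r[0] for r in rows]


print(ids_where("last_name_missing", 1))
print(ids_where("email_status", 2))
```

Each flag is queried with a plain equality test, so the same SQL should work on most databases without vendor-specific functions.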

It sounds like you are trying to be overly clever. You should first get the application to work using reasonable data structures. Then you will understand where the performance issues are and can work on fixing them.

Question 3

I have improved my design and performed a benchmark and found an interesting result.

I created a dummy demographic entity with first/last name columns, birthdate, birthplace, email, SSN...

Then in version 1

I added a column VALIDATION VARCHAR(40) NULL DEFAULT NULL with an index on it.

Instead of positional flags, the new column contains an unordered set of codes each representing a specific format error (e.g. A01 means "last name not specified", etc.). Each code is terminated by a colon : symbol.

Example column values look like

NULL
'A01:A03:A10:'
'A05:'

Typical queries are:

SELECT * FROM ENTITIES WHERE VALIDATION IS {NOT} NULL

Search for entities that are valid/invalid (NULL = no problem)

SELECT * FROM ENTITIES WHERE VALIDATION LIKE '%AXX:%';

Selects entities with a specific problem
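The version 1 column can be tried out with a small sketch like the following, which uses Python's sqlite3 module instead of MySQL (an assumption made only to keep it self-contained). The index name and the helpers `ids` and `with_error` are invented for the example; the rows are the three example values above. The pattern has a wildcard on both sides so that a code is found anywhere in the unordered set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ENTITIES ("
    " ID INTEGER PRIMARY KEY,"
    " VALIDATION VARCHAR(40) NULL DEFAULT NULL)"
)
conn.execute("CREATE INDEX IDX_ENTITIES_VALIDATION ON ENTITIES (VALIDATION)")
conn.executemany(
    "INSERT INTO ENTITIES (ID, VALIDATION) VALUES (?, ?)",
    [(1, None), (2, "A01:A03:A10:"), (3, "A05:")],
)


def ids(sql, params=()):
    """Run a query that selects ID and return the ids in ascending order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY ID", params)]


def with_error(code):
    """Entities whose VALIDATION set contains the given error code."""
    return ids(
        "SELECT ID FROM ENTITIES WHERE VALIDATION LIKE ?", (f"%{code}:%",)
    )


valid = ids("SELECT ID FROM ENTITIES WHERE VALIDATION IS NULL")
invalid = ids("SELECT ID FROM ENTITIES WHERE VALIDATION IS NOT NULL")
print(valid, invalid, with_error("A03"))
```

A pattern without the trailing wildcard, such as '%A03:', would only match rows where A03 happens to be the last code in the set.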

Then in version 2

I added a column VALID TINYINT NOT NULL with an index which is 0=invalid, 1=valid (Hibernate maps a Boolean to a TINYINT in MySQL).

I added a lookup table

CREATE TABLE ENTITY_VALIDATION (
    ID BIGINT NOT NULL PRIMARY KEY,
    PERSON_ID BIGINT NOT NULL, -- REFERENCES PERSONS(ID), omitted for performance
    ERROR CHAR(3) NOT NULL
);

with an index on both PERSON_ID and ERROR. This table represents the 1:N relationship between entities and their errors.

Queries:

SELECT * FROM ENTITIES WHERE VALID = {0|1}

Select invalid/valid entities

SELECT * FROM ENTITIES JOIN ENTITY_VALIDATION ON ENTITIES.ID = ENTITY_VALIDATION.PERSON_ID WHERE ERROR = 'AXX';

Selects entities with a given problem
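For comparison, the version 2 layout could be sketched like this, again with Python's sqlite3 module standing in for MySQL. Two separate single-column indexes on PERSON_ID and ERROR are an assumption (a composite index would be another reading), and the index names, sample rows and the helper `with_error` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ENTITIES (
        ID    INTEGER PRIMARY KEY,
        VALID TINYINT NOT NULL          -- 0 = invalid, 1 = valid
    );
    CREATE INDEX IDX_ENTITIES_VALID ON ENTITIES (VALID);

    CREATE TABLE ENTITY_VALIDATION (
        ID        INTEGER PRIMARY KEY,
        PERSON_ID INTEGER NOT NULL,     -- foreign key left out, as in the post
        ERROR     CHAR(3) NOT NULL
    );
    -- assumption: one index per column
    CREATE INDEX IDX_EV_PERSON_ID ON ENTITY_VALIDATION (PERSON_ID);
    CREATE INDEX IDX_EV_ERROR ON ENTITY_VALIDATION (ERROR);
""")
conn.executemany("INSERT INTO ENTITIES (ID, VALID) VALUES (?, ?)",
                 [(1, 1), (2, 0), (3, 0)])
conn.executemany(
    "INSERT INTO ENTITY_VALIDATION (PERSON_ID, ERROR) VALUES (?, ?)",
    [(2, "A01"), (2, "A03"), (2, "A10"), (3, "A05")],
)


def with_error(code):
    """Entities that have at least one row with the given error code."""
    rows = conn.execute(
        "SELECT ENTITIES.ID FROM ENTITIES"
        " JOIN ENTITY_VALIDATION ON ENTITIES.ID = ENTITY_VALIDATION.PERSON_ID"
        " WHERE ERROR = ? ORDER BY ENTITIES.ID",
        (code,),
    )
    return [r[0] for r in rows]


invalid = [r[0] for r in conn.execute(
    "SELECT ID FROM ENTITIES WHERE VALID = 0 ORDER BY ID")]
print(invalid, with_error("A03"))
```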

Then I benchmarked

the COUNT(*) function via JUnit + JDBC, that is, the same queries you see above with * replaced by COUNT(*).

I ran several benchmarks, with the entity table containing 100k, 250k, 500k, 750k and 1M entities and a mean entity:flag ratio of 1:3 (on average there are 3 errors per entity).
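A benchmark of this kind could be outlined as follows. This is only a sketch in Python with sqlite3, not the JUnit + JDBC harness that was actually used, so its timings say nothing about MySQL. The error codes A01..A10, the 2,000-row size, the index names and the helpers `build` and `timed_count` are all assumptions; both storage variants are filled from the same random data so the counts can be cross-checked.

```python
import random
import sqlite3
import time

ERROR_CODES = [f"A{n:02d}" for n in range(1, 11)]  # hypothetical codes A01..A10


def build(n_entities, seed=0):
    """Create both storage variants filled with the same random errors."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ENTITIES (
            ID INTEGER PRIMARY KEY,
            VALIDATION VARCHAR(40) NULL DEFAULT NULL,  -- version 1
            VALID TINYINT NOT NULL                     -- version 2
        );
        CREATE INDEX IDX_VALIDATION ON ENTITIES (VALIDATION);
        CREATE INDEX IDX_VALID ON ENTITIES (VALID);
        CREATE TABLE ENTITY_VALIDATION (
            ID INTEGER PRIMARY KEY,
            PERSON_ID INTEGER NOT NULL,
            ERROR CHAR(3) NOT NULL
        );
        CREATE INDEX IDX_EV_PERSON_ID ON ENTITY_VALIDATION (PERSON_ID);
        CREATE INDEX IDX_EV_ERROR ON ENTITY_VALIDATION (ERROR);
    """)
    for ent_id in range(1, n_entities + 1):
        # 0 to 6 distinct codes per entity, so 3 on average
        codes = rng.sample(ERROR_CODES, rng.randint(0, 6))
        validation = "".join(code + ":" for code in codes) or None
        conn.execute("INSERT INTO ENTITIES VALUES (?, ?, ?)",
                     (ent_id, validation, 0 if codes else 1))
        conn.executemany(
            "INSERT INTO ENTITY_VALIDATION (PERSON_ID, ERROR) VALUES (?, ?)",
            [(ent_id, code) for code in codes],
        )
    return conn


def timed_count(conn, sql, params=()):
    """Run a COUNT(*) query and return (count, elapsed seconds)."""
    start = time.perf_counter()
    (count,) = conn.execute(sql, params).fetchone()
    return count, time.perf_counter() - start


conn = build(2000)
like_count, like_secs = timed_count(
    conn, "SELECT COUNT(*) FROM ENTITIES WHERE VALIDATION LIKE ?", ("%A03:%",))
join_count, join_secs = timed_count(
    conn,
    "SELECT COUNT(*) FROM ENTITIES JOIN ENTITY_VALIDATION"
    " ON ENTITIES.ID = ENTITY_VALIDATION.PERSON_ID WHERE ERROR = ?",
    ("A03",))
print(f"LIKE: {like_count} rows in {like_secs:.6f}s")
print(f"JOIN: {join_count} rows in {join_secs:.6f}s")
```

Checking that both queries return the same count is a cheap sanity test before comparing timings; a single run per query is too noisy to draw conclusions from.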

The result

is displayed below. While the lookup of correct/incorrect entities performs equally well in both versions, it looks like MySQL is faster with the LIKE operator than with a JOIN, even though there are indexes.

Excel graph

Of course,

This was only a benchmark on MySQL. While the approach is cross-platform, the benchmark does not (yet) compare performance across different DBMSes.
