Database optimization vs readibility

https://softwareengineering.stackexchange.com/questions/350351

13-01-2021
|

Pregunta

I'm working on some code that checks for some invariants in the database. For example to check that an element is not repeated the code checks for a database error since the invariant is coded at database level.

Basically to enforce uniqueness for a certain set of key the code checks for the error:

if (sqlError = "name_age") { // name+age are non unique cannot insert
  throw error("no users with same name and age allowed")
}

To me it was a bit unreadable, mostly because it must be read together with the database configuration, furthermore, it cannot be tested without a mock or by hitting the db itself. What I would like to do is:

if (countWithNameAndAge(name, age) == 1) {
  throw error("no users with same name and age allowed")
}

Not only this is more immediately readable but I can allow 2 or 3 people with same name and age without evolving the database or adding columns.

The only cons is that this may be slower.

Is it true that my approach may be slower? Is it indeed more legible?

Solución

Performance vs. Readability

Most likely, the single INSERT will not be significantly faster than a cleverly coded LOCK - FETCH - INSERT - UNLOCK sequence, because the UNIQUE index enforces the same operations, and the embedded INSERT should be faster than the standalone INSERT. But if performance is important to you, you should conduct a benchmark. Searching the index for existing records and inserting the data in the presence of the index (having to update the index) are the expensive operations here.

Maintainability

Comparing the SQL error variable against a string is a mainenance horror. It would be better if the INSERT statement return code would already give an indication what went wrong, such that it can be evaluated. This would be my preferred approach (keeping it simple).

If that is not possible, comparing the error text to a string constant might be a compromise, as suggested by @TimothyTruckle.

Sticking with a single, potentially failing INSERT also has the advantage of not distributing the implementation of the uniqueness requirement to several levels. Strictly speaking, with the single INSERT, this essential invariant is guaranteed by the UNIQUE index, and the separate error message is just being nice to the user.

Data Model

As @ErikEidt and @RSahu suggest, age is not the best choice here. It should not be stored, and it is not suitable for checking duplicates most of the time.

Otros consejos

if (countWithNameAndAge(name, age) == 1)

has to go through each item in the data base to come up with a count. You could change that to:

if (hasNameAndAge(name, age))

This will return at the first occurrence of an item with the given name and age, and hence, will be a little bit faster. However, you lose the flexibility of being able to allow more than 3 people with the same name and age. If that flexibility is a must, you can tweak hasNameAndAge to add a count also.

if (hasNameAndAge(name, age, MAX_COUNT))

That will return with true as soon as it finds MAX_COUNT number of items.

Having said that, I wonder. Should you even worry about the performance ramifications of detecting duplicates? What is the percentage of times that you expect you/your users to be attempting to add non-duplicates vs duplicates? If the percentage of times duplicates are attempted to be added is small, the performance issue of detecting duplicates is moot.

It seems to be that you need to optimize the hasNameAndAge, no matter which version you choose, when you expect it to return false if percentage of times duplicates are attempted to be added is small.

Licenciado bajo: CC-BY-SA con atribución

No afiliado a softwareengineering.stackexchange