Database optimization vs readibility
https://softwareengineering.stackexchange.com/questions/350351
-
13-01-2021 - |
Pregunta
I'm working on some code that checks for some invariants in the database. For example to check that an element is not repeated the code checks for a database error since the invariant is coded at database level.
Basically to enforce uniqueness for a certain set of key the code checks for the error:
if (sqlError = "name_age") { // name+age are non unique cannot insert
throw error("no users with same name and age allowed")
}
To me it was a bit unreadable, mostly because it must be read together with the database configuration, furthermore, it cannot be tested without a mock or by hitting the db itself. What I would like to do is:
if (countWithNameAndAge(name, age) == 1) {
throw error("no users with same name and age allowed")
}
Not only this is more immediately readable but I can allow 2 or 3 people with same name and age without evolving the database or adding columns.
The only cons is that this may be slower.
Is it true that my approach may be slower? Is it indeed more legible?
Solución
Performance vs. Readability
Most likely, the single INSERT
will not be significantly faster than a cleverly coded LOCK
- FETCH
- INSERT
- UNLOCK
sequence, because the UNIQUE
index enforces the same operations, and the embedded INSERT
should be faster than the standalone INSERT
. But if performance is important to you, you should conduct a benchmark. Searching the index for existing records and inserting the data in the presence of the index (having to update the index) are the expensive operations here.
Maintainability
Comparing the SQL error variable against a string is a mainenance horror. It would be better if the INSERT
statement return code would already give an indication what went wrong, such that it can be evaluated. This would be my preferred approach (keeping it simple).
If that is not possible, comparing the error text to a string constant might be a compromise, as suggested by @TimothyTruckle.
Sticking with a single, potentially failing INSERT
also has the advantage of not distributing the implementation of the uniqueness requirement to several levels. Strictly speaking, with the single INSERT
, this essential invariant is guaranteed by the UNIQUE
index, and the separate error message is just being nice to the user.
Data Model
As @ErikEidt and @RSahu suggest, age
is not the best choice here. It should not be stored, and it is not suitable for checking duplicates most of the time.
Otros consejos
if (countWithNameAndAge(name, age) == 1)
has to go through each item in the data base to come up with a count. You could change that to:
if (hasNameAndAge(name, age))
This will return at the first occurrence of an item with the given name and age, and hence, will be a little bit faster. However, you lose the flexibility of being able to allow more than 3 people with the same name and age. If that flexibility is a must, you can tweak hasNameAndAge
to add a count also.
if (hasNameAndAge(name, age, MAX_COUNT))
That will return with true
as soon as it finds MAX_COUNT
number of items.
Having said that, I wonder. Should you even worry about the performance ramifications of detecting duplicates? What is the percentage of times that you expect you/your users to be attempting to add non-duplicates vs duplicates? If the percentage of times duplicates are attempted to be added is small, the performance issue of detecting duplicates is moot.
It seems to be that you need to optimize the hasNameAndAge
, no matter which version you choose, when you expect it to return false
if percentage of times duplicates are attempted to be added is small.