Question

I am working with a team that is trying to implement an EAV (entity-attribute-value) system. They have decided to split the attribute-value tables out by data type, and they are debating using different tables for different size ranges of varchar.

For example (sketched roughly below):

  • table_1 - up to varchar(10)
  • table_2 - varchar(11) TO varchar(500)
  • table_3 - varchar(501) TO varchar(MAX)
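
In other words, the proposal amounts to something like the following DDL. This is only my own sketch of the idea; the table and column names are placeholders, not the team's actual schema:

    -- Hypothetical sketch of the proposed split; names are placeholders
    CREATE TABLE dbo.table_1   -- values up to varchar(10)
    (
        entity_id    int         NOT NULL,
        attribute_id int         NOT NULL,
        value        varchar(10) NULL
    );

    CREATE TABLE dbo.table_2   -- values from varchar(11) to varchar(500)
    (
        entity_id    int          NOT NULL,
        attribute_id int          NOT NULL,
        value        varchar(500) NULL
    );

    CREATE TABLE dbo.table_3   -- values from varchar(501) to varchar(MAX)
    (
        entity_id    int          NOT NULL,
        attribute_id int          NOT NULL,
        value        varchar(max) NULL
    );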

I have always been under the impression that varchar only uses the storage it actually needs.

Do you know whether this would yield any performance gains, and would it be worth the extra coding/logic that would be needed?


Solution

My gut says that any performance gain you might get is unlikely to be worth the extra hassle (and potential for bugs) resulting from needing to enforce the separation and perform multiple lookups in your application logic.

If you have a lot of small values and were only ever querying those, you would see some performance gain: more rows would fit in each page, so fewer pages would need to be processed in RAM or read from disk for a given query. But as soon as you need all the properties in one go (or just a mix), that benefit is blown out of the water by having to query multiple tables, either separately or via a UNION.
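
To make that concrete, here is a rough sketch of what reading one entity's attributes looks like once the values are spread across three tables (table and column names are assumed, as in the question's example):

    -- Sketch only: reassembling one entity's attributes from the split tables
    DECLARE @entity_id int = 42;   -- example entity

    SELECT entity_id, attribute_id, CAST(value AS varchar(max)) AS value
    FROM dbo.table_1
    WHERE entity_id = @entity_id
    UNION ALL
    SELECT entity_id, attribute_id, CAST(value AS varchar(max))
    FROM dbo.table_2
    WHERE entity_id = @entity_id
    UNION ALL
    SELECT entity_id, attribute_id, value
    FROM dbo.table_3
    WHERE entity_id = @entity_id;

Every such read now touches three tables (and three sets of pages and indexes) instead of one, which is exactly where the page-density saving gets eaten.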

The only way to be sure, of course, is to rig up a reasonably realistic large dataset and run some performance tests against the arrangements you are considering. But I very much doubt you will see any change worth the extra complexity. If your data can be split in more logical ways (i.e. ways implied by your business logic), then I suggest you look into data partitioning, particularly if you can split the partitions over different drives.

Whenever you find yourself contemplating a potentially complexity-laden optimisation (including partitioning), always go back and reconsider your overall data structure, make sure it is not working against your business logic, and check that your hardware is sufficient for the load you expect. While it is not guaranteed, you are likely to find much more significant gains by looking into these core areas instead.
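
For reference, partitioning along a business-meaningful key in SQL Server looks roughly like this. The boundary values, filegroup names and table layout below are purely illustrative assumptions (and the filegroups would have to exist, placed on the drives you want):

    -- Minimal sketch of range partitioning over filegroups on different drives
    CREATE PARTITION FUNCTION pf_entity (int)
        AS RANGE RIGHT FOR VALUES (1000000, 2000000);

    CREATE PARTITION SCHEME ps_entity
        AS PARTITION pf_entity TO (fg_drive1, fg_drive2, fg_drive3);

    CREATE TABLE dbo.attribute_value
    (
        entity_id    int          NOT NULL,
        attribute_id int          NOT NULL,
        value        varchar(500) NULL
    ) ON ps_entity (entity_id);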

OTHER TIPS

You'll have no gains in performance at all.

Quick thoughts, not at all an exhaustive analysis:

  • At some point you'll need to UNION these to get a single view and then everything becomes varchar(max)
  • How do you decide length up front?
  • Indexing for searching values? You can't index > 900 bytes (see the index sketch after this list)
  • Rolling your own "unique" constraints in an EAV would be bad enough without the values being split over multiple tables
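
To illustrate the indexing point (table and column names assumed, as before): a varchar(500) value column fits under the 900-byte index key limit, but a varchar(max) column cannot be an index key at all, so value searches against the "large" table can't be supported by a straightforward index:

    -- value varchar(500): usable as an index key (well under the 900-byte limit)
    CREATE INDEX IX_table_2_attr_value ON dbo.table_2 (attribute_id, value);

    -- value varchar(max): cannot be an index key; it can only be INCLUDEd,
    -- which does not help searches on the value itself
    CREATE INDEX IX_table_3_attr ON dbo.table_3 (attribute_id) INCLUDE (value);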

Look up the EAV anti-pattern: there are several articles on how to avoid EAV.

It sounds like they're trying to optimize the EAV for lookups. However, they're clearly not optimizing the system for profiled deficiencies; they're optimizing based on voodoo guesses.

Remind them that the first rule of optimization is profiling. As David Spillett said, until you have a couple hundred million rows in the EAV (considering most entities I know of have at least 15 attributes in total, that would only be a few tens of millions of entities) and have actually profiled it, you can't know whether this will have any impact.

I'm going to go with "no, this will not have the benefit they think it will", and the better split points would probably be somewhere around 50 and 100 characters rather than 10 and 500. But that's just a guess.


But note that it WILL have one effect they want, in that it can allow better index performance (as a general rule of thumb, any data partitioning should give better indexing performance than no partitioning).
