Question

I'm storing localized strings in a single datatable using MS Sql (2008 or whatever). Most of the strings are short and can be represented with varchar(200), while about 10% are much longer and require something like varchar(5000). My question is, is there a performance advantage when retrieving shorter strings if I break this into two tables like this:

CREATE TABLE ShortTextTable(ID bigint IDENTITY(1,1) NOT NULL, TextValue nvarchar(200))
CREATE TABLE LongTextTable(ID bigint IDENTITY(1,1) NOT NULL, TextValue nvarchar(4000))

Versus:

CREATE TABLE TextTable(ID bigint IDENTITY(1,1) NOT NULL, TextValue nvarchar(4000))

This data will rarely be updated, I'm only really concerned about reads.


Solution

It depends. Could be premature optimization.

With smaller columns you will fit more rows per page, obviously, but your usage pattern may mean the horizontal partition you are proposing is not very efficient, if most reads end up pulling from both new tables anyway. We'd need to see the read usage pattern and how the table is joined to.

Also, it partitions a space that is logically one space, and that space will no longer be manageable as one (e.g. you'd have to add every index in both places, etc.)

You'd really have to see a bottleneck and profile the proposed change before I would partition it like this.

I'm not sure, but it might be possible to literally partition (using SQL Server's partitioned tables feature) the table based on length of the column. Again, whether this would help would need to be profiled.
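If you wanted to experiment with that idea, a rough, untested sketch might look like this (object names are made up; SQL Server does allow a persisted computed column as the partitioning key, but whether any of this helps still needs profiling):

```sql
-- Sketch only: partition by string length via a PERSISTED computed column.
CREATE PARTITION FUNCTION pfTextLength (int)
    AS RANGE LEFT FOR VALUES (200);            -- <= 200 vs > 200

CREATE PARTITION SCHEME psTextLength
    AS PARTITION pfTextLength ALL TO ([PRIMARY]);

CREATE TABLE TextTable
(
    ID        bigint IDENTITY(1,1) NOT NULL,
    TextValue nvarchar(4000) NOT NULL,
    TextLen   AS LEN(TextValue) PERSISTED NOT NULL
) ON psTextLength (TextLen);
```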

OTHER TIPS

No, no real gain. Seeing a bottleneck because of how string sizes interleave on the pages, especially with lookups based on an int PK, would be a real extreme case.
On the other hand, the mess of working with such a storage schema is very clear and present: you have to decide, based on the length of a string you have not yet retrieved, which table to look in! You'll probably end up looking by trial and error (try one table, then the other), which is far more wasteful than any nvarchar storage-structure issue.
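For example (a sketch, assuming the ID alone doesn't tell you which table holds the string), every lookup degenerates into probing both tables:

```sql
-- Sketch: each read has to check both tables for the same ID.
SELECT TextValue FROM ShortTextTable WHERE ID = @ID
UNION ALL
SELECT TextValue FROM LongTextTable  WHERE ID = @ID;
```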

In SQL 2005, and I believe 2008, you cannot create an nvarchar(5000), since such a data type exceeds the page size; nvarchar(max) would work at that point. When specifying a number N for nvarchar, the limit is 4000.
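A quick illustration (sketch, table names made up):

```sql
CREATE TABLE T1 (TextValue nvarchar(5000)); -- fails: N is limited to 4000
CREATE TABLE T2 (TextValue nvarchar(max));  -- works; can store up to 2 GB
```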

I believe that at that point there will be a performance difference between reading a value stored inline on the page, versus reading the page to get the 16-byte pointer to the LOB pages and then reading the data from there.
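For what it's worth, SQL Server keeps nvarchar(max) values in-row when they fit, and you can force the LOB-pointer behaviour described above with a table option (using the question's TextTable as an example):

```sql
-- Forces large value types onto LOB pages, leaving only the
-- 16-byte pointer in the row (the slower read path described above).
EXEC sp_tableoption 'TextTable', 'large value types out of row', 1;
```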

No or negative gain.

Storage-wise: a variable-length string is stored as its actual data plus 2 bytes for the length. So the length of the data is the same either way, but you'd have the index and key overhead of the second table.
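You can see the per-character cost directly (a sketch; DATALENGTH reports the stored byte count):

```sql
-- nvarchar data costs 2 bytes per character, plus 2 bytes of
-- variable-length overhead per column in the row itself.
SELECT DATALENGTH(N'hello');  -- 10 bytes: 2 bytes x 5 characters
```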

Processing wise:

  • deciding what table to add it to
  • correcting a typo can change the string's length, so it now belongs in the other table (ignoring forward ptrs etc)
  • dealing with key uniqueness for 2 tables (if they'd have a common parent)
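On that last point, one hedged sketch: if you did split the tables anyway, you could draw IDs from a single parent table so both children share one key space (table names are the question's, the parent is made up):

```sql
-- Sketch: a common parent table keeps IDs unique across both tables.
CREATE TABLE TextKey
(
    ID bigint IDENTITY(1,1) NOT NULL PRIMARY KEY
);

CREATE TABLE ShortTextTable
(
    ID        bigint NOT NULL PRIMARY KEY REFERENCES TextKey(ID),
    TextValue nvarchar(200)
);

CREATE TABLE LongTextTable
(
    ID        bigint NOT NULL PRIMARY KEY REFERENCES TextKey(ID),
    TextValue nvarchar(4000)
);
```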

Now, more importantly: I saw you mentioned localisation, but do you actually need nvarchar? See another SO question: varchar vs nvarchar performance

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow