سؤال

I've got this tree-like key (think object OIDs--it's very similar) to store and index in a table, for which queries want to select subtrees thereof. The most straightforward way to do this on a btree is to use the tree path as the single key and descend the key with a starts-with operation to find the row blocks.

So the obvious representation is VARBINARY(n)*. I cannot think of a way to express starts with in a way that SQL Server can figure out to use the index thereof.

The next-most obvious is to pack the data into some VARCHAR with a binary collation and use a like operation; however I've discovered that like @n + '%' will never use an index, but only like 'literal%. Adding the WITH INDEX hint still scans the index in its entirety because SQL server just doesn't understand. I really don't want to think about how to escape arguments in SQL so I can build an EXEC. That's just madness and a security disaster waiting to happen.

Expressing this in ordinary recursive tables and querying with recursive SQL is out of the question. The recursive query will power-sink the database server.

I actually have the data in a long string of long variables right now and can select my serialization form. As I said, the natural form is VARBINARY and it really would look like 0x000000100004A010000000D0000A000. The most common query is of the form "give me everything starting with 0x000000100004A01" or in natural model, the first n values from the string of long values. I could write them in the form a.b.c.d.e.f... but the individual numbers are long and come from clicking on things on the GUI.

Roughly speaking I'm looking at a plausible form of

CREATE TABLE Record (
   RecordId BIGINT NOT NULL IDENTITY(1,1),
   RecordedDate DATETIME NOT NULL,
   RecordClass CHAR(1) NOT NULL,
   UserId INT NOT NULL,
   ObjectId VARBINARY(384) NOT NULL,
   RecordValue NVARCHAR(100) NULL,
   OwnerId BIGINT NULL, -- Joins to another table
   SubOwnerId BIGINT NULL, -- and yet another table
   PRIMARY KEY (RecordId)
)
CREATE INDEX name ON Record(RecordedDate);
CREATE INDEX name ON Record(OwnerId);
CREATE INDEX name ON Record(SubOwnerId);
CREATE INDEX name ON Record(ObjectId);

What is the best way to do a starts-with index?

*I've computed the largest possible n and it's less than 400

هل كانت مفيدة؟

المحلول

So as a matter of fact, starts with is not so hard to write as it would first seem. Given the initial key of 0x000000100004, obviously equals won't work and there is no starts with operation, but we can write ObjectId >= 0x000000100004 AND ObjectId < 0x000000100005.

By forcing the string of long values to VARBINARY conversion to be big-endian the natural form actually works correctly. We want >= (value0 & value1 & ... & lastvalue) AND < (value0 & value1 & ... & (lastvalue + 1)). As far as I know, the conversion to VARBINARY has to be written in application code, which is fine for me.

نصائح أخرى

So I'm going to go ahead and post another answer using HIERARCHYID because I can. We're not going to use it because the ORM doesn't like the type. Oh well.

Schema:

CREATE TABLE Record (
   RecordId BIGINT NOT NULL IDENTITY(1,1),
   RecordedDate DATETIME NOT NULL,
   RecordClass CHAR(1) NOT NULL,
   UserId INT NOT NULL,
   ObjectId HIERARCHYID NOT NULL,
   RecordValue NVARCHAR(100) NULL,
   OwnerId BIGINT NULL, -- Joins to another table
   SubOwnerId BIGINT NULL, -- and yet another table
   PRIMARY KEY (RecordId)
)
CREATE INDEX name ON Record(RecordedDate);
CREATE INDEX name ON Record(OwnerId);
CREATE INDEX name ON Record(SubOwnerId);
CREATE INDEX name ON Record(ObjectId);

The startswith query is not easily guessed but quite trivial:

SELECT RecordId, RecordedDate, RecordClass, UserId, UserId, CAST(ObjectId AS VARCHAR(800)) ObjectId, RecordValue, OwnerId, SubOwnerId
  FROM Record
 WHERE Record.ObjectId.IsDescendantOf(CAST('/id/id/id/id/' AS HIERARCHYID))

I tested this, and it does indeed handle our worst-case prediction of 40 bit integer components, and is at least deep enough. The optimizer knows how to use the index just fine.

When you use local variables SQL server doesn't take into account their values because they are unknown at the moment whe it generates execution plan for the query and it builds a plan optimized for UNKNOWN value. This is the reason why it choose scan. You can use dynamic SQL or stored procedure to make SQL Server able to optimize execution plan for particular value. For example:

DECLARE @LastName nvarchar(50)
DECLARE @sql nvarchar(4000);
SET @LastName = 'A%'
SET @sql = '
SELECT *
FROM Person.Person
WHERE LastName LIKE @LastName';
EXEC sp_executesql @sql, N'@LastName nvarchar(50)', @LastName
مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى dba.stackexchange
scroll top