Indexing Referenced Table Columns
-
09-03-2021 - |
Question
I would like to do a search across columns of joined tables. What would the best index configuration be for the query below?
Table A
------------
Id
ValueA
-- Table A has many more columns
Table B
------------
Id
AId
ValueB
-- Table B has many more columns
At the moment, I have clustered indexes on the Primary Keys and non-clustered indexes (which don't include any columns) for columns AId
, ValueA
, and ValueB
.
SELECT *
FROM TableA a
INNER JOIN TableB b ON a.Id = b.AId
WHERE
a.ValueA LIKE 'SearchTerm%' OR
b.ValueB LIKE 'SearchTerm%'
Update Table A is likely to run into the low millions while Table B will only have low tens of thousands of entries.
In the real world Table A will be a list of transactions, where the value is some sort of a transaction reference. Table B will be a list of users where the value could be anything such as a name or any other descriptive information.
At the moment we use an ORM which will essentially do a SELECT *
. Naturally this could be changed, but there would be a fairly large impact on the application side.
Solution
Your query is a tricky one for the optimizer, for two reasons.
- The optimizer doesn't currently have logic to transform a disjunction (OR) across tables into a union (related Q & A).
- The optimizer doesn't reason about deferring lookups (dealing with keys only until the last moment). This relates to the
SELECT *
component of the given query.
Together, these limitations mean there is very little indexing can do to assist the query as written (ORM generated), leaving aside the possibility of a (nonclustered) column store index.
If performance is inadequate, you may have to bite the bullet and write a custom query. In that case, modifying the existing index on ValueB
to include AId
would be useful, coupled with a rewrite like the following:
SELECT TA.*, TB.*
FROM
(
-- Disjunction as union
SELECT A.Id, b.Id
FROM TableA a
INNER JOIN TableB b
ON b.AId = a.Id
WHERE
a.ValueA LIKE 'SearchTerm%'
UNION
SELECT A.Id, b.Id
FROM TableB b
INNER JOIN TableA a
ON a.Id = b.AId
WHERE
b.ValueB LIKE 'SearchTerm%'
) AS M (A_Id, B_Id)
-- Lookups
JOIN dbo.TableA AS TA ON TA.Id = M.A_Id
JOIN dbo.TableB AS TB ON TB.Id = M.B_Id;
The target plan shape is:
How much better this is in practice depends on several factors, including how selective the LIKE
predicates are (and that they are always prefix searches).
Guessed schema:
DROP TABLE IF EXISTS dbo.TableB, dbo.TableA;
CREATE TABLE dbo.TableA
(
Id integer NOT NULL PRIMARY KEY,
ValueA varchar(256) NULL INDEX iValueA,
Padding char(4000) NOT NULL DEFAULT '', -- other columns
/*INDEX i (ValueA) INCLUDE (Id)*/
);
CREATE TABLE dbo.TableB
(
Id integer NOT NULL PRIMARY KEY,
AId integer NOT NULL REFERENCES dbo.TableA (Id) INDEX iAId,
ValueB varchar(256) NULL /*INDEX iValueB*/,
Padding char(2000) NOT NULL DEFAULT '', -- other columns
INDEX i (ValueB) INCLUDE (AId)
);
UPDATE STATISTICS dbo.TableA WITH ROWCOUNT = 3000000, PAGECOUNT = 3000000;
UPDATE STATISTICS dbo.TableB WITH ROWCOUNT = 30000, PAGECOUNT = 30000;