How to make Indexed Views work for SQLCLR?

https://dba.stackexchange.com/questions/245811

07-02-2021
|

Question

In SQL Server, the indexed view is a hellscape of limitations. But I need one. I have a formatter SQLCLR function that creates a prettified version of a domain key - users want to be able to search on substrings of that prettified version. So I need a full-text index on a persisted calculated column or a materialized view.

However, the formatter depends on data stored in multiple tables.

So this works fine for a view, but not for a persisted computed column because those can't query from multiple tables.

My SQLCLR method is precise and deterministic, so it should be eligible for use in indexed views, but indexed views can't have SQLCLR in their index keys.

I could re-implement my formatter using the T-SQL FORMAT function... but FORMAT is also incompatible with indexed views.

Can indexed views do anything? Ever?

I'm open to the "roll your own materialized view with triggers" approach if anybody can refer me to a good document on best practices for doing that. Last time I tried, it spiraled out of control and there was tremendous duplication between the code for insert and update and the initialization.

Is there some way I'm missing to get performant text searches on calculated data on a few million rows, without using a materialized/indexed view or a persisted calculated column?

The formatter does not perform data access. However, the data I need to feed into the formatter for it to be useful will be coming from multiple tables (with a nice regular join), hence I can't solve this problem with persisted computed columns. I can't key the SQLCLR column, so I can't use it in a fulltext index.

I thought this would be simple. Persisted columns and indexed views are designed to perform calculations on write, and to properly implement the observer pattern so that changes to their dependencies are reflected on the calculated value.

Solution

Can indexed views do anything? Ever?

It actually is kind of miserable when you think of an indexed view as a solution, only to find your use case hits on one of the many limitations of indexed views (I'm looking at you, LEFT JOIN).

You're right that you can't include the SQLCLR in the index keys of the indexed view. However, you can include it in the SELECT list of the view definition, which will persist the value to disk. So you can at least avoid the cost of calculating the value on the fly when reading from the table.

In the AdventureWorks2014 sample database, I created this view on the delightfully named Person.Person table:

CREATE OR ALTER VIEW dbo.PersonWithHashForSomeReason
WITH SCHEMABINDING
AS
SELECT 
    BusinessEntityID,
    PersonType,
    dbo.SpookyHash(CONVERT(binary(50), FirstName)) AS FirstNameHash
FROM Person.Person
GO

Note: I was too lazy to write my own CLR function, so this one is from this Q&A.

I can make that an indexed view by clustering it on BusinessEntityID:

CREATE UNIQUE CLUSTERED INDEX CX_BusinessEntityID 
ON dbo.PersonWithHashForSomeReason (BusinessEntityID);
GO

Looking for a specific set of rows in the table produces an index scan (execution plan link). Notice the lack of a Compute Scalar operator, which would normally be used to produce the hash value. Since the hash is persisted to disk in the indexed view, that's not necessary:

SELECT BusinessEntityID, FirstNameHash 
FROM dbo.PersonWithHashForSomeReason WITH (NOEXPAND)
WHERE FirstNameHash = 0x910C426C533F2C0AAF350158331E3B01;

I had to use a NOEXPAND hint to get it to use the view.

You'll notice the warnings in the plan are due to the fact that the whole table was scanned in order to find those values, since there is no index keyed on FirstNameHash.

Unfortunately, trying to create a nonclustered index on this persisted value fails:

CREATE NONCLUSTERED INDEX IX_FirstNameHash 
ON dbo.PersonWithHashForSomeReason (FirstNameHash);

Msg 1976, Level 16, State 1, Line 48
Cannot create index or statistics 'IX_FirstNameHash' on view 'dbo.PersonWithHashForSomeReason' because cannot verify key column 'FirstNameHash' is precise and deterministic. Consider removing column from index or statistics key, marking column persisted in base table if it is computed, or using non-CLR-derived column in key.

This is because SQL Server doesn't trust us:

Why don’t we trust the user? What can really go wrong here??

Assume if the user incorrectly sets the custom attribute IsDeterminsitic to true for a non-deterministic function and assume he is able to create an index on a computed column that invoked this function without persisting it. [...] This might lead to index corruption as the function might return different values for the same input as the function is non-deterministic.

So for the sake of user’s safety, Sql Server in this release REQUIRES the user to persist the computed columns (unlike the tsql case) to actually index the computed columns.

I imagine the same limitation exists for fulltext indexes, although I haven't tried it.

The takeaway from all this, as you mentioned in your own post, is that you're kind of stuck here. Really your only option is to:

accept the table scan (not ideal, and probably not even practical based on your description), or
use triggers to "manually" keep this value up to date in a normal column

OTHER TIPS

So I need a full-text index on a persisted calculated column or a materialized view.

A full text index cannot be created on a non-deterministic column.

That error was produced when trying to full-text index the Search column defined in an indexed view as below:

CREATE TABLE dbo.T1 (id integer PRIMARY KEY, v1 datetime2(3) NULL);
CREATE TABLE dbo.T2 (id integer PRIMARY KEY, v2 decimal (9, 2) NULL);
CREATE TABLE dbo.T3 (id integer PRIMARY KEY, v3 money NULL);
GO
INSERT dbo.T1 (id, v1) VALUES (1, SYSUTCDATETIME());
INSERT dbo.T2 (id, v2) VALUES (1, 1234567.89);
INSERT dbo.T3 (id, v3) VALUES (1, $987.65);
GO
CREATE VIEW dbo.FormattedData
WITH SCHEMABINDING
AS
SELECT 
    T1.id,
    Search =
        N'Datetime: ' + SQL#.Date_Format(T1.v1, N'D', N'de-de') + N'; ' +
        N'Decimal: ' + SQL#.Math_FormatDecimal(T2.v2, N'###,###,###.00', N'') + N'; ' +
        N'Money: ' + SQL#.Math_FormatDecimal(T3.v3, N'C', N'de-de')
FROM dbo.T1 AS T1
JOIN dbo.T2 AS T2
    ON T2.id = T1.id
JOIN dbo.T3 AS T3
    ON T3.id = T2.id;

The functions referenced there are from the SQL# CLR library, and available in the free version. They match the function of the FORMAT function, which is not allowed in an indexed view at all.

Nevertheless, it seems unlikely that full-text search is what you need. It is not suitable for leading-wildcard searches or general string matching where non-words are involved. As the documentation says:

Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on the rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase.

If you need fast leading-wildcard and/or non-language searches, and the T-SQL built-in functions do not perform well enough after testing, one alternative is to use n-grams, as I describe in Trigram Wildcard String Search in SQL Server.

That article contains a complete implementation, including simple triggers to keep a data source (e.g. an indexed view as above) synchronized with the n-grams at all times if that is required. If you do not need live synchronization, you may find that rebuilding the n-grams completely every so often is good enough for your users that the triggers are not needed (lightweight as they are).

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange