SQL: Max value from multiple columns in a computed column

https://dba.stackexchange.com/questions/225895

19-01-2021

Question

Solution 1 in the article linked below finds the maximum value across several columns. I would like to do this in a computed/persisted column. How would I do this?

https://www.mssqltips.com/sqlservertip/4067/find-max-value-from-multiple-columns-in-a-sql-server-table/

create table dbo.TestAmount
(
    Amount1 int,
    Amount2 int,
    Amount3 int,
    MaxValuedata as (select MAX(MaxAmount) FROM (VALUES (Amount1),(Amount2),(Amount3)) AS MaxAmount(LastAmount)) 
)

There may be 10 columns in the future, so I am trying to avoid a long CASE expression.

Solution

First thing... I'm not advocating THAT you do this... I'm simply showing you HOW to do this. If your table experiences high volume inserts and/or updates, you could see a noticeable performance hit. Using scalar UDFs in computed columns will force all queries against the table to run serially.

Start by creating a scalar function similar to the following...

CREATE FUNCTION dbo.GreatestOfThreeInts
/* ================================================================================================
Scalar function created for the sole purpose of calculating the MaxVal computed column on dbo.Test.
================================================================================================ */
(
    @C1 INT,
    @C2 INT,
    @C3 INT
)
RETURNS INT WITH SCHEMABINDING --, RETURNS NULL ON NULL INPUT --<< use this if NULLs are a possibility..
AS 
BEGIN 
    DECLARE @MaxVal INT = ( SELECT MAX(x.Val) FROM ( VALUES (@C1), (@C2), (@C3) ) x (Val) );
    RETURN ISNULL(@MaxVal, 0);
END;
GO
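
As a quick sanity check before wiring the function into a table, it can be called directly. This is only a sketch: it assumes the function was created exactly as above (with RETURNS NULL ON NULL INPUT left commented out), and the column aliases are made up for illustration.

```sql
-- Call the scalar function directly with literal values.
SELECT
    dbo.GreatestOfThreeInts(123, 456, 789)    AS AllValues,  -- 789
    dbo.GreatestOfThreeInts(NULL, 456, NULL)  AS SomeNulls,  -- 456, because MAX ignores NULLs
    dbo.GreatestOfThreeInts(NULL, NULL, NULL) AS AllNulls;   -- 0, from the ISNULL(@MaxVal, 0) fallback
GO
```

If RETURNS NULL ON NULL INPUT were enabled instead, any NULL argument would make the whole call return NULL, so the last two results would differ.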

Either create your table with a PERSISTED computed column or, if the table already exists, use the ALTER / ADD syntax to add the PERSISTED computed column...

CREATE TABLE dbo.Test (
    C1 INT NOT NULL,
    C2 INT NOT NULL,
    c3 INT NOT NULL,
    MaxVal AS dbo.GreatestOfThreeInts(C1, C2, C3) PERSISTED NOT NULL    -- persist the value so that it doesn't need to be constantly recomputed
    );
GO

CREATE NONCLUSTERED INDEX ix_Test_MaxVal ON dbo.Test (MaxVal) INCLUDE (C1, C2, c3);
GO
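
For a table that already exists, the ALTER / ADD route mentioned above might look roughly like this. dbo.ExistingTest is a hypothetical table invented for the example; the function is the dbo.GreatestOfThreeInts defined earlier.

```sql
-- Hypothetical pre-existing table without the computed column.
CREATE TABLE dbo.ExistingTest (
    C1 INT NOT NULL,
    C2 INT NOT NULL,
    C3 INT NOT NULL
    );
GO

-- Add the persisted computed column after the fact.
ALTER TABLE dbo.ExistingTest
    ADD MaxVal AS dbo.GreatestOfThreeInts(C1, C2, C3) PERSISTED NOT NULL;
GO
```

On a large table, expect the ALTER to take a while, since the persisted value has to be computed and stored for every existing row.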

Why do I keep saying PERSISTED?... Buy once, cry once... Unless you have a very write heavy usage pattern, you'll be better off computing the values on inserts & updates, than every time you reference the column in a select... Especially if that column is going to be used in a predicate or sorting operation.

Sooo... Let's see it in action...

INSERT dbo.Test (C1, C2, c3) VALUES
    (123,456,789),
    (345,478,123),
    (523,321,852),
    (111,471,951),
    (874,320,357),
    (965,102,478);
GO 

SELECT * FROM dbo.Test t ORDER BY t.MaxVal OPTION(QUERYTRACEON 176);
GO 

SELECT * FROM dbo.Test t WHERE t.MaxVal >= 800 AND t.MaxVal < 900 OPTION(QUERYTRACEON 176);
GO

Results...

C1          C2          c3          MaxVal
----------- ----------- ----------- -----------
345         478         123         478
123         456         789         789
523         321         852         852
874         320         357         874
111         471         951         951
965         102         478         965


C1          C2          c3          MaxVal
----------- ----------- ----------- -----------
523         321         852         852
874         320         357         874

Hope this helps, Jason

Edit #1: A BIG THANK YOU to Erik for adding the link, pointing out the fact that using a scalar UDF to compute a column will prevent the optimizer from considering a parallel execution plan... Even when the computed column is persisted. A fact that I actually knew but completely omitted from my initial answer. What I didn't know is the OPTION(QUERYTRACEON 176) thing... Picking up that little nugget, more than covered the cost of admission for me!

Edit #2: Without inviting the religious debate of "NULL vs NOT NULL" column constraints, I'll simply state that my personal "default" is to make all columns NOT NULL unless there is a compelling reason to do otherwise... That said, @MartinSmith makes some good points... Including the fact that the OP, by not specifying NULLability, made all columns NULLable. Plus, after the back & forth, I was just curious to see if the RETURN ISNULL(@MaxVal, 0); was doing anything other than irritating people reading the T-SQL... Short answer... It does not.

The following includes the introduction of a "control" table (no computed column) and NULLable versions of dbo.GreatestOfThreeInts & dbo.Test (dbo.GreatestOfThreeInts_2 & dbo.Test_2)

CREATE FUNCTION dbo.GreatestOfThreeInts_2
/* ==================================================================================================
Scalar function created for the sole purpose of calculating the MaxVal computed column on dbo.Test_2.
================================================================================================== */
(
    @C1 INT,
    @C2 INT,
    @C3 INT
)
RETURNS INT WITH SCHEMABINDING
AS 
BEGIN 
    DECLARE @MaxVal INT = ( SELECT MAX(x.Val) FROM ( VALUES (@C1), (@C2), (@C3) ) x (Val) );
    RETURN @MaxVal;
END;
GO

CREATE TABLE dbo.Test_2 (
    C1 INT NULL,
    C2 INT NULL,
    c3 INT NULL,
    MaxVal AS dbo.GreatestOfThreeInts_2(C1, C2, C3) PERSISTED   -- persist the value so that it doesn't need to be constantly recomputed
    );
GO

CREATE NONCLUSTERED INDEX ix_Test2_MaxVal ON dbo.Test_2 (MaxVal) INCLUDE (C1, C2, c3);
GO

CREATE TABLE dbo.Control (
    C1 int NOT NULL,
    C2 int NOT NULL,
    c3 int NOT NULL,
    MaxVal INT NOT NULL
    );
GO

CREATE NONCLUSTERED INDEX ix_Control_MaxVal ON dbo.Control (MaxVal) INCLUDE (C1, C2, c3);
GO

And because the 6 rows in my original answer aren't much of a test, the following will load all 3 tables with 1 million rows of test data...

-- clear out any existing data...
TRUNCATE TABLE dbo.Test;
GO 
TRUNCATE TABLE dbo.Test_2;
GO 
TRUNCATE TABLE dbo.Control;
GO 

-- add 1M rows of test data...
WITH 
    cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)), 
    cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
    cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
    cte_Tally (c1, c2, c3) AS (
        SELECT TOP (1000000)
            ABS(CHECKSUM(NEWID())) % 9000 + 1000,   -- randomly generate INTs between 1000 and 9999
            ABS(CHECKSUM(NEWID())) % 9000 + 1000,   -- no, I don't have an actual reason for using that specific range...
            ABS(CHECKSUM(NEWID())) % 9000 + 1000    -- 
        FROM
            cte_n3 a CROSS JOIN cte_n3 b
        )
INSERT dbo.Test (C1, C2, c3)
SELECT 
    t.c1, 
    t.c2, 
    t.c3
FROM
    cte_Tally t;
GO

-- use dbo.Test to insert dbo.Test_2 & dbo.Control so all 3 tables will have the exact same data values...
-- (to compare actual insert performance, use the cte_Tally to load all tables)

INSERT dbo.Test_2 (C1, C2, c3)
SELECT 
    t.C1, t.C2, t.c3
FROM
    dbo.Test t;
GO

INSERT dbo.Control (C1, C2, c3, MaxVal)
SELECT 
    t.C1, t.C2, t.c3, t.MaxVal
FROM
    dbo.Test t;
GO
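
With the data loaded, one way to compare the three tables is to run the same range query against each and read the I/O and timing figures from the Messages output. This is a sketch rather than the test the author ran: the 8000 to 9000 range is an arbitrary assumption, and no results are shown because they depend on the machine.

```sql
-- Sanity check: all three tables should report the same row count.
SELECT
    (SELECT COUNT(*) FROM dbo.Test)    AS TestRows,
    (SELECT COUNT(*) FROM dbo.Test_2)  AS Test2Rows,
    (SELECT COUNT(*) FROM dbo.Control) AS ControlRows;
GO

-- Report logical reads and CPU/elapsed time per statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- Same predicate against each table (range chosen arbitrarily).
SELECT COUNT(*) FROM dbo.Test t    WHERE t.MaxVal >= 8000 AND t.MaxVal < 9000 OPTION(QUERYTRACEON 176);
SELECT COUNT(*) FROM dbo.Test_2 t  WHERE t.MaxVal >= 8000 AND t.MaxVal < 9000 OPTION(QUERYTRACEON 176);
SELECT COUNT(*) FROM dbo.Control c WHERE c.MaxVal >= 8000 AND c.MaxVal < 9000;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

Comparing the actual execution plans of the three queries side by side is also worthwhile, since the point of the control table is to see what the computed column costs relative to a plain stored value.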

Other tips

Here is how to do it without having to use a scalar UDF. This avoids the pitfalls that Jason Long mentions in his answer (which was very well written, btw). The tradeoff here is that the SQL is not as readable.

Here are the pros and cons of my method.

Pros:

- no new UDFs created
- simplifies index creation
  - simpler index declaration SQL
  - able to create the index directly on the computed column, instead of on the other three
  - decreased size of index on disk (consists of only one column)
- SQL Server recognizes the computed column as both precise and deterministic (required to create an index directly on the column)
- can be modified to handle NULLs
- can be modified to provide other SQL Server functions, such as MIN, AVG, SUM, COUNT, etc.

Cons:

- decreased readability of SQL
  - not as clean looking (in some ways) as using the MAX() function directly
  - the three-column version is shown below, but the expression gets progressively longer (exponentially) as more columns are added (as opposed to growing linearly if MAX() is used instead)
    - only the MaxVal expression gets longer. The actual speed of computation remains O(1) with either this method or using the MAX() function

Here is the NOT NULL version, keeping the same table and column names as Jason:

    CREATE TABLE dbo.Test (
        C1 INT NOT NULL,
        C2 INT NOT NULL,
        c3 INT NOT NULL,
        MaxVal AS IIF(IIF([C1]>[C2],[C1],[C2])>[C3],IIF([C1]>[C2],[C1],[C2]),[C3]) PERSISTED NOT NULL
        );
    GO

    CREATE NONCLUSTERED INDEX ix_Test_MaxVal ON dbo.Test (MaxVal);
    GO
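
To check the "precise and deterministic" point from the pros list, the column's metadata can be queried with COLUMNPROPERTY. This is a small sketch that assumes the dbo.Test table above has just been created; the column aliases are illustrative.

```sql
-- Each property returns 1 when it holds for the column and 0 when it does not.
SELECT
    COLUMNPROPERTY(OBJECT_ID('dbo.Test'), 'MaxVal', 'IsComputed')      AS IsComputed,
    COLUMNPROPERTY(OBJECT_ID('dbo.Test'), 'MaxVal', 'IsDeterministic') AS IsDeterministic,
    COLUMNPROPERTY(OBJECT_ID('dbo.Test'), 'MaxVal', 'IsPrecise')       AS IsPrecise,
    COLUMNPROPERTY(OBJECT_ID('dbo.Test'), 'MaxVal', 'IsIndexable')     AS IsIndexable;
GO
```

The same query is a handy diagnostic whenever CREATE INDEX refuses a computed column, because it shows which requirement is not met.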

Here is the modified version to be able to handle NULL columns.

    CREATE TABLE dbo.Test_2 (
        C1 INT NULL,
        C2 INT NULL,
        C3 INT NULL,
        MaxVal AS IIF(IIF([C1] > [C2] OR [C2] IS NULL, [C1], [C2]) > [C3] OR [C3] IS NULL, IIF([C1] > [C2] OR [C2] IS NULL, [C1], [C2]), [C3]) PERSISTED
        );
    GO

    CREATE NONCLUSTERED INDEX ix_Test_2_MaxVal ON dbo.Test_2 (MaxVal);
    GO

And test code to demonstrate correctness. For more extensive test code, please see Jason's answer.

INSERT INTO [dbo].[Test_2]
           ([C1]
           ,[C2]
           ,[C3])
     VALUES
    (123,456,789),
    (345,478,123),
    (NULL,321,852),
    (111,NULL,951),
    (874,320,NULL),
    (NULL,NULL,NULL);
GO 

SELECT * FROM dbo.Test_2 t ORDER BY t.MaxVal OPTION(QUERYTRACEON 176);
GO

Output:

C1          C2          C3          MaxVal
----------- ----------- ----------- -----------
NULL        NULL        NULL        NULL
345         478         123         478
123         456         789         789
NULL        321         852         852
874         320         NULL        874
111         NULL        951         951

IMO, a computed column is a bad idea in this scenario in any case.

I don't know how you will be using it in real life.

Computing the max value across many columns in the application layer, for example in C#, works well.

Or you can store the max value in a normal column, like the others.

You can also create an AFTER INSERT, UPDATE trigger to do so.
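
A minimal sketch of the trigger idea follows. The table dbo.TestTrig, its Id key, and the trigger name are all hypothetical, invented for this example; MaxVal here is an ordinary column that the trigger keeps up to date.

```sql
-- Hypothetical table: MaxVal is a plain column, not a computed one.
CREATE TABLE dbo.TestTrig (
    Id     INT IDENTITY(1,1) PRIMARY KEY,
    C1     INT NULL,
    C2     INT NULL,
    C3     INT NULL,
    MaxVal INT NULL
    );
GO

CREATE TRIGGER dbo.trg_TestTrig_MaxVal
ON dbo.TestTrig
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Recompute MaxVal only for the rows touched by the triggering statement.
    -- With the database default (RECURSIVE_TRIGGERS OFF), this UPDATE
    -- does not fire the trigger again.
    UPDATE t
    SET MaxVal = (SELECT MAX(v.Val) FROM (VALUES (i.C1), (i.C2), (i.C3)) AS v (Val))
    FROM dbo.TestTrig AS t
    JOIN inserted AS i ON i.Id = t.Id;
END;
GO

INSERT dbo.TestTrig (C1, C2, C3) VALUES (123, 456, 789), (NULL, 321, 852);
GO

SELECT * FROM dbo.TestTrig;
GO
```

Adding a fourth column later means editing only the VALUES list inside the trigger, at the cost of a second write for every insert or update.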

In my example table, I don't know how many columns there are.

I find the max value across all numeric columns, so if anybody adds a new numeric column in the future, the max value will automatically include it.

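-- Build a comma-separated list of the numeric columns in TABLE1.
DECLARE @List NVARCHAR(MAX);

SET @List = STUFF((
        SELECT ',' + QUOTENAME(COLUMN_NAME)
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_NAME = 'TABLE1'
          AND DATA_TYPE IN ('int', 'smallint', 'float')
        FOR XML PATH(''), TYPE
    ).value('.', 'nvarchar(max)'), 1, 1, '');

--select @List
DECLARE @Sql NVARCHAR(MAX);
DECLARE @MaxValue DECIMAL(15, 5);

CREATE TABLE #temp (
    col   VARCHAR(100),
    value DECIMAL(15, 5)
);

-- UNPIVOT turns each listed column into its own (col, value) row.
-- Note: UNPIVOT requires every listed column to have the same data type,
-- so mixed types must be cast to a common type in the derived table first.
SET @Sql = N'select col, value
from
(select * from TABLE1) P
UNPIVOT (value for col in (' + @List + ')) as unpvt';

INSERT INTO #temp
EXEC sp_executesql @Sql;

-- Overall maximum across all unpivoted values.
SELECT @MaxValue = MAX(value)
FROM #temp;

--update column here set column=@MaxValue

DROP TABLE #temp;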

In the end, it all depends on the real-life scenario and on how and where you are going to use it.

License: CC-BY-SA with attribution

Not affiliated with dba.stackexchange