Sql Max value from multiple columns in Computed Column
-
19-01-2021 - |
문제
In this article solution 1, it talks about finding the maximum value from many columns. I would like to conduct this in a computed/persisted column. How would I do this?
create table dbo.TestAmount
(
Amount1 int,
Amount2 int,
Amount3 int,
MaxValuedata as (select MAX(MaxAmount) FROM (VALUES (Amount1),(Amount2),(Amount3)) AS MaxAmount(LastAmount))
)
May have 10 values in the future, trying to prevent long case statement.
해결책
First thing... I'm not advocating THAT you do this... I'm simply showing you HOW to do this. If your table experiences high volume inserts and/or updates, you could see a noticeable performance hit. Using scalar UDFs in computed columns will force all queries against the table to run serially.
Start by creating a scalar function similar to the following...
CREATE FUNCTION dbo.GreatestOfThreeInts
/* ================================================================================================
Scalar function created for the sole purpose of calculating the MaxVal computed column on dbo.Test.
================================================================================================ */
(
@C1 INT,
@C2 INT,
@C3 INT
)
RETURNS INT WITH SCHEMABINDING --, RETURNS NULL ON NULL INPUT --<< use this if NULLs are a possibility..
AS
BEGIN
DECLARE @MaxVal INT = ( SELECT MAX(x.Val) FROM ( VALUES (@C1), (@C2), (@C3) ) x (Val) );
RETURN ISNULL(@MaxVal, 0);
END;
GO
Either create your table with a PERSISTED computed column or, if the table already exists, use the ALTER / ADD syntax to add the PERSISTED computed column...
CREATE TABLE dbo.Test (
C1 INT NOT NULL,
C2 INT NOT NULL,
c3 INT NOT NULL,
MaxVal AS dbo.GreatestOfThreeInts(C1, C2, C3) PERSISTED NOT NULL -- persist the value so that it doesn't need to be constantly recomputed
);
GO
CREATE NONCLUSTERED INDEX ix_Test_MaxVal ON dbo.Test (MaxVal) INCLUDE (C1, C2, c3);
GO
Why do I keep saying PERSISTED?... Buy once, cry once... Unless you have a very write heavy usage pattern, you'll be better off computing the values on inserts & updates, than every time you reference the column in a select... Especially if that column is going to be used in a predicate or sorting operation.
Sooo... Let's see it in action...
INSERT dbo.Test (C1, C2, c3) VALUES
(123,456,789),
(345,478,123),
(523,321,852),
(111,471,951),
(874,320,357),
(965,102,478);
GO
SELECT * FROM dbo.Test t ORDER BY t.MaxVal OPTION(QUERYTRACEON 176);
GO
SELECT * FROM dbo.Test t WHERE t.MaxVal >= 800 AND t.MaxVal < 900 OPTION(QUERYTRACEON 176);
GO
Results...
C1 C2 c3 MaxVal
----------- ----------- ----------- -----------
345 478 123 478
123 456 789 789
523 321 852 852
874 320 357 874
111 471 951 951
965 102 478 965
C1 C2 c3 MaxVal
----------- ----------- ----------- -----------
523 321 852 852
874 320 357 874
Hope this helps, Jason
Edit #1: A BIG THANK YOU to Erik for adding the link, pointing out the fact that using a scalar UDF to compute a column will prevent the optimizer from considering a parallel execution plan... Even when the computed column is persisted. A fact that I actually knew but completely omitted from my initial answer. What I didn't know is the OPTION(QUERYTRACEON 176)
thing... Picking up that little nugget, more than covered the cost of admission for me!
Edit #2: Without inviting the religious debate of "NULL vs NOT NULL" column constraints, I'll simply state that my personal "default" is make all columns NOT NULL unless there is a compelling reason to do otherwise... That said, @MartinSmith makes some good points... Including the fact that the OP, by not specifying NULLability, made all columns NULLable. Plus, after the back & forth, I was just curious to see if the RETURN ISNULL(@MaxVal, 0);
was doing anything other than irritating people reading the T_SQL...
Short answer... It does not.
The following includes the introduction of a "control" table (no computed column) and NULLable versions of dbo.GreatestOfThreeInts & dbo.Test (dbo.GreatestOfThreeInts_2 & dbo.Test_2)
CREATE FUNCTION dbo.GreatestOfThreeInts_2
/* ==================================================================================================
Scalar function created for the sole purpose of calculating the MaxVal computed column on dbo.Test_2.
================================================================================================== */
(
@C1 INT,
@C2 INT,
@C3 INT
)
RETURNS INT WITH SCHEMABINDING
AS
BEGIN
DECLARE @MaxVal INT = ( SELECT MAX(x.Val) FROM ( VALUES (@C1), (@C2), (@C3) ) x (Val) );
RETURN @MaxVal;
END;
GO
CREATE TABLE dbo.Test_2 (
C1 INT NULL,
C2 INT NULL,
c3 INT NULL,
MaxVal AS dbo.GreatestOfThreeInts_2(C1, C2, C3) PERSISTED -- persist the value so that it doesn't need to be constantly recomputed
);
GO
CREATE NONCLUSTERED INDEX ix_Test2_MaxVal ON dbo.Test_2 (MaxVal) INCLUDE (C1, C2, c3);
GO
CREATE TABLE dbo.Control (
C1 int NOT NULL,
C2 int NOT NULL,
c3 int NOT NULL,
MaxVal INT NOT NULL
);
GO
CREATE NONCLUSTERED INDEX ix_Control_MaxVal ON dbo.Control (MaxVal) INCLUDE (C1, C2, c3);
GO
And because the 6 rows in my original answer isn't much of a test, the following will load all 3 tables with 1 million rows of test data...
-- clear out any existing data...
TRUNCATE TABLE dbo.Test;
GO
TRUNCATE TABLE dbo.Test_2;
GO
TRUNCATE TABLE dbo.Control;
GO
-- add 1M rows of test data...
WITH
cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
cte_Tally (c1, c2, c3) AS (
SELECT TOP (1000000)
ABS(CHECKSUM(NEWID())) % 9000 + 1000, -- randomly generate INTs between 1000 and 9999
ABS(CHECKSUM(NEWID())) % 9000 + 1000, -- no, I don't have an actual reason for using that specific range...
ABS(CHECKSUM(NEWID())) % 9000 + 1000 --
FROM
cte_n3 a CROSS JOIN cte_n3 b
)
INSERT dbo.Test (C1, C2, c3)
SELECT
t.c1,
t.c2,
t.c3
FROM
cte_Tally t;
GO
-- use dbo.Test to insert dbo.Test_2 & dbo.Control so all 3 tables will have the exact same data values...
-- (to compare actual insert performance, use the cte_Tally to load all tables)
INSERT dbo.Test_2 (C1, C2, c3)
SELECT
t.C1, t.C2, t.c3
FROM
dbo.Test t;
GO
INSERT dbo.Control (C1, C2, c3, MaxVal)
SELECT
t.C1, t.C2, t.c3, t.MaxVal
FROM
dbo.Test t;
GO
다른 팁
Here is how to do it without having to use a scalar UDF. This avoids the pitfalls that Jason Long mentions in his answer (which was very well written, btw). The tradeoff here is that the SQL is not as readable.
Here are the pros and cons of my method.
Pros:
- no new UDFs created
- simplifies index creation
- simpler index declaration SQL
- able to create index directly on the computed column, instead of on the other three
- decreased size of index on disk (consists of only one column)
- SQL Server recognizes the computed column as both precise and deterministic (required to create a index directly on the column)
- able to be modified to handle
NULL
s - can be modified to provide other SQL Server functions, such as MIN, AVG, SUM, COUNT, etc...
Cons:
- decreased readability of SQL
- not as clean looking (in some ways) as using the
MAX()
function directly - three column version is shown below, but the statement gets progressively longer (exponentially) the more columns that are added (as opposed to growing linearly, if
MAX()
is used instead)- only the MAXVAL SQL is longer. The actual speed of computation remains O(1) with either this method or using the
MAX()
function
- only the MAXVAL SQL is longer. The actual speed of computation remains O(1) with either this method or using the
- not as clean looking (in some ways) as using the
Here is the NOT NULL
version, keeping the same variable names as Jason:
CREATE TABLE dbo.Test (
C1 INT NOT NULL,
C2 INT NOT NULL,
c3 INT NOT NULL,
MaxVal AS IIF(IIF([C1]>[C2],[C1],[C2])>[C3],IIF([C1]>[C2],[C1],[C2]),[C3]) PERSISTED NOT NULL
);
GO
CREATE NONCLUSTERED INDEX ix_Test_MaxVal ON dbo.Test (MaxVal);
GO
Here is the modified version to be able to handle NULL
columns.
CREATE TABLE dbo.Test_2 (
C1 INT NULL,
C2 INT NULL,
C3 INT NULL,
MaxVal AS IIF(IIF([C1]>[C2]OR[C2]IS NULL,[C1],[C2])>[C3]OR[C3]IS NULL,IIF([C1]>[C2]OR[C2]IS NULL,[C1],[C2]),[C3]) PERSISTED
);
GO
CREATE NONCLUSTERED INDEX ix_Test_2_MaxVal ON dbo.Test_2 (MaxVal);
GO
And test code to demonstrate correctness. For more extensive test code, please see Jason's answer.
INSERT INTO [dbo].[Test_2]
([C1]
,[C2]
,[C3])
VALUES
(123,456,789),
(345,478,123),
(NULL,321,852),
(111,NULL,951),
(874,320,NULL),
(NULL,NULL,NULL);
GO
SELECT * FROM dbo.Test_2 t ORDER BY t.MaxVal OPTION(QUERYTRACEON 176);
GO
Output:
C1 C2 C3 MaxVal
NULL NULL NULL NULL
345 478 123 478
123 456 789 789
NULL 321 852 852
874 320 NULL 874
111 NULL 951 951
IMO, Computed column in such scenario is bad bad idea in any case.
I don't know how you will be using it in real life.
Computing max value out of many columns in UI layer like c# is good.
Or you can store the max value in normal column like others.
You can also Create Trigger After Insert ,Update
to do so.
in my example Table , I don't know how many columns are there ?
I am finding max value across all numeric columns.So in future if anybody add new column which happen to numeric type then max value will auto calculated.
DECLARE @List varchar(500)
SET @List = stuff((
SELECT ',' + QUOTENAME(COLUMN_NAME)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'TABLE1'
AND DATA_TYPE IN (
'int','smallint','float')
FOR XML PATH('')
,Type
).value('.', 'varchar(max)'), 1, 1, '')
--select @List
DECLARE @Sql NVARCHAR(3000)
DECLARE @MaxValue DECIMAL(15, 5)
CREATE TABLE #temp (
col VARCHAR(100)
,value DECIMAL(15, 5))
SET @Sql = N'select col,value
from
(select * from TABLE1)P
UNPIVOT(value for col in (' + @List + ')) as unpvt'
INSERT INTO #temp
EXEC sp_executesql @Sql
SELECT @MaxValue = max(value)
FROM #temp
--update column here set column=@MaxValue
DROP TABLE #temp
Finally it all depend on real life scenario and how and where you are going to use it.