Wow, this is a really cool example of how to use nested CTEs in an inline table-valued function (ITVF). You want to use an ITVF because they are fast; see Wayne Sheffield’s blog article, which attests to this.
When a problem is really complicated, I always start with a sample database and table to make sure I give the user a correct solution.
Let's create a database named [Test] based on model.
--
-- Create a simple db
--
-- use master
use master;
go
-- delete existing databases
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'Test')
DROP DATABASE Test
GO
-- simple db based on model
create database Test;
go
-- switch to new db
use [Test];
go
Let's create a table type named [InputToLinearReg].
--
-- Create table type to pass data
--
-- Delete the existing table type
IF EXISTS (SELECT * FROM sys.types WHERE is_table_type = 1 AND name = 'InputToLinearReg')
DROP TYPE dbo.InputToLinearReg;
GO
-- Create the table type
CREATE TYPE InputToLinearReg AS TABLE
(
portfolio_cd char(1),
month_num int,
collections_amt money
);
go
Okay, here is the multi-layered SELECT statement that uses CTEs. The query optimizer treats an inline function as a SQL statement that can be executed in parallel, unlike a regular function, which can't. See the black box section of Wayne's article.
--
-- Create inline table-valued function (fast)
--
-- Remove if it exists
IF OBJECT_ID('CalculateLinearReg', 'IF') IS NOT NULL
DROP FUNCTION CalculateLinearReg;
GO
-- Create the function
CREATE FUNCTION CalculateLinearReg
(
@ParmInTable AS dbo.InputToLinearReg READONLY
)
RETURNS TABLE
AS
RETURN
(
    WITH cteRawData as
    (
        SELECT
            T.portfolio_cd,
            CAST(T.month_num as decimal(18, 6)) as x,
            LOG(CAST(T.collections_amt as decimal(18, 6))) as y
        FROM
            @ParmInTable as T
    ),
    cteAvgByPortfolio as
    (
        SELECT
            portfolio_cd,
            AVG(x) as xavg,
            AVG(y) as yavg
        FROM
            cteRawData
        GROUP BY
            portfolio_cd
    ),
    cteSlopeByPortfolio as
    (
        SELECT
            R.portfolio_cd,
            SUM((R.x - A.xavg) * (R.y - A.yavg)) / SUM(POWER(R.x - A.xavg, 2)) as slope
        FROM
            cteRawData as R
        INNER JOIN
            cteAvgByPortfolio A
        ON
            R.portfolio_cd = A.portfolio_cd
        GROUP BY
            R.portfolio_cd
    ),
    cteInterceptByPortfolio as
    (
        SELECT
            A.portfolio_cd,
            (A.yavg - (S.slope * A.xavg)) as intercept
        FROM
            cteAvgByPortfolio as A
        INNER JOIN
            cteSlopeByPortfolio S
        ON
            A.portfolio_cd = S.portfolio_cd
    )
    SELECT
        A.portfolio_cd,
        A.xavg,
        A.yavg,
        S.slope,
        I.intercept,
        1 - (SUM(POWER(R.y - (I.intercept + S.slope * R.x), 2)) /
            (SUM(POWER(R.y - (I.intercept + S.slope * R.x), 2)) +
             SUM(POWER(((I.intercept + S.slope * R.x) - A.yavg), 2)))) as rsquared
    FROM
        cteRawData as R
    INNER JOIN
        cteAvgByPortfolio as A ON R.portfolio_cd = A.portfolio_cd
    INNER JOIN
        cteSlopeByPortfolio S ON A.portfolio_cd = S.portfolio_cd
    INNER JOIN
        cteInterceptByPortfolio I ON S.portfolio_cd = I.portfolio_cd
    GROUP BY
        A.portfolio_cd,
        A.xavg,
        A.yavg,
        S.slope,
        I.intercept
);
go
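For reference, this is my reading of the math the CTEs compute for each portfolio, written out as ordinary least squares on the log of the collections amount. The symbols (n, a, b, and the hatted fitted value) are notation I am introducing here, not names used in the function; x and y correspond to the columns of cteRawData, and LOG in T-SQL is the natural logarithm.

```latex
\begin{aligned}
x_i &= \text{month\_num}_i, \qquad y_i = \ln(\text{collections\_amt}_i) \\
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
b &= \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}
  && \text{(slope)} \\
a &= \bar{y} - b\,\bar{x}
  && \text{(intercept)} \\
\hat{y}_i &= a + b\,x_i \\
R^2 &= 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}
               {\sum_{i}(y_i - \hat{y}_i)^2 + \sum_{i}(\hat{y}_i - \bar{y})^2}
  && \text{(rsquared)}
\end{aligned}
```

Two caveats follow from these formulas: the slope is undefined when a portfolio has only one distinct month (the denominator is zero), and the logarithm requires every collections amount to be positive.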
Last but not least, set up a table variable and get the answers. Unlike your solution above, it groups the results by portfolio.
-- Load data into variable
DECLARE @InTable AS InputToLinearReg;
-- insert data
insert into @InTable
values
('A', 1, 100.00),
('A', 2, 90.00),
('A', 3, 80.00),
('A', 4, 70.00),
('B', 1, 100.00),
('B', 2, 90.00),
('B', 3, 80.00);
-- show data
select * from CalculateLinearReg(@InTable)
go
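Because the function regresses LOG(collections_amt) on month_num, the slope and intercept describe a line in log space. If you want a figure back in money terms, one possible follow-up is to exponentiate the fitted line. The sketch below is only an illustration under assumptions: it assumes the table type and function were created exactly as above, that all amounts are positive, and the variable @NextMonth and the column name projected_collections_amt are hypothetical names I made up for the example.

```sql
-- Sketch only: assumes InputToLinearReg and CalculateLinearReg exist as created above.
-- The table variable is re-declared because the previous batch ended with "go".
DECLARE @InTable AS InputToLinearReg;

INSERT INTO @InTable (portfolio_cd, month_num, collections_amt)
VALUES
    ('A', 1, 100.00),
    ('A', 2, 90.00),
    ('A', 3, 80.00),
    ('A', 4, 70.00),
    ('B', 1, 100.00),
    ('B', 2, 90.00),
    ('B', 3, 80.00);

-- Hypothetical month to project; not part of the original question.
DECLARE @NextMonth int = 5;

-- The fitted line is LOG(amount) = intercept + slope * month,
-- so EXP() of that expression converts it back to an amount.
SELECT
    F.portfolio_cd,
    F.slope,
    F.intercept,
    F.rsquared,
    EXP(F.intercept + F.slope * @NextMonth) AS projected_collections_amt
FROM
    CalculateLinearReg(@InTable) AS F;
go
```

Treat the projected value as a rough extrapolation; with only three or four points per portfolio, rsquared tells you how well the line fits the sample, not how reliable the projection is.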
Here is a picture of the results using your data.