Question

Below is an example of my data, table RR_Linest:

Portfolio   Month_number   Collections
A           1              $100
A           2              $90
A           3              $80
A           4              $70
B           1              $100
B           2              $90
B           3              $80

I was able to figure out how to get the slope, intercept, and R-square for one portfolio by removing the portfolio column, selecting only the month_number (x) and collections (y) data for a single portfolio (I removed the data for portfolio B), and running the code below.

I have been trying to change the function so that when I run it, it gives me the slope, intercept, and R-square by portfolio. Does anyone know how to do that? I have tried many ways and I just can't figure it out.

First I created the function:

declare @RegressionInput_A [dbo].[RegressionInput_A]

insert into @RegressionInput_A (x,y) select ([model month]),log([collection $]) from [dbo].[RR_Linest]

select * from [dbo].[LinearRegression_A](@RegressionInput_A)

GO

drop function dbo.LinearRegression_A
GO

CREATE FUNCTION dbo.LinearRegression_A
( 
@RegressionInputs_A AS dbo.RegressionInput_A READONLY 
) 
RETURNS @RegressionOutput_A TABLE 
( 
Slope DECIMAL(18, 6), 
Intercept DECIMAL(18, 6), 
RSquare DECIMAL(18, 6) 
) 
AS
BEGIN 

DECLARE @Xaverage AS DECIMAL(18, 6)
DECLARE @Yaverage AS DECIMAL(18, 6)
DECLARE @slope AS DECIMAL(18, 6)
DECLARE @intercept AS DECIMAL(18, 6)
DECLARE @rSquare AS DECIMAL(18, 6)

SELECT
@Xaverage = AVG(x),
@Yaverage = AVG(y)
FROM
@RegressionInputs_A

SELECT
@slope = SUM((x - @Xaverage) * (y - @Yaverage))/SUM(POWER(x - @Xaverage, 2))
FROM
@RegressionInputs_A

SELECT
@intercept = @Yaverage - (@slope * @Xaverage) 

SELECT @rSquare = 1 - (SUM(POWER(y - (@intercept + @slope * x), 2))/(SUM(POWER(y - (@intercept + @slope * x), 2)) + SUM(POWER(((@intercept + @slope * x) - @Yaverage), 2))))
FROM
@RegressionInputs_A

INSERT INTO
@RegressionOutput_A
(
Slope,
Intercept,
RSquare
)
SELECT
@slope,
@intercept,
@rSquare

RETURN

END
GO

Then I run the function:

declare @RegressionInput_A  [dbo].[RegressionInput_A]

insert into @RegressionInput_A (x,y)
select
([model month]),log([collection $])
from [dbo].[RR_Linest]

select * from [dbo].[LinearRegression_A](@RegressionInput_A)

Solution

Wow, this is a really cool example of how to use nested CTEs in an inline table-valued function (iTVF). You want an iTVF because it is usually faster than a multi-statement function. See Wayne Sheffield’s blog article, which attests to this.

When a problem is really complicated, I always start with a sample database and table to make sure I give the user a correct solution.

Let's create a database named [Test] based on model.

--
-- Create a simple db
--

-- use master
use master;
go

-- delete existing databases
IF EXISTS (SELECT name FROM sys.databases WHERE name = N'Test')
DROP DATABASE Test
GO

-- simple db based on model
create database Test;
go

-- switch to new db
use [Test];
go

Let's create a table type named [InputToLinearReg].

--
-- Create table type to pass data
--

-- Delete the existing table type
IF EXISTS (SELECT * FROM sys.types WHERE name = 'InputToLinearReg' AND is_table_type = 1)
DROP TYPE dbo.InputToLinearReg
GO

--  Create the table type
CREATE TYPE InputToLinearReg AS TABLE
(
portfolio_cd char(1),
month_num int,
collections_amt money
);
go

Okay, here is the multi-layered SELECT statement that uses CTEs. The query optimizer treats an inline function as a single SQL statement, which can be executed in parallel, whereas a regular (multi-statement) function can't. See the black box section of Wayne's article.

--
-- Create in line table value function (fast)
--

-- Remove if it exists
IF OBJECT_ID('CalculateLinearReg') IS NOT NULL
DROP FUNCTION CalculateLinearReg
GO

-- Create the function
CREATE FUNCTION CalculateLinearReg
( 
    @ParmInTable AS dbo.InputToLinearReg READONLY 
) 
RETURNS TABLE 
AS
RETURN
(

  WITH cteRawData as
  (
    SELECT
        T.portfolio_cd,
        CAST(T.month_num as decimal(18, 6)) as x,
        LOG(CAST(T.collections_amt as decimal(18, 6))) as y
    FROM
        @ParmInTable as T
  ),

  cteAvgByPortfolio as
  (
    SELECT
        portfolio_cd,
        AVG(x) as xavg,
        AVG(y) as yavg
    FROM
        cteRawData 
    GROUP BY 
        portfolio_cd
  ),

  cteSlopeByPortfolio as
  (
    SELECT
        R.portfolio_cd,
        SUM((R.x - A.xavg) * (R.y - A.yavg)) / SUM(POWER(R.x - A.xavg, 2)) as slope
    FROM
        cteRawData as R 
    INNER JOIN 
        cteAvgByPortfolio A
    ON 
        R.portfolio_cd = A.portfolio_cd
    GROUP BY 
        R.portfolio_cd
  ),

  cteInterceptByPortfolio as
  (
    SELECT
        A.portfolio_cd,
        (A.yavg - (S.slope * A.xavg)) as intercept
    FROM
        cteAvgByPortfolio as A
    INNER JOIN 
        cteSlopeByPortfolio S
    ON 
        A.portfolio_cd = S.portfolio_cd

  )

  SELECT 
      A.portfolio_cd,
      A.xavg,
      A.yavg,
      S.slope,
      I.intercept,
      1 - (SUM(POWER(R.y - (I.intercept + S.slope * R.x), 2)) /
      (SUM(POWER(R.y - (I.intercept + S.slope * R.x), 2)) + 
      SUM(POWER(((I.intercept + S.slope * R.x) - A.yavg), 2)))) as rsquared
  FROM
      cteRawData as R 
        INNER JOIN 
      cteAvgByPortfolio as A ON R.portfolio_cd = A.portfolio_cd
        INNER JOIN 
      cteSlopeByPortfolio S ON A.portfolio_cd = S.portfolio_cd
        INNER JOIN 
      cteInterceptByPortfolio I ON S.portfolio_cd = I.portfolio_cd
  GROUP BY 
      A.portfolio_cd,
      A.xavg,
      A.yavg,
      S.slope,
      I.intercept
);
go
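
For reference, here is my reading of what each CTE layer computes for a single portfolio, where x is the month number and y is the natural log of collections (the one-argument LOG in T-SQL is the natural log). The symbols a, b, and n are notation introduced here for illustration, not names taken from the code.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
  && \text{(cteAvgByPortfolio)} \\
b &= \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}
  && \text{(cteSlopeByPortfolio)} \\
a &= \bar{y} - b\,\bar{x}
  && \text{(cteInterceptByPortfolio)} \\
R^2 &= 1 - \frac{\sum_{i}\bigl(y_i - (a + b x_i)\bigr)^2}
  {\sum_{i}\bigl(y_i - (a + b x_i)\bigr)^2 + \sum_{i}\bigl((a + b x_i) - \bar{y}\bigr)^2}
  && \text{(final SELECT)}
\end{align*}
```

As a quick hand check on portfolio B: with three equally spaced months the slope reduces to (y_3 - y_1)/2 = ln(80/100)/2, which is roughly -0.1116, so the function should return a value close to that for B.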

Last but not least, set up a table variable and get the answers. Unlike your solution above, it groups by portfolio ID.

-- Load data into variable
DECLARE @InTable AS InputToLinearReg;

-- insert data
insert into @InTable
values
('A', 1, 100.00),
('A', 2, 90.00),
('A', 3, 80.00),
('A', 4, 70.00),
('B', 1, 100.00),
('B', 2, 90.00),
('B', 3, 80.00);

-- show data
select * from CalculateLinearReg(@InTable)
go
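
To run this against your real table instead of literal rows, something along these lines should work. This is only a sketch: I am assuming the portfolio column in RR_Linest is called [Portfolio] (your post only shows [model month] and [collection $]), that its values fit in char(1), and that the type and function were created in the same database as the table.

```sql
-- Sketch: feed dbo.RR_Linest into the function instead of literal rows.
-- Assumption: the portfolio column is named [Portfolio]; adjust to the real name.
DECLARE @InTable AS InputToLinearReg;

INSERT INTO @InTable (portfolio_cd, month_num, collections_amt)
SELECT
    [Portfolio],
    [model month],
    [collection $]
FROM
    dbo.RR_Linest
WHERE
    [collection $] > 0;  -- LOG() raises an error on zero or negative amounts

-- One row per portfolio
SELECT portfolio_cd, slope, intercept, rsquared
FROM CalculateLinearReg(@InTable);
```

The WHERE clause is a design choice, not a requirement: it silently drops months with no collections, which may or may not be what you want for the fit.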

Here is a picture of the results using your data.

[Image: result set returned by CalculateLinearReg for the sample data]

Other tips

CREATE FUNCTION dbo.LinearRegression 
( 
  @RegressionInputs AS dbo.RegressionInput READONLY 
) 
RETURNS TABLE AS
RETURN
(
  WITH
    t1 AS ( --calculate averages
      SELECT portfolio, x, y,
        AVG(x) OVER(PARTITION BY portfolio) Xaverage,
        AVG(y) OVER(PARTITION BY portfolio) Yaverage
      FROM @RegressionInputs
    ),
    t2 AS ( --calculate slopes
      SELECT portfolio, Xaverage, Yaverage,
        SUM((x - Xaverage) * (y - Yaverage))/SUM(POWER(x - Xaverage, 2)) slope
      FROM t1
      GROUP BY portfolio, Xaverage, Yaverage
    ),
    t3 AS ( --calculate intercepts
      SELECT portfolio, slope,
        (Yaverage - (slope * Xaverage) ) AS intercept
      FROM t2
    ),
    t4 AS ( --calculate rSquare
      SELECT t1.portfolio, slope, intercept,
        1 - (SUM(POWER(y - (intercept + slope * x), 2))/(SUM(POWER(y - (intercept + slope * x), 2)) + SUM(POWER(((intercept + slope * x) - Yaverage), 2)))) AS rSquare
      FROM t1
      INNER JOIN t3 ON (t1.portfolio = t3.portfolio)
      GROUP BY t1.portfolio, slope, intercept
    )
  SELECT portfolio, slope, intercept, rSquare FROM t4
)
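
The function above reads portfolio, x, and y from a table type named dbo.RegressionInput, which is not shown. A minimal sketch of what that type and a call might look like follows; the column types are my assumption, and the type has to be created before the function is.

```sql
-- Hypothetical table type matching the columns the function reads.
-- Column types are assumptions, not taken from the original post.
CREATE TYPE dbo.RegressionInput AS TABLE
(
    portfolio char(1),
    x decimal(18, 6),
    y float
);
GO

-- Once dbo.LinearRegression exists, load the sample data and call it.
DECLARE @Input AS dbo.RegressionInput;

INSERT INTO @Input (portfolio, x, y)
VALUES
    ('A', 1, LOG(100.0)),
    ('A', 2, LOG(90.0)),
    ('A', 3, LOG(80.0)),
    ('A', 4, LOG(70.0)),
    ('B', 1, LOG(100.0)),
    ('B', 2, LOG(90.0)),
    ('B', 3, LOG(80.0));

-- Returns one row per portfolio: slope, intercept, rSquare
SELECT * FROM dbo.LinearRegression(@Input);
```

Here the log transform is applied while loading, the same way the question does it, because the function itself works on x and y as given.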
Licensed under: CC-BY-SA with attribution