Question

dt          indx_nm1  indx_val1  indx_nm2  indx_val2
2009-06-08  ABQI      1001.2     ACNACTR   300.05
2009-06-09  ABQI      1002.12    ACNACTR   341.19
2009-06-10  ABQI      1011.4     ACNACTR   382.93
2009-06-11  ABQI      1015.43    ACNACTR   362.63

I have a table that looks like the one above (but with hundreds of rows, with dates from 2009 to 2013). Is there a way to calculate the covariance, that is, [(indx_val1 - avg(indx_val1)) * (indx_val2 - avg(indx_val2))] summed over every row of the table and divided by the total number of rows, and return just a single value for cov(ABQI, ACNACTR)?

Solution

Since you have aggregates operating over two different groups, you will need two different queries. The main one groups by dt to get your row values per date. The other query has to perform AVG() and COUNT() aggregates across the whole rowset.

To use them both at the same time, you need to JOIN them together. Since there is no actual relation between the two queries, the join is a Cartesian product, so we'll use a CROSS JOIN. Effectively, that joins every row of the main query with the single row returned by the aggregate query. You can then perform the arithmetic in the SELECT list, using values from both.

So, building on the query from your earlier question:

SELECT 
 indxs.*,
 ((indx_val2 - indx_val2_avg) * (indx_val1 - indx_val1_avg)) / total_rows AS cv
FROM (
    SELECT 
      dt,
      MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_nm ELSE NULL END) AS indx_nm1,
      MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_val ELSE NULL END) AS indx_val1,
      MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_nm ELSE NULL END) AS indx_nm2,
      MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val ELSE NULL END) AS indx_val2
    FROM table1 a
    GROUP BY dt
  ) indxs 
  CROSS JOIN (
    /* Join against a query returning the AVG() and COUNT() across all rows */
    SELECT
      'ABQI' AS indx_nm1_aname,
      AVG(CASE WHEN indx_nm = 'ABQI' THEN indx_val ELSE NULL END) AS indx_val1_avg,
      'ACNACTR' AS indx_nm2_aname,
      AVG(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val ELSE NULL END) AS indx_val2_avg,
      COUNT(DISTINCT dt) AS total_rows /* number of dates; COUNT(*) would count each date once per index */
    FROM table1 b
    WHERE indx_nm IN ('ABQI','ACNACTR')
    /* And it is a cartesian product */
  ) aggs
WHERE
  indx_nm1 IS NOT NULL
  AND indx_nm2 IS NOT NULL
ORDER BY dt

Here's a demo, building on your earlier one: http://sqlfiddle.com/#!6/2ec65/14
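
The query above returns one product term per date rather than a single number. If you want just cov(ABQI, ACNACTR), one option is to sum those terms over all dates. This is a minimal sketch, not a tested drop-in: it assumes the same table1 layout (dt, indx_nm, indx_val) as the query above and a SQL Server version that supports CTEs and OVER (). The names pairs, centered, x, y and cov_abqi_acnactr are made up for illustration.

```sql
WITH pairs AS (
    /* One row per date, keeping only dates where both indexes are present */
    SELECT
      dt,
      MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_val END) AS x,
      MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val END) AS y
    FROM table1
    WHERE indx_nm IN ('ABQI', 'ACNACTR')
    GROUP BY dt
    HAVING COUNT(DISTINCT indx_nm) = 2
),
centered AS (
    /* Attach the overall averages to every row; 1.0 * avoids integer
       averaging in case indx_val is stored as an integer type */
    SELECT
      1.0 * x AS x,
      1.0 * y AS y,
      AVG(1.0 * x) OVER () AS x_avg,
      AVG(1.0 * y) OVER () AS y_avg
    FROM pairs
)
/* Population covariance: sum of products of deviations over the row count */
SELECT SUM((x - x_avg) * (y - y_avg)) / COUNT(*) AS cov_abqi_acnactr
FROM centered;
```

Dividing by COUNT(*) gives the population covariance, which matches the formula in the question. For the sample covariance you would divide by COUNT(*) - 1 instead.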

OTHER TIPS

Here is a scalar-valued function that computes the covariance of any two-column table passed in as XML.

To test: compile the function, then run the alpha test from the comment block at the top of the function body.

    CREATE Function [dbo].[Covariance](@XmlTwoValueSeries xml)
    returns float
    as
    Begin
    /*

    -- -----------
    -- ALPHA TEST
    -- -----------
    IF object_id('tempdb..#_201610101706') is not null DROP TABLE #_201610101706
    select *
    into #_201610101706
    from
    (
        select *
        from
        (
            SELECT '2016-01' Period, 1.24 col0, 2.20 col1
            union
            SELECT '2016-02' Period, 1.6 col0, 3.20 col1
            union
            SELECT '2016-03' Period, 1.0 col0, 2.77 col1
            union
            SELECT '2016-04' Period, 1.9 col0, 2.98 col1
        ) A
    ) A


    DECLARE @XmlTwoValueSeries xml  
    SET @XmlTwoValueSeries = (
    SELECT col0,col1 FROM #_201610101706
    FOR
    XML PATH('Output')
    )

    SELECT dbo.Covariance(@XmlTwoValueSeries) Covariance

    */
    declare @returnvalue float -- matches the declared return type

    set @returnvalue = 
    (
        SELECT  SUM((x - xAvg) *(y - yAvg)) / MAX(n) AS [COVAR(x,y)]
        from 
        (
            SELECT  1E * x x,
                    AVG(1E * x) OVER (PARTITION BY (SELECT NULL)) xAvg,
                    1E * y y,
                    AVG(1E * y) OVER (PARTITION BY (SELECT NULL)) yAvg,
                    COUNT(*) OVER (PARTITION BY (SELECT NULL)) n
            FROM    
            (
                SELECT 
                    e.c.value('(col0/text())[1]', 'float' ) x,
                    e.c.value('(col1/text())[1]', 'FLOAT' ) y
                FROM @XmlTwoValueSeries.nodes('Output') e(c)            
            ) A
        ) A
    )
    return @returnvalue
    end



    GO
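
To apply the function to the data from the question, you could build the XML from the two index series. This is a sketch under assumptions: table1 and its columns (dt, indx_nm, indx_val) are taken from the accepted answer, not from this one, and the col0/col1 aliases are chosen only because the function reads elements with those names.

```sql
DECLARE @XmlTwoValueSeries xml;

/* Pivot the two indexes into one row per date, aliased as col0/col1
   so the element names match what dbo.Covariance reads */
SET @XmlTwoValueSeries = (
    SELECT
      MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_val END) AS col0,
      MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val END) AS col1
    FROM table1
    WHERE indx_nm IN ('ABQI', 'ACNACTR')
    GROUP BY dt
    HAVING COUNT(DISTINCT indx_nm) = 2  /* skip dates missing either index */
    FOR XML PATH('Output')
);

SELECT dbo.Covariance(@XmlTwoValueSeries) AS cov_abqi_acnactr;
```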
Licensed under: CC-BY-SA with attribution