Domanda

I'm trying to figure how to calculate the pearson correlation coefficient using sql. Here is the formula I'm using: enter image description here and here is the table I'm using: enter image description here

This is what I have so far for a query but it's giving me this message: Invalid use of group function

select first_id, second_id, movie_id, first_score, second_score,  count(*) as n, 
sum((first_score-avg(first_score))*(second_score-avg(second_score)))/
(
sqrt(sum(first_score-avg(first_score)))*
sqrt(sum(second_score-avg(second_score))))
as pearson
from connections
group by second_id

Thanks for helping

È stato utile?

Soluzione

Here is a query that does the calculation in the formula:

select sum((first_score - avg_first_score)*(second_score - avg_second_score)) /
       (sqrt(sum(pow((first_score - avg_first_score), 2)))*
        sqrt(sum(pow((second_score - avg_second_score), 2)))
       ) as r      
from connections c cross join
     (select avg(first_score) as avg_first_score, avg(second_score) as avg_second_score
      from connections
     ) const;

There are numerous issues with your attempt. This precalculates the average values for the two scores. It then applies the formula pretty much as written.

Altri suggerimenti

From a purely syntactic perspective, you've got a problem with your group by clause. It should list every non-aggregated column to work properly. It should be:

group by first_id, second_id, movie_id, first_score, second_score
Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top