Slightly different results between scipy.stats.spearmanr and manual calculation
09-12-2020
Question
I have the following dataset.
When I calculate the Spearman correlation coefficient with scipy.stats.spearmanr, it returns 0.718182.
import pandas as pd
import numpy as np
from scipy.stats import spearmanr
df = pd.DataFrame(
    [
        [7, 3],
        [6, 5],
        [5, 4],
        [3, 2],
        [6, 4],
        [8, 9],
        [9, 7]
    ],
    columns=['Set of A', 'Set of B'])
correlation, pval = spearmanr(df)
print(f'correlation={correlation:.6f}, p-value={pval:.6f}')
It returns this:
correlation=0.718182, p-value=0.069096
However, when I tried to calculate it manually:
df_rank = pd.DataFrame(
    [
        [5, 2],
        [3.5, 4],
        [2, 4],
        [1, 1],
        [3.5, 4],
        [6, 7],
        [7, 6]
    ],
    columns=['Rank of A', 'Rank of B'])
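# Sample covariance of the two rank columns (np.cov uses ddof=1, like pandas std)
cov_rank = np.cov(df_rank.iloc[:, 0], df_rank.iloc[:, 1])[0][1]
# Divide by the product of the standard deviations; .iloc avoids positional [] on a labelled Series
cov_rank / (df_rank.std().iloc[0] * df_rank.std().iloc[1])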
It returns a different value.
0.7105597124064275
The two results differ after the second decimal place, and I do not know why.
My question is whether scipy.stats.spearmanr expects the data to be ranked already or not.
Solution
I think you have a small error in your manual calculation. In the 'Rank of B' column you assigned rank 4 to the values 4, 4, and 5. The two 4s are tied for ranks 3 and 4, so each should get rank 3.5, and the 5 should get rank 5. With those ranks your calculation gives the same answer as spearmanr, 0.7181818181818181.
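To double-check the ranks rather than assign them by hand, something like the following sketch may help. It assumes that scipy.stats.rankdata with method="average" is the tie handling you intended (tied values share the mean of the ranks they span). The variable names a, b, rank_a and rank_b are only illustrative. It also touches on the original question: spearmanr ranks the raw data itself, so passing raw values or correctly averaged ranks should give the same coefficient.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr

# Raw data from the question
a = np.array([7, 6, 5, 3, 6, 8, 9])
b = np.array([3, 5, 4, 2, 4, 9, 7])

# Average ranks: tied values share the mean of the ranks they occupy
rank_a = rankdata(a, method="average")
rank_b = rankdata(b, method="average")
print(rank_a.tolist())  # [5.0, 3.5, 2.0, 1.0, 3.5, 6.0, 7.0]
print(rank_b.tolist())  # [2.0, 5.0, 3.5, 1.0, 3.5, 7.0, 6.0]

# Spearman's rho is the Pearson correlation of the ranks
rho_manual = np.corrcoef(rank_a, rank_b)[0, 1]

# spearmanr on the raw data, and on the already-ranked data
rho_raw, _ = spearmanr(a, b)
rho_ranked, _ = spearmanr(rank_a, rank_b)

print(f"{rho_manual:.6f} {rho_raw:.6f} {rho_ranked:.6f}")  # 0.718182 0.718182 0.718182
```

Comparing rank_b with the hand-made 'Rank of B' column shows the mismatch directly: the value 5 gets rank 5, and the two 4s get 3.5 each.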