How to deal with missing values to calculate correlation matrix in R?
https://www.tutorialspoint.com/how-to-deal-with-missing-values-to-calculate-correlation-matrix-in-r
10-09-2020
Question
How to deal with missing values to calculate correlation matrix in R?
Data frames and matrices in R often contain missing values, and calling cor on them directly returns NA for every pair that involves a column with missing data. Almost everyone in data analysis runs into this. One way to solve it is to pass the data through na.omit before calling cor: na.omit drops every row that contains at least one NA, so the correlation matrix is computed from the complete rows only. The examples below show this.
Example
Consider the below data frame −
> x1<-sample(c(1:5,NA),500,replace=TRUE)
> x2<-sample(c(rnorm(50,2,5),NA),500,replace=TRUE)
> x3<-sample(c(rpois(50,2),NA),500,replace=TRUE)
> x4<-sample(c(runif(50,2,10),NA),500,replace=TRUE)
> df<-data.frame(x1,x2,x3,x4)
> head(df,20)
Output
   x1         x2 x3       x4
1   2  2.6347839  4 2.577690
2   3  0.3082031  1 6.250998
3   1  0.3082031  3 7.786711
4   1  2.6347839  0 3.449600
5  NA  2.5107175  1 7.269619
6   4  2.4450443  4 6.250998
7  NA  1.1747742  2 3.053929
8  NA  2.4450443  3 5.860071
9   5  6.6736496  4 7.979433
10 NA  2.4450443  2 6.250998
11 NA  1.1747742  5       NA
12  2 11.1483587  1 9.498951
13  4  2.1400502 NA 9.299100
14  2 -0.8043954  3 2.883222
15  1  1.5054120  0 2.765324
16  1  0.1283554  2 7.918015
17  3  3.0337960  3 5.588130
18  1  4.5603861  2 7.979433
19  3  4.4976830  4 8.434829
20  1  9.4147186  2 3.053929
> tail(df,20)
Output
    x1         x2 x3       x4
481  2 -1.9780830  4 9.299100
482  3  2.0495769  1 9.639262
483  3 -4.5421502  2 3.374645
484 NA  2.1400502  3       NA
485  2 -4.0551622  2 5.999863
486  4  5.8547691  2 3.593138
487 NA         NA  2 9.549274
488  3  3.9160824  1 3.053929
489  1 11.1483587  5 7.786711
490  3 -2.7581511  2 9.433952
491 NA  4.8002434  1 5.824331
492  2  4.8002434  2 8.434829
493  2  1.9706702  2 3.053929
494 NA  2.5099287  2 7.979433
495  4  1.9706702  1 7.929130
496  2  4.5919890  2 9.973436
497  4  2.5099287  4 7.269619
498  4  0.3082031  3 3.053929
499  1  5.4593713  2 9.973436
500 NA -1.9780830  4 3.219703
> cor(na.omit(df))
Output
             x1          x2          x3         x4
x1  1.000000000 0.009571313 -0.06363564 0.03276244
x2  0.009571313 1.000000000  0.08123065 0.03330818
x3 -0.063635640 0.081230649  1.00000000 0.03503841
x4  0.032762439 0.033308181  0.03503841 1.00000000
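As a side note, cor also has a use argument that handles missing values without a separate na.omit call. The sketch below uses a small made-up data frame (the name d and its values are purely illustrative, not part of the example above) to compare the two approaches: use = "complete.obs" should give the same matrix as cor(na.omit(d)), while use = "pairwise.complete.obs" keeps, for each pair of columns, every row that is complete for that pair.

```r
# Hypothetical data frame, for illustration only
d <- data.frame(a = c(1, 2, NA, 4, 5, 6),
                b = c(2, 1, 4, NA, 6, 5),
                c = c(6, 3, 4, 3, 2, NA))

# Listwise deletion: drop every row that has at least one NA
r_omit <- cor(na.omit(d))

# Same listwise deletion, requested through the use argument
r_complete <- cor(d, use = "complete.obs")

# Pairwise deletion: each coefficient uses all rows that are
# complete for that particular pair of columns
r_pairwise <- cor(d, use = "pairwise.complete.obs")

# Check that the two listwise results agree
all.equal(r_omit, r_complete)
```

Pairwise deletion retains more data, but each coefficient may then be based on a different set of rows, so which option is appropriate depends on the analysis.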
Let’s have a look at an example with matrix data −
Example
> M<-matrix(sample(c(rpois(10,2),NA),36,replace=TRUE),nrow=6)
> M
Output
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    2    2    2   NA    3
[2,]    3    2    4    1    4    3
[3,]    3   NA    1    1    1   NA
[4,]    3   NA    3    2    2    1
[5,]    1    4    3    2    2    2
[6,]    1    2    1    3    1    1
> cor(na.omit(M))
Output
           [,1]       [,2]       [,3]       [,4]       [,5]       [,6]
[1,]  1.0000000 -0.5000000  0.7559289 -0.8660254  0.9449112  0.8660254
[2,] -0.5000000  1.0000000  0.1889822  0.0000000 -0.1889822  0.0000000
[3,]  0.7559289  0.1889822  1.0000000 -0.9819805  0.9285714  0.9819805
[4,] -0.8660254  0.0000000 -0.9819805  1.0000000 -0.9819805 -1.0000000
[5,]  0.9449112 -0.1889822  0.9285714 -0.9819805  1.0000000  0.9819805
[6,]  0.8660254  0.0000000  0.9819805 -1.0000000  0.9819805  1.0000000
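Because na.omit removes whole rows, it is worth checking how many rows are left before trusting the result. The sketch below re-enters the matrix M printed above by hand (the original was drawn at random, so it cannot be regenerated exactly) and counts the complete rows with complete.cases; the variable names n_total and n_complete are illustrative.

```r
# Matrix M typed in from the printed output above
M <- matrix(c(2,  2, 2, 2, NA,  3,
              3,  2, 4, 1,  4,  3,
              3, NA, 1, 1,  1, NA,
              3, NA, 3, 2,  2,  1,
              1,  4, 3, 2,  2,  2,
              1,  2, 1, 3,  1,  1),
            nrow = 6, byrow = TRUE)

n_total    <- nrow(M)                  # rows before removing NAs
n_complete <- sum(complete.cases(M))   # rows with no NA at all

n_total      # 6
n_complete   # 3
```

Here only three of the six rows survive, so every coefficient in the matrix above rests on three observations. Values such as -1.0000000 between columns 4 and 6 most likely reflect that tiny sample rather than a real relationship.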