Question

I'm trying to compute a running correlation with running() on my daily climate data, and the problem is that my data.frame has many missing values (NA). I'm using cor.test() because I need the p-values. For example, on some days I don't have precipitation or humidity values, and I would like to know how to compute this running correlation against my temperature data while omitting the NA values.

Here is an example with NA values:

library(gtools)
df <- data.frame(temp=rnorm(100, 10:30), prec=rnorm(100, 1:300), humi=rnorm(100, 1:100))

df$prec[c(1:10, 25:30, 95:100)] <-NA
df$humi[c(15:19, 20:25, 80:90)] <-NA

corPREC <- t(running(df$temp, df$prec, fun = cor.test, width=10, by=10))
corHUMI <- t(running(df$temp, df$humi, fun = cor.test, width=10, by=10))
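For context, a minimal check with hypothetical values suggests where this code breaks. The default cor.test() method already drops incomplete (x, y) pairs on its own, but it stops with an error when fewer than 3 complete pairs remain, which is the case for the first window of df$prec (rows 1 to 10 are all NA). The vectors temp, prec and prec2 below are made-up stand-ins for a single 10-row window.

```r
# Hypothetical 10-row window in which only 2 rows have both values
set.seed(1)
temp <- rnorm(10)
prec <- c(rep(NA, 8), 2.5, 3.1)

# cor.test() drops incomplete pairs itself, but stops if fewer than 3 remain;
# tryCatch() returns the error message instead of halting
msg <- tryCatch(cor.test(temp, prec), error = conditionMessage)
print(msg)

# The same call works once a window has 3 or more complete pairs
prec2 <- c(NA, NA, 1.2, 0.8, 2.0, 1.5, NA, 2.2, 1.9, 2.4)
res <- cor.test(temp, prec2)
print(res$parameter)   # df = n - 2, where n counts complete pairs only (7 here)
```

Because running() calls the function on every window, a single window that errors makes the whole running() call fail.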

Solution

You can use complete.cases() to get a logical vector marking the complete rows (TRUE = complete), then use it to subset the data inside an ad-hoc function that also runs the test:
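As a small illustration of what complete.cases() returns, here is a toy data frame (made-up values) with one NA in each column:

```r
# Toy data: row 1 lacks y, row 3 lacks x
my.df <- data.frame(x = c(1, 2, NA, 4, 5),
                    y = c(NA, 2, 3, 4, 5))

ok <- complete.cases(my.df)
print(ok)                 # FALSE TRUE FALSE TRUE TRUE

# Logical subsetting keeps only the rows where both x and y are present
my.df.cmpl <- my.df[ok, ]
print(nrow(my.df.cmpl))   # 3
```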

library(gtools)
df <- data.frame(temp=rnorm(100, 10:30), prec=rnorm(100, 1:300),
                 humi=rnorm(100, 1:100))

df$prec[c(1:10, 25:30, 95:100)] <-NA
df$humi[c(15:19, 20:25, 80:90)] <-NA

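my.fun <- function(x, y) {
    my.df <- data.frame(x, y)
    # keep only the rows where both x and y are non-missing
    my.df.cmpl <- my.df[complete.cases(my.df), ]

    # cor.test() needs at least 3 complete pairs; return NAs instead of an error
    if (nrow(my.df.cmpl) < 3) {
        return(c(estimate = NA, statistic = NA, p.value = NA,
                 conf.low = NA, conf.high = NA))
    }
    my.test <- cor.test(my.df.cmpl$x, my.df.cmpl$y)
    # conf.int is only computed for 4 or more pairs, so pad it with NAs
    ci <- if (is.null(my.test$conf.int)) c(NA, NA) else my.test$conf.int
    c(estimate = unname(my.test$estimate),
      statistic = unname(my.test$statistic),
      p.value = my.test$p.value,
      conf.low = ci[1], conf.high = ci[2])
}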

corPREC <- t(running(df$temp, df$prec, fun = my.fun, width=10, by=10))
corHUMI <- t(running(df$temp, df$humi, fun = my.fun, width=10, by=10))
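With width = 10 and by = 10, the windows are consecutive and non-overlapping (rows 1 to 10, 11 to 20, and so on). As a sketch of what that amounts to, here is a base-R version that does not need gtools. cor.pair and window.cor are hypothetical names, cor.pair is a trimmed-down stand-in for my.fun (estimate and p-value only), and the sketch assumes length(x) is a multiple of width; otherwise the last window is simply shorter.

```r
# Trimmed-down stand-in for my.fun: correlation and p-value of complete pairs
cor.pair <- function(x, y) {
    ok <- complete.cases(x, y)
    if (sum(ok) < 3) return(c(estimate = NA, p.value = NA))
    tst <- cor.test(x[ok], y[ok])
    c(estimate = unname(tst$estimate), p.value = tst$p.value)
}

# Hypothetical helper: apply cor.pair to consecutive, non-overlapping windows
window.cor <- function(x, y, width) {
    # window id: 1 for rows 1..width, 2 for the next width rows, and so on
    block <- ceiling(seq_along(x) / width)
    idx <- split(seq_along(x), block)
    # sapply() returns one column per window; t() gives one row per window
    t(sapply(idx, function(i) cor.pair(x[i], y[i])))
}

set.seed(42)
df <- data.frame(temp = rnorm(100), prec = rnorm(100))
df$prec[c(1:10, 25:30, 95:100)] <- NA
corPREC <- window.cor(df$temp, df$prec, width = 10)
print(corPREC)
```

The first window (all NA in prec) comes back as NA, while windows 3 and 10 still get an estimate from their 4 complete pairs.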

You could also consider the formula interface of cor.test(), which drops incomplete rows through na.action:

my.test <- cor.test(~ x + y, na.action = "na.exclude", data = my.df)

but it doesn't handle windows with too few complete rows (at least not in a straightforward manner).
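One possible workaround, sketched here as an assumption rather than a tested recipe for running(), is to wrap the formula call in tryCatch() so that a window with fewer than 3 complete pairs yields NAs instead of stopping. safe.cor is a hypothetical name, and na.omit is passed as a function object rather than the "na.exclude" string used above.

```r
# Hypothetical wrapper: formula interface plus tryCatch() for short windows
safe.cor <- function(x, y) {
    my.df <- data.frame(x, y)
    tryCatch({
        # model.frame() drops rows with NA before cor.test() sees them
        tst <- cor.test(~ x + y, data = my.df, na.action = na.omit)
        c(estimate = unname(tst$estimate), p.value = tst$p.value)
    }, error = function(e) c(estimate = NA, p.value = NA))
}

# 5 complete pairs: returns an estimate and a p-value
res <- safe.cor(c(1, 2, 3, 4, 5, 6), c(2, 5, NA, 7, 11, 12))
print(res)

# only 1 complete pair: cor.test() errors, the wrapper returns NAs
short <- safe.cor(c(1, 2, NA), c(NA, 5, 6))
print(short)
```

It could then be passed as fun = safe.cor to running() in the same way as my.fun; the trade-off is that tryCatch() also hides any other error, not only the too-few-rows one.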

Licensed under: CC-BY-SA with attribution