Question

I have a data.frame called dt that looks like:

row.names     A     B     C     D
        1   0.1   0.2   0.5   0.3
        2   0.2   0.3   0.4     0
        3    10  -0.1  -0.3   0.3 # remove A because 10 / 0.2 > 2

I want to remove every column X for which X[i] / X[i-1] > 2 for some i >= 2. In other words, if any value divided by the value in the previous row is greater than 2 (a more than two-fold increase), the whole column should be removed.
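
To make the rule concrete, here is a minimal sketch for column A of the example table. The names `A`, `ratios` and `exceeds` are only illustrative and are not part of the original code.

```r
# Column A from the example table
A <- c(0.1, 0.2, 10)

# Ratio of each value to the one before it: A[i] / A[i-1] for i >= 2
ratios <- A[-1] / head(A, -1)
print(ratios)

# TRUE if any step is a more than two-fold increase
exceeds <- any(ratios > 2, na.rm = TRUE)
print(exceeds)
```

The first ratio (0.2 / 0.1) does not exceed 2, but the second (10 / 0.2) does, so column A would be removed.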

I have tried apply like so:

temp<-dt
val<-apply(temp,2,function(y) {
  y<-na.omit(y) # omit na
  ans1 <- y[-1,] / y[-nrow(y),] - 1 # divide previous row
  if (max(ans1,na.rm=TRUE)>2) {
    y<-NULL # remove from temp
  }
})

But it doesn't seem to remove the column from temp. I thought about returning a list of column names instead, but I can't get them from inside apply the way I have written it.

Any ideas?

Thanks.

=== EDIT ===
Figured it out with a modified version of lukeA's answer:

val<-sapply(dt,function(y) {
  y2<-na.omit(y) # omit NA
  ans1 <- y2[-1] / y2[-length(y2)] - 1 # divide previous row
  if (max(ans1,na.rm=TRUE)>1.5|min(ans1,na.rm=TRUE)< -0.5) {
    return(NULL) # return all NULL
  } else {
    return(y) # return original
  }
})

Solution

This will set every value of A that is more than twice the previous value to NA (not available):

dt$A[-1] <- ifelse(dt$A[-1] / head(dt$A, -1) > 2, NA, dt$A[-1])

Now you can decide what to do with those NAs in column A, e.g. delete the rows:

dt <- dt[!is.na(dt$A), ]

The same approach works for all columns at once:

dt[, -1] <- sapply(dt[, -1], function(x) {
  # compare each value with the one in the previous row
  x[-1] <- ifelse(x[-1] / head(x, -1) > 2, NA, x[-1])
  x
})
dt <- na.omit(dt) # remove NA rows

If you would rather delete the columns that contain NA (instead of removing the rows with na.omit), you can do it like this:

dt[, c(1, which(!is.na(colSums(dt[, -1]))) + 1)]
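
If the goal is only to drop the offending columns, as the question asks, the intermediate NA step can be skipped. The following is a minimal sketch, not the answer's original method. It assumes dt contains only the numeric columns A to D (with row names stored as row names, not as a column). The helper name `has_jump` and the variables `to_drop` and `result` are hypothetical.

```r
# Example data from the question (assumption: no separate row-name column)
dt <- data.frame(
  A = c(0.1, 0.2, 10),
  B = c(0.2, 0.3, -0.1),
  C = c(0.5, 0.4, -0.3),
  D = c(0.3, 0, 0.3)
)

# TRUE if any value is more than twice the previous non-NA value
has_jump <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 2) return(FALSE)
  any(x[-1] / head(x, -1) > 2, na.rm = TRUE)
}

# One logical flag per column, then keep only the unflagged columns
to_drop <- vapply(dt, has_jump, logical(1))
result <- dt[, !to_drop, drop = FALSE]
print(names(result))
```

Note that dividing by zero yields Inf in R, so in this sketch column D (0.3, 0, 0.3) is dropped along with A. Whether that is desirable depends on how a jump from zero should be treated.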
Licensed under: CC-BY-SA with attribution