Question

Possible Duplicate:
Processing the list of data.frames with “apply” family of functions

I have a dataframe with six numeric variables V1, V2, V3 and V1.lag, V2.lag, V3.lag.

NOTE: My real dataset has many more variables; I use 3 for illustration only!

I would like to be able to automatically (without hardcoding anything) run through all V variables (not the lag variables) and create V1.over.V1.lag variables by dividing each V variable by its corresponding lag variable.

df<-data.frame(matrix(rnorm(216),72,6));
colnames(df) <- c("v1.raw", "v2.raw", "v3.raw", "v1.lag", "v2.lag", "v3.lag");

Thanks in advance

**EDIT: I figured out how to identify the "raw" columns and the "lag" columns**

raws <- sapply( names(df), function(x){ unlist(strsplit(x, "[.]"))[2] == "raw" } ); ## which are raw factors

lags <- sapply( names(df), function(x){ unlist(strsplit(x, "[.]"))[2] == "lag" } ); ## which are lagged factors

but I still can't figure out how to divide all raw factors by their lag counterparts

which(raws);

will give me the indices, but how do I combine them with the lags into a new factor?

df[which(raws)] / df[which(lags)]

doesn't work


Solution

Assuming you have only v.raw and v.lag columns in your data.frame, and that they appear in the same order, this should work

  mm <- colnames(df) <- c("v1.raw", "v2.raw", "v3.raw", "v1.lag", "v2.lag", "v3.lag")
  df[,gregexpr('.raw',mm) > 0] /df[,gregexpr('.*lag',mm) > 0]

Edit: some explanation of the solution:

gregexpr('.raw',mm) > 0
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE  

head(df[,gregexpr('.raw',mm) > 0],1)
     v1.raw     v2.raw    v3.raw
1 0.7719037 -0.2078197 -1.223753

gregexpr('.lag',mm) > 0
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE

head(df[,gregexpr('.lag',mm) > 0],1)
     v1.lag     v2.lag    v3.lag
1 0.7719037 -0.2078197 -1.223753
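As a side note, the same logical masks can be sketched with grepl, which returns TRUE/FALSE directly instead of match positions. The patterns below escape the dot and anchor the suffix at the end of the name; the variable names is_raw and is_lag are only illustrative and are not part of the original answer.

```r
mm <- c("v1.raw", "v2.raw", "v3.raw", "v1.lag", "v2.lag", "v3.lag")

# "\\." matches a literal dot (an unescaped "." matches any character),
# and "$" requires the suffix to sit at the end of the column name.
is_raw <- grepl("\\.raw$", mm)
is_lag <- grepl("\\.lag$", mm)

is_raw
# [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
is_lag
# [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE
```

These masks can be used in place of the gregexpr(...) > 0 expressions when subsetting the columns.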

Then we use the vectorized / operator to do the division in one operation.

Here is an example:

df <- matrix(rep(c(1,2,3,4,5,6),each = 5),ncol=6)
colnames(df) <- c("v1.raw", "v2.raw", "v3.raw", "v1.lag", "v2.lag", "v3.lag")
    v1.raw v2.raw v3.raw v1.lag v2.lag v3.lag
[1,]      1      2      3      4      5      6
[2,]      1      2      3      4      5      6
[3,]      1      2      3      4      5      6
[4,]      1      2      3      4      5      6
[5,]      1      2      3      4      5      6


mm <- colnames(df)
df[,which(gregexpr('.raw',mm) > 0)] /df[,which(gregexpr('.lag',mm) > 0)]

   v1.raw v2.raw v3.raw      #as expected 1/4 2/5 3/6
[1,]   0.25    0.4    0.5 
[2,]   0.25    0.4    0.5
[3,]   0.25    0.4    0.5
[4,]   0.25    0.4    0.5
[5,]   0.25    0.4    0.5
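The division above pairs columns by position, so it relies on the raw and lag columns appearing in the same order. A minimal sketch that pairs them by name instead, and also builds the v1.over.v1.lag style names asked for in the question, could look like this. It assumes every raw column has a lag column with the same stem; raw_cols, stems, lag_cols and ratios are illustrative names, and the column order is shuffled on purpose.

```r
set.seed(42)
df <- data.frame(matrix(rnorm(72 * 6), 72, 6))
# Deliberately shuffled order, to show that position no longer matters
colnames(df) <- c("v2.lag", "v1.raw", "v3.raw", "v1.lag", "v3.lag", "v2.raw")

# Find the raw columns, then derive the matching lag names from their stems
raw_cols <- grep("\\.raw$", colnames(df), value = TRUE)
stems    <- sub("\\.raw$", "", raw_cols)
lag_cols <- paste0(stems, ".lag")

# Assumption: each raw column has a lag partner; stop early if one is missing
stopifnot(all(lag_cols %in% colnames(df)))

# Element-wise division of two equally sized data frames
ratios <- df[raw_cols] / df[lag_cols]
colnames(ratios) <- paste0(stems, ".over.", stems, ".lag")

df <- cbind(df, ratios)
head(df, 2)
```

Selecting both sides by name means a raw column is always divided by its own lag, whatever order the columns arrive in.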

Edit 2: prevent Inf/NaN when a lag value is zero

df <- matrix(rep(c(1,2,3,4,5,6),each = 5),ncol=6)
colnames(df) <- c("v1.raw", "v2.raw", "v3.raw", "v1.lag", "v2.lag", "v3.lag")
df[1,4] <- 0              ## I introduce a 0 here
mm <- colnames(df)
## I use ifelse because it is also vectorized
## Where the lag is 0, skip the division and return the original raw value
## You can do other things here
ifelse(df[,which(gregexpr('.lag',mm) > 0)] != 0 ,
       df[,which(gregexpr('.raw',mm) > 0)] /df[,which(gregexpr('.lag',mm) > 0)],
       df[,which(gregexpr('.raw',mm) > 0)])  

    v1.lag v2.lag v3.lag    ## ifelse takes its attributes (including column names) from the test argument, hence lag, not raw
[1,]   1.00    0.4    0.5
[2,]   0.25    0.4    0.5
[3,]   0.25    0.4    0.5
[4,]   0.25    0.4    0.5
[5,]   0.25    0.4    0.5
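
If the raw column names should be kept in the result, one possible variant is to divide first and then overwrite the cells where the lag is zero. This is only a sketch with the same fallback as the ifelse call above (return the raw value); raw_vals, lag_vals, ratio and zero are illustrative names.

```r
df <- matrix(rep(c(1, 2, 3, 4, 5, 6), each = 5), ncol = 6)
colnames(df) <- c("v1.raw", "v2.raw", "v3.raw", "v1.lag", "v2.lag", "v3.lag")
df[1, 4] <- 0                      # same zero as in the example above
mm <- colnames(df)

raw_vals <- df[, grepl("\\.raw$", mm)]
lag_vals <- df[, grepl("\\.lag$", mm)]

ratio <- raw_vals / lag_vals       # keeps the raw column names; 1/0 gives Inf
zero  <- lag_vals == 0             # logical matrix marking the zero lags
ratio[zero] <- raw_vals[zero]      # fall back to the raw value in those cells

ratio
```

Assigning NA instead of raw_vals[zero] would be another reasonable choice, depending on how the ratios are used afterwards.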
Licensed under: CC-BY-SA with attribution