Question

I have a question and hope that some of you can help me. The issue is this: I have a data frame that includes a vector y of length n and a factor f with k different levels. I want to add a new variable to the data frame based on f, using a vector z of length k that holds one value per level.

Example:

 df <- data.frame(y=rnorm(12), f=rep(1:3, length.out=12))
 z  <- c(-1,0,5)

Note that my real z has been constructed to correspond to the unique factor levels, which is why length(z) = length(unique(df$f)). I now want to create a vector of length n = 12 that contains, for each row, the value of z that corresponds to that row's level of f. (Note: my real factor values are not ordered like in the example above, so just repeating the vector z won't work.)

Now, an obvious solution would be to create a vector f outside the data frame, combine it with z in a second data frame, and then use merge. For instance,

 newdf <- data.frame(z=z, f=c(1,2,3))
 df <- merge(df, newdf, by="f")
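Besides being heavier than needed, merge() sorts its result by the key column by default, so the rows no longer line up with the original df. A minimal base-R sketch of a lighter alternative, assuming the lookup table newdf has exactly one row per level as above: match() returns, for each row of df, the position of its f value in newdf$f, and that position indexes newdf$z. The names idx and merged are illustrative only.

```r
set.seed(1)  # only so the example is reproducible
df <- data.frame(y = rnorm(12), f = sample(1:3, 12, replace = TRUE))
newdf <- data.frame(z = c(-1, 0, 5), f = c(1, 2, 3))

# For each row of df, the row of newdf with the same f value
idx <- match(df$f, newdf$f)
df$z <- newdf$z[idx]   # keeps the original row order of df

# merge() pairs the same f and z values, but returns rows sorted by f
merged <- merge(df[, c("y", "f")], newdf, by = "f")
```

Because match() does a single lookup per row and allocates no intermediate data frame, it is also a reasonable fit for a step that is repeated many times.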

However, I need to repeat this procedure several thousand times, and this merge solution feels like shooting at microbes with cannons. Hence my question: there is almost surely a simpler and more efficient way to do this, but I just don't know how. Could anyone point me in the right direction? I am looking for something like the "inverse" of aggregate or by.
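On the "inverse of aggregate" idea: if z happens to be a per-level summary of y (an assumption; in the question z is constructed separately), base R's ave() already performs the "expand one value per group back to every row" step. A sketch comparing it with indexing a tapply() result by the factor; the columns zbar and zbar2 are hypothetical names used only for the comparison.

```r
set.seed(2)
df <- data.frame(y = rnorm(12),
                 f = factor(sample(c("a", "b", "c"), 12, replace = TRUE)))

# aggregate-like step: one mean per level, in the order of levels(df$f)
z <- tapply(df$y, df$f, mean)

# "inverse" step: expand back to one value per row
df$zbar  <- as.vector(z)[df$f]  # as.vector() drops the 1-d array attributes
df$zbar2 <- ave(df$y, df$f)     # ave() uses FUN = mean by default
```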


Solution

Assuming that the values in z are in the same order as the levels of f:

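df <- data.frame(y=rnorm(12),
                 f=factor(sample(c("a","b","c"), 12, replace=TRUE), levels=c("a","b","c")))
z  <- c(-1,0,5)
# indexing by a factor uses its integer codes, so rows with level k get z[k]
df$newz <- z[df$f]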

In case this is not clear: this works because a factor is stored internally as a vector of integer codes pointing into levels(f). When you index z with a factor, R uses those integer codes rather than the printed labels, so each row picks up the element of z at the position of its level. This is also why z must be in the same order as levels(df$f).
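One pitfall: since R 4.0.0, data.frame() no longer converts strings to factors by default, so f may be a character column. A character index is matched against names(z), and an unnamed z then yields NA for every row. A sketch of a lookup that works for character or factor f and does not depend on level order, assuming each value of z can be labelled with the level it belongs to (z_named and bad are illustrative names):

```r
set.seed(3)
# stringsAsFactors = FALSE is spelled out to mimic the R >= 4.0.0 default
df <- data.frame(y = rnorm(12),
                 f = sample(c("a", "b", "c"), 12, replace = TRUE),
                 stringsAsFactors = FALSE)

z <- c(-1, 0, 5)
bad <- z[df$f]   # character index matched against names(z), which are NULL: all NA

# Label each value with its level, then look it up by name
z_named <- c(a = -1, b = 0, c = 5)
df$newz <- unname(z_named[as.character(df$f)])
```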

Licensed under: CC-BY-SA with attribution