I would like column x3 of my dataframe dat to contain a random sample of column x2, but the random sample should only come from the same factor level given in column x1. I have researched the functions by(), ddply(), and sample(), but can't seem to make it work. I also checked a similar question, but it didn't help me. You can see what I tried in the context of (what I hope is) a reproducible example below.
Here is the example dataframe:
dat <- data.frame(x1=c("a","a","a","b","b","b","c","c","c"),x2=1:9);
dat$x1 <- as.factor(dat$x1);
dat;
  x1 x2
1  a  1
2  a  2
3  a  3
4  b  4
5  b  5
6  b  6
7  c  7
8  c  8
9  c  9
Then some of my non-working attempts to generate x3 were the following:
set.seed(99);
by(dat,FUN=dat$x1,dat$x3<-sample(dat$x1,1,replace=FALSE)); #this did not work at all
I also tried this
set.seed(99);
a <- by(dat,dat[,"x1"],function(d){sample(d$x2,3,replace=FALSE)},simplify=TRUE);
dat$x3<-a;
a;
dat[, "x1"]: a
[1] 2 1 3
---------------------------------------------------------------------------------------------------
dat[, "x1"]: b
[1] 6 5 4
---------------------------------------------------------------------------------------------------
dat[, "x1"]: c
[1] 9 7 8
dat;
  x1 x2      x3
1  a  1 2, 1, 3
2  a  2 6, 5, 4
3  a  3 9, 7, 8
4  b  4 2, 1, 3
5  b  5 6, 5, 4
6  b  6 9, 7, 8
7  c  7 2, 1, 3
8  c  8 6, 5, 4
9  c  9 9, 7, 8
I kind of got what I needed into the object a, in that the random resampling by factor level is there, but a is not a simple vector. I feel that if a were a vector, I would just about have what I need, as I could assign it to dat$x3. To sum up, I would want dat to turn out something like this:
dat
  x1 x2 x3
1  a  1  2
2  a  2  1
3  a  3  3
4  b  4  6
5  b  5  5
6  b  6  4
7  c  7  9
8  c  8  7
9  c  9  8
The solution should be efficient for a dataframe with more than 2 million rows. Thanks to anyone for your help. I hope to return the help to others as I get better with R.
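For reference, here is a minimal sketch of the shape of result I am after, assuming base R's ave() is an acceptable way to do it. The helper name shuffle is hypothetical (my own, not from any package), and I have not benchmarked this on a dataframe of the size mentioned above, so treat it as an illustration rather than a confirmed solution.

```r
# Same example data as above.
dat <- data.frame(x1 = factor(c("a","a","a","b","b","b","c","c","c")),
                  x2 = 1:9)

# Hypothetical helper: permute a vector by index, so that a group with a
# single value is returned unchanged (sample(5) would draw from 1:5 instead).
shuffle <- function(v) v[sample.int(length(v))]

set.seed(99)
# ave() applies shuffle() within each level of x1 and returns a plain
# vector in the original row order, which can be assigned directly.
dat$x3 <- ave(dat$x2, dat$x1, FUN = shuffle)
dat
```

Unlike assigning the result of by() directly, this gives one value per row, and it does not rely on the rows already being sorted by x1.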