
I would like to create a numeric indicator for a data frame such that, for each unique element in one variable, it creates a sequence whose length is based on the element in another variable. For example:

frame <- data.frame(x = c("a", "a", "a", "b", "b"), y = c(3, 3, 3, 2, 2))
  x y
1 a 3
2 a 3
3 a 3
4 b 2
5 b 2

The indicator, z, should look like this:

  x y z
1 a 3 1
2 a 3 2
3 a 3 3
4 b 2 1
5 b 2 2

Any and all help greatly appreciated. Thanks.



No ave?

frame$z <- with(frame, ave(y,x,FUN=seq_along) )

#  x y z
#1 a 3 1
#2 a 3 2
#3 a 3 3
#4 b 2 1
#5 b 2 2

A data.table version could be something like below (thanks to @mnel):

library(data.table)
frame <- as.data.table(frame)  # := only works on a data.table, so convert first
frame[, z := seq_len(.N), by = x]

My original thought was to use:

frame[,z := .SD[,.I], by=x]

Here .SD refers to each subset of the data.table split by x, and .I returns the row numbers for an entire data.table. So .SD[, .I] returns the row numbers within each group. However, as @mnel points out, this is inefficient compared to the other method, because the entire .SD needs to be loaded into memory for each group to run this calculation.


Another approach:

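# as.character() works whether x is a factor or a character column;
# this assumes the rows of each group are contiguous
frame$z <- unlist(lapply(rle(as.character(frame$x))$lengths, seq_len))

Using dplyr:

library(dplyr)
frame %>%
  group_by(x) %>%
  mutate(z = seq_along(y))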

You can split the data.frame on x, and generate a new id column based on that:

> frame$z <- unlist(lapply(split(frame, frame$x), function(d) seq_len(nrow(d))), use.names = FALSE)
> frame
  x y z
1 a 3 1
2 a 3 2
3 a 3 3
4 b 2 1
5 b 2 2
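
Note that unlist() returns the ids in group order, so the split approach relies on the rows already being sorted by x. A minimal sketch of a variant that should also cope with unsorted groups, using base R's unsplit() to put each id back in its original row (the data frame df below is made up for illustration and is not from the question):

```r
# Hypothetical data with interleaved groups
df <- data.frame(x = c("b", "a", "b", "a", "a"), y = c(2, 3, 2, 3, 3))

# Split the row positions by group, number the rows within each group,
# then use unsplit() to restore the original row order
groups <- split(seq_len(nrow(df)), df$x)
df$z <- unsplit(lapply(groups, seq_along), df$x)
df
```

Each row of df gets its position within its own group, regardless of where the group's rows sit in the data frame.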

Or even more simply using data.table:

frame <- data.table(frame)[,z:=1:nrow(.SD),by=x]

Try this, where x is the column to group by and y is any numeric column. If there are no numeric columns, use seq_along(x), say, in place of y:

transform(frame, z = ave(y, x, FUN = seq_along))
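
As a rough illustration of the seq_along(x) variant, here is a sketch on a data frame that has only a character column (the name df and its contents are assumptions for the example, not from the question):

```r
# Hypothetical data frame with no numeric column
df <- data.frame(x = c("b", "a", "b", "a", "a"), stringsAsFactors = FALSE)

# seq_along(df$x) supplies the numeric vector that ave() needs;
# FUN = seq_along then numbers the rows within each level of x
df$z <- ave(seq_along(df$x), df$x, FUN = seq_along)
df
```

Because ave() writes each group's result back into that group's original positions, this does not require the rows to be sorted by x.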