Question

I would like to create a numeric indicator for a data frame such that, for each unique element in one variable, it creates a sequence whose length is given by the corresponding element in another variable. For example:

frame <- data.frame(x = c("a", "a", "a", "b", "b"), y = c(3, 3, 3, 2, 2))
frame
  x y
1 a 3
2 a 3
3 a 3
4 b 2
5 b 2

The indicator, z, should look like this:

  x y z
1 a 3 1
2 a 3 2
3 a 3 3
4 b 2 1
5 b 2 2

Any and all help greatly appreciated. Thanks.


Solution

Why not use ave?

frame$z <- with(frame, ave(y, x, FUN = seq_along))
frame

#  x y z
#1 a 3 1
#2 a 3 2
#3 a 3 3
#4 b 2 1
#5 b 2 2
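ave() applies FUN to each group of y (as defined by x) and writes the results back into the original row positions, so the rows do not need to be sorted by x. A minimal sketch on a hypothetical interleaved data frame df, with the split/apply/unsplit steps also spelled out by hand for comparison:

```r
# Hypothetical data where the groups are NOT contiguous
df <- data.frame(x = c("a", "b", "a", "b", "a"), y = c(3, 2, 3, 2, 3))

# ave() numbers the rows within each group of x, in original row order
df$z <- ave(df$y, df$x, FUN = seq_along)
df$z
# [1] 1 1 2 2 3

# Roughly the same thing by hand: split by x, apply seq_along, unsplit
manual <- unsplit(lapply(split(df$y, df$x), seq_along), df$x)
all(manual == df$z)
# [1] TRUE
```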

A data.table version (thanks to @mnel):

library(data.table)
frame <- as.data.table(frame)
# .N is the number of rows in the current group, so seq_len(.N) numbers them 1..N
frame[, z := seq_len(.N), by = x]

My original thought was to use:

frame[,z := .SD[,.I], by=x]

where .SD is the subset of the data.table for the current group of x, and .I holds the row numbers of whichever data.table it is evaluated in. Evaluated on .SD, .I therefore gives the row numbers within each group. However, as @mnel points out, this is inefficient compared to seq_len(.N), because .SD has to be materialized in memory for every group just to number its rows.
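To make the distinction between table-wide and within-group row numbers concrete without data.table: for contiguous groups, the within-group number equals the table-wide row number minus the group's first row number, plus one. A base R sketch of that relationship (the names rows and within_group are illustrative only, and this is not how data.table computes .I internally):

```r
frame <- data.frame(x = c("a", "a", "a", "b", "b"), y = c(3, 3, 3, 2, 2))

# Table-wide row numbers (what .I holds when evaluated on the whole table)
rows <- seq_len(nrow(frame))
rows
# [1] 1 2 3 4 5

# First row number of each row's group, repeated across the group
first_in_group <- ave(rows, frame$x, FUN = min)

# Within-group row numbers (what .SD[, .I] gives), assuming contiguous groups
within_group <- rows - first_in_group + 1L
within_group
# [1] 1 2 3 1 2
```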

OTHER TIPS

Another approach, assuming the rows are already sorted (grouped) by x:

frame$z <- unlist(lapply(rle(as.character(frame$x))$lengths, seq_len))

Or with dplyr:

library(dplyr)
frame %>%
  group_by(x) %>%
  mutate(z = seq_along(y))
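The rle() idea relies on each group forming a single contiguous run of x. A minimal base R sketch, assuming the rows are already sorted by x; as.character() is used because rle() rejects factors and as.numeric() on a character column only gives NA. sequence(n) concatenates seq_len(n[1]), seq_len(n[2]), and so on. Because y in this example already holds each group's size, it can also be expanded directly, which additionally assumes y always equals the group size:

```r
frame <- data.frame(x = c("a", "a", "a", "b", "b"), y = c(3, 3, 3, 2, 2))

# Lengths of consecutive runs of equal values in x
runs <- rle(as.character(frame$x))
runs$lengths
# [1] 3 2

# Expand each run length n into 1..n
frame$z <- sequence(runs$lengths)
frame$z
# [1] 1 2 3 1 2

# Same result from y: take y once per group (first row of each x) and expand it
z_from_y <- sequence(frame$y[!duplicated(frame$x)])
z_from_y
# [1] 1 2 3 1 2
```

If the rows are not sorted, reorder first, for example with frame[order(frame$x), ].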

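You can split the data.frame on x and generate a new id column from the size of each piece. This assumes the rows are already sorted by x, because unlist() returns the pieces in group order:

> frame$z <- unlist(lapply(split(frame, frame$x), function(d) seq_len(nrow(d))))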
> frame
  x y z
1 a 3 1
2 a 3 2
3 a 3 3
4 b 2 1
5 b 2 2
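The split approach only lines up when the rows are sorted by x, because split() returns the groups in level order and unlist() concatenates them in that order. A sketch on hypothetical unsorted data, using unsplit() to put each group's numbers back into the original row positions:

```r
# Hypothetical unsorted data: the groups are interleaved
df <- data.frame(x = c("b", "a", "b", "a", "a"), y = c(2, 3, 2, 3, 3))

# unlist() gives all of a's numbers first, then b's: not in row order here
naive <- unname(unlist(lapply(split(df, df$x), function(d) seq_len(nrow(d)))))
naive
# [1] 1 2 3 1 2

# unsplit() writes each group's numbers back to that group's rows
df$z <- unsplit(lapply(split(df, df$x), function(d) seq_len(nrow(d))), df$x)
df$z
# [1] 1 1 2 2 3
```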

Or even more simply using data.table:

library(data.table)
frame <- data.table(frame)[,z:=1:nrow(.SD),by=x]

Try this, where x is the column to group by and y is any numeric column. If there are no numeric columns, use seq_along(x) in place of y:

transform(frame, z = ave(y, x, FUN = seq_along))
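The reason for needing a numeric column: ave() returns a vector of the same type as its first argument, so passing the character column x itself would yield the strings "1", "2", and so on. A minimal sketch on a hypothetical data frame with only a character column:

```r
# Hypothetical data frame with no numeric column
df <- data.frame(x = c("a", "a", "a", "b", "b"), stringsAsFactors = FALSE)

# seq_along(x) supplies an integer vector, so z comes back as integers
out <- transform(df, z = ave(seq_along(x), x, FUN = seq_along))
out$z
# [1] 1 2 3 1 2

# Passing x itself keeps the character type of the first argument
chr <- ave(df$x, df$x, FUN = seq_along)
class(chr)
# [1] "character"
```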

Licensed under: CC-BY-SA with attribution